Browsing Tag
vlm
2 posts
2025 Complete Guide: How to Build End-to-End OCR with HunyuanOCR
🎯 Key Takeaways (TL;DR): A single 1B-parameter multimodal architecture covers detection, recognition, parsing, translation, and more in one…
Small Model from Hugging Face with Video Understanding
A couple of weeks ago, Hugging Face released SmolVLM-2 with a standout feature: video understanding. The…