Ai2’s Molmo 2: Open-Source Video Model Challenges Proprietary Competitors

This article was generated by AI and cites original sources.

The Allen Institute for AI (Ai2) has unveiled Molmo 2, an open-source video model that aims to compete with larger proprietary models in video understanding and analysis. Following the success of Ai2’s Olmo foundation model, Molmo 2 demonstrates the potential of smaller open models in enterprise applications.

Molmo 2 comes in three variants: Molmo 2 8B for video grounding and question answering, Molmo 2 4B for efficient deployments, and Molmo 2-O 7B, which is based on the Olmo model. The model supports single-image and multi-image inputs as well as video clips of various lengths, enabling tasks such as video grounding, tracking, and question answering.

Ai2 emphasized the importance of grounding, a gap in open models that Molmo 2 aims to address. The model surpasses previous versions in accuracy, temporal understanding, and pixel-level grounding, and competes with larger models such as Google’s Gemini 3.

Performance Comparison

Molmo 2 outperformed competitors such as Gemini 3 Pro on video tracking benchmarks. In image and multi-image reasoning, the 8B model leads all open-weight models, with the 4B variant close behind. Molmo 2 also excels in video grounding and counting, where it surpasses similar open-weight models.

While larger proprietary models still lead in some benchmarks, Molmo 2’s success highlights the progress in optimizing smaller open models for specific tasks like grounding and analysis.

Source: VentureBeat
