Multimodal Archives - AIVineet By Vineet Tiwari

Agentic Vision in Gemini 3 Flash: What It Is, Why It Matters, and How to Use It

Agentic Vision in Gemini 3 Flash is Google’s attempt to fix a very practical failure mode in multimodal LLMs: the model “looks once,” misses a tiny detail, and then confidently…

February 10, 2026
Video LLM for Real-Time Commentary with Streaming Speech Transcription | LiveCC

LiveCC is an open-source project that trains a video LLM to generate real-time commentary while the video is still playing, by pairing video understanding with streaming speech transcription.…

January 31, 2026
