Picar-X+Ollama: Vision Talk Issue

Hey everyone,

I’m a beginner, so apologies if this is a silly question. So far I’ve put together the Picar-X, completed calibration, and tested the camera and TTS/STT. All fine on that front.

I’m currently on the Vision Talk with Ollama section of the documentation. I’ve got Ollama running a 4-bit quantized version of Qwen2.5VL:3B. The model itself seems to work fine, but when I run 17.text_vision_talk.py, the images don’t seem to reach the model; instead it just makes up an answer based on the text of the question alone. Any ideas how I can get it to use the image as input?
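For what it’s worth, my understanding is that Ollama’s /api/chat endpoint only “sees” an image if it’s base64-encoded and included in the message’s `images` list; if that field is missing, the model answers from the text prompt alone, which matches what I’m seeing. Here’s a minimal sketch of the payload I think the script needs to build (the function name, model tag, and file path are just placeholders of mine, not from Sunfounder’s code):

```python
import base64

def build_vision_payload(model, prompt, image_path):
    """Build a request body for Ollama's /api/chat endpoint.

    Vision models only receive the image if it is base64-encoded
    and placed in the user message's 'images' list.
    """
    with open(image_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode("ascii")
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": prompt,
                # Without this field, the model falls back to
                # answering from the text prompt alone.
                "images": [image_b64],
            }
        ],
        "stream": False,
    }

# Example usage (placeholders):
# payload = build_vision_payload("qwen2.5vl:3b", "What do you see?", "frame.jpg")
# POST this as JSON to http://localhost:11434/api/chat
```

So my first guess is that the edited script is dropping or never populating that `images` field, but I’m not sure where.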

Here’s the code (just Sunfounder’s, edited to change the model). I’m pasting a screenshot from a Google Doc, as the forum wouldn’t let me post the code directly from my phone.