Hey everyone,
I’m a beginner, so apologies if this is a silly question. So far, I’ve put together the Picar-X, completed calibration, tested the camera and TTS/STT. All fine on that front.
I’m currently on the Vision Talk with Ollama section of the documentation. I’ve got Ollama working with a 4-bit quantized version of Qwen2.5VL:3B. The model itself seems to work fine, but when I run 17.text_vision_talk.py, the images don’t seem to reach the model. Instead, the model just makes up an answer based on the text of the question alone. Any ideas how I can get it to use the image as an input?
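For context, here’s roughly how I understand images are supposed to be attached when talking to Ollama. This is just a sketch based on the Ollama chat API (the `build_vision_message` helper name is my own, not from the tutorial script): each user message can carry an `images` list (base64 strings over the raw HTTP API; the Python client also accepts paths or bytes). If that list is missing or empty, the model only ever sees the text, which would explain the made-up answers I’m getting.

```python
import base64
from pathlib import Path

def build_vision_message(prompt: str, image_path: str) -> dict:
    """Build an Ollama chat message with an image attached.

    The /api/chat endpoint expects images as base64-encoded strings
    in an `images` list on the user message; if `images` is absent,
    the model answers from the text prompt alone.
    """
    image_b64 = base64.b64encode(Path(image_path).read_bytes()).decode("ascii")
    return {"role": "user", "content": prompt, "images": [image_b64]}

# Hypothetical usage with the `ollama` Python client (not verified
# against the tutorial script):
#
# import ollama
# resp = ollama.chat(
#     model="qwen2.5vl:3b",
#     messages=[build_vision_message("What do you see?", "frame.jpg")],
# )
# print(resp["message"]["content"])
```

Is this the shape of request 17.text_vision_talk.py should be sending? If so, maybe the script is failing to capture/save the camera frame before the call, so the `images` list ends up empty.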
