Pidog V2 (Raspberry Pi 5) with AI

Hello everyone,

I have assembled and set up the Pidog V2 with Raspberry Pi 5. Everything is working, and the kit is well-detailed.

I now want to expand the capabilities of Pidog with AI. From what I have read, it seems the easiest way is to register with OpenAI and follow the instructions here:
https://docs.sunfounder.com/projects/pidog/en/latest/openai.html
On GitHub, I see at least three projects:

  1. https://github.com/sunfounder/pidog/tree/master/gpt_examples

  2. https://github.com/sunfounder/pidog/tree/master/llm_server_example

  3. https://github.com/sunfounder/pidog/tree/master/local_llm_example

As I understand it, the first two are server/client-based (server on workstation, client on RasPi), and the third one runs autonomously without an additional computer. Is that correct?

I would like to understand the mechanisms to work more broadly with AI.
My questions for you:

  1. Could you provide some introductory material for me, especially to understand how the API is used and what adjustments need to be made?

  2. Here, in step 3, Pidog is taught its capabilities: https://docs.sunfounder.com/projects/pidog/en/latest/openai.html
    2.1 “## actions you can do:” → How is this linked to the Python scripts?
    2.2 “## Response Format:” → Does this mean that Pidog will always respond this way when it answers?

  3. Can the “character” of Pidog also be changed, allowing for experimentation without it becoming too expensive? I saw a project where the robot could play several different characters: New project announcement: k9-polyvox - News - Sunfounder Forum
    –> Is it a “stand-alone solution” running AI locally?

  4. What possibilities are there to integrate the robot as much as possible into its environment with people: engaging in dialogue, executing commands, solving tasks with image processing…

Thank you in advance.
Best regards,
Artur

k9-polyvox uses the OpenAI Realtime API and provides direct audio back and forth with different voices and personalities. It runs almost all of the AI in OpenAI’s cloud, not locally. And yes, it is fairly expensive: the API costs maybe $8 per hour of usage.

The GPT Dog example runs TTS and STT on the pi and uses the GPT text-based APIs, so it is much slower, but also much less expensive.

Nice to hear from you Pablo!

I find your idea funny - just the right thing to get my kids excited about it. The project sounds like the “whole package” and will probably be fun. I am surprised by the costs: on the OpenAI platform, prices are quoted per million tokens and range from one to 20 dollars… Once I have obtained a credit card for payment, I will get started.

Best regards,
Artur

Hello,

I have successfully set up the Sunfounder GPT example on Pidog: compliments on the good instructions, it ran very smoothly.
There seems to be an error in the documentation: tail wagging is only executed when the action is specified as “wag tail” instead of “wag_tail”.
Furthermore, I have added additional actions to the list, such as “attack_posture” from preset_actions.py.
The model used is gpt-4o or gpt-4o-mini. Image recognition works very well, and speech is recognized best in English.
Commands are executed; for example, with “go to …”, Pidog moves in the given direction but stops after two steps and waits for commands again.
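For anyone wondering how the action names from the model's reply end up calling the Python scripts: in the SunFounder example this is essentially a dictionary lookup from action name to function. The sketch below is a hedged illustration of that idea, not the verbatim example code; the placeholder handler bodies and the `run_action` helper are my own.

```python
# Hedged sketch: mapping action names the model emits to Python
# functions, mirroring the idea behind preset_actions.py.
# The handler bodies are placeholders, not the real robot code.

def wag_tail():
    return "wagging tail"

def attack_posture():
    return "attack posture"

# The model only needs to output one of these keys in its reply;
# the client script looks the key up and calls the function.
# Note the key must match exactly - hence "wag tail" working
# while "wag_tail" from the docs did not.
ACTIONS = {
    "wag tail": wag_tail,
    "attack_posture": attack_posture,
}

def run_action(name):
    handler = ACTIONS.get(name)
    if handler is None:
        return f"unknown action: {name}"
    return handler()
```

Adding a new capability is then just a matter of writing the function and registering its name in the dictionary and in the “## actions you can do:” prompt section.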

Interestingly, Pidog seems to be stuck with a sense of time from 2023: even after setting the current date on the Raspberry Pi, Pidog continued its calculations (birthdays) based on 2023, until I manually entered the correct date via the keyboard. Since then, Pidog remembers the time.
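The 2023 time sense comes from the model’s training data: the model has no clock, so unless the current date appears in the prompt, it falls back on what it learned. A hedged sketch of injecting the Pi’s date into the system prompt at session start (the variable names here are illustrative, not the example’s actual ones):

```python
from datetime import date

# Illustrative stand-in for the example's real instruction text.
BASE_INSTRUCTIONS = "You are Pidog, a robot dog. ## actions you can do: ..."

def build_system_prompt():
    # The model cannot read the Raspberry Pi's clock, so state the
    # date explicitly in every session's system prompt.
    today = date.today().isoformat()
    return f"{BASE_INSTRUCTIONS}\nToday's date is {today}."
```

With this, date arithmetic like birthdays is anchored to the Pi’s clock instead of the training cutoff, without having to type the date in manually each session.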

How can I make Pidog walk all the way to the object?
How can I give Pidog a female voice?

We called Pidog “Harmony” –> she has a flesh-and-blood sister called “Melody”:

Greetings,
Artur

  1. Currently, Pidog takes brief actions when executing movement commands (such as “move forward”). To continuously move to a specific target, you may need to implement a loop for the movement command or adjust the action duration. You can try modifying the movement function in the code to add a loop condition for continuous movement.
  2. How can I change Pidog’s voice to female?
    Please refer to our tutorial instructions:
    Modify Parameters - Optional
    The TTS_VOICE variable allows you to select the voice role for the Text-to-Speech (TTS) output. Available options include: alloy, echo, fable, onyx, nova, shimmer.
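The loop idea in point 1 can be sketched as follows. This is a hedged outline: `distance_to_target()` and `step_forward()` stand in for whatever sensor and movement calls the Pidog code actually exposes (e.g. the ultrasonic reading and a single forward action), and are not real library functions.

```python
def walk_until_close(distance_to_target, step_forward,
                     stop_distance_cm=15, max_steps=50):
    """Keep stepping toward the target until the distance sensor
    reports we are close enough, instead of stopping after a
    fixed two-step action. A max_steps cap avoids walking forever
    if the sensor never sees the target."""
    steps = 0
    while steps < max_steps:
        if distance_to_target() <= stop_distance_cm:
            break  # arrived
        step_forward()
        steps += 1
    return steps
```

For point 2, the voice is just a single string in the script’s configuration, e.g. `TTS_VOICE = 'nova'` for one of the female-sounding options.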

Thank you.

Now something about the language model:

How can I force TTS to pronounce the name “Stella” correctly?

TTS speaks it like “SHtella”.

I tried describing it to the assistant and asking politely… No effect.

BR, Artur

In the main() function, find the part where TTS is called and add the preprocessing:

```python
# ---- tts ----
_status = False
if answer != '':
    # Preprocess the text so that "Stella" is pronounced correctly
    processed_answer = preprocess_text_for_tts(answer)

    st = time.time()
    _time = time.strftime("%y-%m-%d_%H-%M-%S", time.localtime())
    _tts_f = f"./tts/{_time}_raw.wav"
    _status = openai_helper.text_to_speech(processed_answer, _tts_f, TTS_VOICE, response_format='wav')
```

Then add the preprocessing function:

```python
def preprocess_text_for_tts(text):
    """
    Preprocess the text so that TTS pronounces specific vocabulary correctly
    """
    # Handle "Stella" using SSML phoneme tags (X-SAMPA alphabet)
    text = text.replace('Stella', '<phoneme alphabet="x-sampa" ph="s t E l @">Stella</phoneme>')

    # Additional vocabulary needing special handling can be added here
    # text = text.replace('other vocabulary', '<phoneme...>')

    return text
```

See if this method solves the issue. It’s also recommended to consult AI online to see if there are better solutions available.
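If the SSML tags are spoken literally or silently ignored (OpenAI’s speech endpoint does not document SSML support), a cruder fallback is phonetic respelling: rewrite the text fed to TTS so the written form nudges the engine toward the wanted sounds. The respelling below is a guess to experiment with, not a verified fix.

```python
# Phonetic-respelling fallback: replace the written name with a
# spelling that the TTS engine is more likely to read as intended.
# "Stehl-lah" is an assumption to experiment with, not a verified fix.
RESPELLINGS = {
    "Stella": "Stehl-lah",
}

def respell_for_tts(text):
    for word, spoken in RESPELLINGS.items():
        text = text.replace(word, spoken)
    return text
```

Since the replacement happens only in the string sent to TTS, the on-screen text and the model’s reply keep the correct spelling.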

Hello again,

I’ve added the code before the main() function, after line 101 (before def speak_hanlder():).

Unfortunately, the change in the code has no effect on the pre-learned pronunciation.

Best regards

Artur