
Building Your First AI Agent

Some people - okay, the CEO of Microsoft - believe that AI agents will redefine software and may even make traditional software-based solutions obsolete. It's a bold claim. In this promising(?) future, diligent AI agents will handle all the mundane tasks, freeing us from sitting in front of computers, navigating complicated dashboards, and performing boring chores all day long. While I can't predict when or if this will happen, I can show you how to build such an agent right now.

Okay, first things first: what is an AI agent? According to an AI agent itself:

An AI agent is software that sees what’s around it, processes information, and takes actions to reach specific goals, often mimicking human thinking and problem-solving.

Microsoft describes their Copilots as “the UI of AI”, and I really like this definition. I think these intelligent agents have the potential to replace many of the software tools we use today. Let’s face it - no one enjoys navigating through complicated dashboards or scrolling through endless buttons and menu items to work with these tools.

Consider self-service checkouts at supermarkets. We used to pay at a cashier who scanned our items. Now, with self-service checkouts, you scan the items yourself. While it’s usually faster and more convenient, it can be challenging to find and scan specific items like vegetables, fruit, or your favorite cinnamon rolls. Previously, the cashier took care of this; now you have to navigate these systems on your own.

AI-powered agents can save you from this frustration. Soon, they may be able to recognize products, identify the correct type, and scan them for you—just like cashiers did in the past. Curious to see an AI agent up close? The best way is to build one yourself.

Prerequisites

To follow this tutorial, you’ll need:

  • Python 3.8+: Experience with Python basics is required
  • Visual Studio Code: Or any other code editor of your choice
  • REST API knowledge: Basic understanding of REST APIs and HTTP methods
  • Flask: We’ll use Flask for the API - I’ll provide the code, but familiarity helps
  • Ollama: Install Ollama on your machine. See my post about prompt engineering if you need help getting started.

Clone the sample code repository to follow along.

Make sure you have Python and Ollama installed before proceeding:

python --version  # Should be 3.8 or newer
ollama --version  # Should return the version number

About Our Agent

Our Smart Boiler AI Agent in action

Last December we moved into our new home, and I have to admit, I’ve spent some (perhaps too much…) time setting up the smart thermostat. This inspired me to build an AI agent to control a theoretical smart heating system. Don’t worry; we won’t be tinkering with a real boiler here. Instead, the agent will interact with a mock smart thermostat via a simple REST API. Since I plan to use Ollama with Python, it made sense to use Python for the REST API as well.

Let’s dive in!

Setting the Table: Building the REST API

First, we need to create a REST API that our AI agent can interact with. We’ll use the Flask framework for this. To shave off some time, I’ve scaffolded the REST API itself for you. Who am I kidding? 😀 I asked Copilot to help me figure out what a theoretical smart thermostat API would look like. Below is what it came up with - but please, lower your expectations. This post is not about crafting an award-winning API. If my AI agent can interact with it, that’s good enough for me.

Draft of the Endpoints

Since I’ve never worked with a smart thermostat API before, I can’t guarantee that this is the best design. However, I believe the endpoints below are good enough for this experiment.

Operation                  Endpoint              Method
Get actual temperature     /thermostat/actual    GET
Get desired temperature    /thermostat/desired   GET
Set desired temperature    /thermostat/desired   POST
Get boiler state           /boiler/state         GET
Get error code             /boiler/error         GET

If you feel more comfortable with Swagger, you can find the OpenAPI specification on the GitHub repository. Paste it into the Swagger Editor to visualize the API.

Implementing the API

The API is a Flask service with the endpoints described above. I’ve declared some global variables to store the actual and desired temperatures, the boiler state, and the error code:

actual_temperature = 21.5
desired_temperature = 22.0
is_heating = False
error_code = 0
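
For reference, here is a minimal sketch of how a couple of the endpoint handlers could look. This is an illustration, not the repository's exact code - the handler names and the POST body key are my assumptions; check boiler_api.py for the real implementation:

from flask import Flask, jsonify, request

app = Flask(__name__)

@app.route("/thermostat/actual", methods=["GET"])
def actual_temperature_endpoint():
    # Return the simulated room temperature as JSON
    return jsonify({"actual_temperature": actual_temperature})

@app.route("/thermostat/desired", methods=["POST"])
def desired_temperature_endpoint():
    # Update the target temperature from the request body
    # (the "desired_temperature" key is an assumption here)
    global desired_temperature
    desired_temperature = float(request.get_json()["desired_temperature"])
    return jsonify({"desired_temperature": desired_temperature})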

If you dig into the code, you may find some other interesting things, like the temperature adjustment thread control:

# Temperature adjustment thread control
temperature_thread = None
should_run = False

…and the code of the temperature adjustment thread itself:

def adjust_temperature():
    global actual_temperature, is_heating, should_run

    while should_run:
        if abs(desired_temperature - actual_temperature) < 0.1:
            # Close enough: stop heating and snap to the target
            is_heating = False
            actual_temperature = desired_temperature
        else:
            # Move 0.1°C toward the desired temperature on each tick
            is_heating = True
            if actual_temperature < desired_temperature:
                actual_temperature = round(actual_temperature + 0.1, 1)
            else:
                actual_temperature = round(actual_temperature - 0.1, 1)
        time.sleep(TEMPERATURE_UPDATE_INTERVAL)

I’ve added this little twist to simulate a realistic heating system. The adjust_temperature function runs in a separate thread, continuously updating the actual temperature at regular intervals (every TEMPERATURE_UPDATE_INTERVAL seconds). This allows us to observe the boiler state changing and the actual temperature fluctuating, gradually aligning with the desired temperature - like in the real world. Okay, maybe a bit faster than in the real world, but you get the idea.
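
In case you’re wondering how the thread gets started, it boils down to something like the sketch below. The function name is my assumption; only temperature_thread and should_run come from the snippet above:

import threading

def start_temperature_thread():
    # Launch the background adjustment loop as a daemon thread,
    # so it won't keep the process alive on shutdown
    global temperature_thread, should_run
    should_run = True
    temperature_thread = threading.Thread(target=adjust_temperature, daemon=True)
    temperature_thread.start()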

Alright, let’s see if it works!

python3 -m venv venv
source venv/bin/activate
cd boiler_api
pip install -r requirements.txt
python boiler_api.py

If everything goes well, you should see the following output:

 * Serving Flask app 'boiler_api'
 * Debug mode: off
WARNING: This is a development server. Do not use it in a production deployment. Use a production WSGI server instead.
 * Running on all addresses (0.0.0.0)
 * Running on http://127.0.0.1:8000
 * Running on http://192.168.0.101:8000
Press CTRL+C to quit

You can now test the API by sending requests to its endpoints using Postman or any other tool you prefer.

Let’s verify the API by asking about the current temperature:

GET http://localhost:8000/thermostat/actual

HTTP/1.1 200 OK
Server: Werkzeug/3.1.3 Python/3.9.6
Date: Sun, 26 Jan 2025 00:05:31 GMT
Content-Type: application/json
Content-Length: 28
Connection: close

{
  "actual_temperature": 21.5
}

Great! The API is up and running. 👏

Creating a Python Client for the API

Before we get to the fun part, let’s create a Python client to interact with the API. Why, you ask? By passing these Python functions to our LLM agent as tools, it can understand their functionality and purpose. The agent will then determine whether the tasks we assign require interaction with the thermostat API.

This Python library will serve as the glue between the agent and the thermostat API.

The good news is that I’ve already created the Python client for you; you can find it in the GitHub repository. The script is called boiler_client.py.
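
To give you an idea of its shape, here is a minimal sketch of the client. The repository version has more methods and error handling, and the default base_url is my assumption based on the Flask server above:

import requests

class BoilerClient:
    """Thin wrapper around the smart thermostat REST API."""

    def __init__(self, base_url="http://localhost:8000"):
        self.base_url = base_url

    def get_actual_temperature(self):
        """Retrieve the current actual temperature from the boiler.

        Returns:
            float: The current temperature in Celsius.

        Raises:
            requests.exceptions.RequestException: If the API request fails.
        """
        response = requests.get(f"{self.base_url}/thermostat/actual")
        response.raise_for_status()
        return response.json()["actual_temperature"]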

You can test it by running the following command:

python boiler_client.py

Current temperature: 21.5°C
Desired temperature: 22.0°C
Boiler state: not heating
Error state: {'code': 0, 'message': 'No error'}
Set new desired temperature to 23.0°C

☝️ If you run the script again, you should see that the actual temperature has increased a notch. Remember? This is because the adjust_temperature function is running in the background, gradually adjusting the actual temperature to the desired one.

Giving Clues to the AI

If you have a keen eye, you may have noticed that the methods of the BoilerClient class are pretty extensively documented. From version 0.4 onward, the Ollama Python library uses these docstrings to inform the agent about the purpose of each method. This helps the agent understand the functionality and appropriate usage of each method.

def get_desired_temperature(self):
    """Retrieve the desired temperature setting from the boiler.

    Returns:
        float: The desired temperature in Celsius.

    Raises:
        requests.exceptions.RequestException: If the API request fails.
    """
    response = requests.get(f"{self.base_url}/thermostat/desired")
    response.raise_for_status()
    return response.json()["desired_temperature"]

When documenting your code, strive to be as precise and descriptive as possible. Here, I’ve even specified that the temperature is in Celsius.

What times we live in! It’s as if I’ve completely lost my mind — while some people are concerned about AI taking over their jobs, here I am, assisting it with documentation 🤷.

Building the AI Agent

Now the fun part begins! Let’s assemble our AI agent using Ollama and its Python library to control the imaginary smart thermostat via a small CLI tool. We’ll use the Mistral NeMo model.

Not all LLMs support tools. When selecting the perfect model for your project, look for those marked with the “tools” tag in the Model Library.
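
You can also inspect a model’s details from the command line with ollama show - I believe recent Ollama versions list the model’s capabilities there, including tool support:

ollama show mistral-nemo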

Why Mistral-NeMo? It’s an ideal choice because:

  • 🔧 Tool Compatibility: Has the ability to use tools.

  • 📄 License: Distributed under a permissive Apache 2.0 license.

  • 💪 Powerful and Compact: Offers very good performance with a relatively small model size.

The System Prompt

Let’s get started by crafting our system prompt:

SYSTEM_PROMPT = """
You are an AI assistant for a smart boiler system.
Your task is to interact with the boiler using the Boiler API.
Do your best to help the user with their boiler-related queries.
If the user feels cold, they might want to increase the desired temperature. If you don't know the temperature, you can ask the boiler for it.
If the user feels hot, they might want to decrease the desired temperature.
Do not make up information. If you don't know something, ask the boiler for the information. If there is no function for that, tell the user you can't help with that.
Restrict your responses to the boiler system only, deny any general information requests.
Proceed with the command:
"""

This system prompt defines the AI agent’s role as a smart boiler assistant while setting boundaries to ensure it only provides responses related to our topic. It uses our API - via the Python client - for interacting with the boiler.

Next, define a helper function to get the available functions from the boiler client. While this might seem overkill, it really helps manage the code as the number of functions increases.

def get_available_functions(client: BoilerClient) -> dict:
    available_functions = {
        "get_actual_temperature": client.get_actual_temperature,
        "get_desired_temperature": client.get_desired_temperature,
        "set_desired_temperature": client.set_desired_temperature,
        "get_boiler_state": client.get_boiler_state,
        "get_error_state": client.get_error_state,
    }

    return available_functions

Since that’s where the magic happens, let’s focus on the important parts of the get_model_response function:

def get_model_response(prompt, client: BoilerClient, model="mistral-nemo") -> str:
    # ...

    messages = [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": prompt},
    ]
    first_response = ollama.chat(
        model=model,
        messages=messages,
        tools=[
            client.get_actual_temperature,
            client.get_boiler_state,
            client.get_desired_temperature,
            client.set_desired_temperature,
            client.get_error_state,
        ],
    )

Most of the parameters might look familiar, but for the sake of clarity, let’s run through them:

  • model: The Mistral-NeMo model we’re using - declaring it as a parameter allows us to use different models later.
  • messages: Here, we provide the system prompt (see above) and the user’s prompt (which we will cover later).
  • tools: This is the interesting part. We pass the available functions from the boiler client to this parameter. We inform Ollama that the agent can use these functions to interact with our boiler API.

☝️ Feel free to refactor the tools parameter declaration to use the dict from the get_available_functions helper function. I kept it here as it is to help illustrate the concept better.
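
If you do refactor, the call could look like this sketch, reusing the helper from above:

# Equivalent call, passing the helper's dict values as tools
first_response = ollama.chat(
    model=model,
    messages=messages,
    tools=list(get_available_functions(client).values()),
)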

Then we can verify whether the agent actually called the functions from the boiler client.

if first_response.message.tool_calls:
    output = ""
    for tool in first_response.message.tool_calls or []:
        function_to_call = get_available_functions(client).get(tool.function.name)
        if function_to_call == client.get_actual_temperature:
            output += f"\nActual temperature: {function_to_call()}°C"
        elif function_to_call == client.get_desired_temperature:
            output += f"\nDesired temperature: {function_to_call()}°C"
        elif function_to_call == client.get_boiler_state:
            output += f"\nBoiler state: {function_to_call()}"
        elif function_to_call == client.get_error_state:
            error_state = function_to_call()
            output += f"\nError code: {error_state['code']}\nError message: {error_state['message']}"
        elif function_to_call == client.set_desired_temperature:
            output += f"\n{function_to_call(**tool.function.arguments)}"

    messages.append(first_response.message)
    # Note: `tool` still points to the last tool call here, so the
    # combined output is attributed to that call's name
    messages.append({"role": "tool", "content": output, "tool": tool.function.name})
    final_response = ollama.chat(model=model, messages=messages)
    response_content = final_response.message.content
else:
    response_content = first_response.message.content

return response_content

If there are no tool calls in the first response, we return the content of that response just as we would without tools.

Otherwise, we iterate over the tool calls and invoke the corresponding functions from the boiler client. As you can see, our Python client handles the function calls itself, while the model specifies the function name (and arguments, if any).

How we handle the output of these functions is up to us. In some cases, we might process or manipulate the output, while in others we might simply pass it back to the model as is. In our case, we append the output to the messages list and pass it back to the model. It’s important to use the "tool" role in the messages list to indicate that the output is from a tool.
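
To make this concrete, here is roughly what the messages list might contain just before the second ollama.chat call. The values are illustrative, based on the sample session later in this post:

messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {"role": "user", "content": "what is the temperature in the room?"},
    first_response.message,  # the assistant message carrying the tool_calls
    {"role": "tool", "content": "\nActual temperature: 21.5°C", "tool": "get_actual_temperature"},
]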

Behind the Scenes

As I mentioned earlier, I’ve documented the boiler client methods pretty well for a reason. Normally, we would craft a JSON schema for the tool definitions, but Ollama can infer the definitions from the docstrings and the code itself, using Pydantic under the hood.

For example, take a look at the get_actual_temperature method’s schema:

{
  "type": "function",
  "function": {
    "name": "get_actual_temperature",
    "description": "Retrieve the current actual temperature from the boiler",
    "parameters": {
      "type": "object",
      "properties": {}
    }
  }
}

☝️ If you insist on crafting the JSON schema yourself, you can assign the structure above to a variable and pass it to the tools parameter.
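
For example, a hand-written definition could be passed like this (a sketch; the dict simply mirrors the schema shown above):

# Hand-crafted tool definition matching the schema above
get_actual_temperature_tool = {
    "type": "function",
    "function": {
        "name": "get_actual_temperature",
        "description": "Retrieve the current actual temperature from the boiler",
        "parameters": {"type": "object", "properties": {}},
    },
}

first_response = ollama.chat(
    model="mistral-nemo",
    messages=messages,
    tools=[get_actual_temperature_tool],
)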

Interacting with the User

Let’s keep it simple. We will create a loop and ask the user for input until they type bye to exit. After each user input, we call the get_model_response function we just discussed and print the response.
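
A minimal version of this loop could look like the sketch below; the prompt strings are my assumptions based on the sample session later in this post:

def main():
    client = BoilerClient()
    print("Boiler Chat Agent (using mistral-nemo)\n")
    print("Type 'bye' to exit\n")
    while True:
        user_input = input("You: ").strip()
        if user_input.lower() == "bye":
            break
        # Let the agent (and its tools) handle the request
        print(f"Agent: {get_model_response(user_input, client)}")

if __name__ == "__main__":
    main()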

🏠 Homework: Create a Streamlit app to take this experiment to the next level.

See Our Agent in Action

It’s showtime! Let’s run the whole thing and see how our AI agent interacts with the boiler API. Since we already have VS Code at hand, I suggest opening a new terminal and running the API server in the background:

cd boiler_api
python boiler_api.py

Then, in another terminal, run the agent:

python ai_agent.py

Almost forgot! Make sure to pull the Mistral-NeMo model before running the agent:

ollama pull mistral-nemo:latest

Also, you might need to run ollama serve to start the Ollama server if it isn’t running already.

Now you can interact with the agent by typing free-form questions and commands. The agent will respond accordingly, interacting with the boiler API as needed. Here’s a sample interaction:

Boiler Chat Agent (using mistral-nemo)

Type 'bye' to exit

You: what is the temperature in the room?
Agent: The actual temperature in the room is 22.0°C.
You: set it to 24 celsius
Agent: The desired temperature has been set to 24°C. Is there anything
else I can assist you with?
You: is my boiler works properly?
Agent: Your boiler appears to be functioning properly and is currently
in the 'heating' mode. If you're experiencing any issues or would like
further information, please let me know! 🌡️🔥
You: now what is the temperature here?
Agent: The current temperature here is 22.4°C.

If you peek under the hood of the API server, you should see how the agent interacts with it:

 * Serving Flask app 'boiler_api'
 * Debug mode: off
WARNING: This is a development server. Do not use it in a production deployment. Use a production WSGI server instead.
 * Running on all addresses (0.0.0.0)
 * Running on http://127.0.0.1:8000
 * Running on http://192.168.0.101:8000
Press CTRL+C to quit
127.0.0.1 - - [29/Jan/2025 18:16:29] "GET /thermostat/actual HTTP/1.1" 200 -
127.0.0.1 - - [29/Jan/2025 18:16:44] "POST /thermostat/desired HTTP/1.1" 200 -
127.0.0.1 - - [29/Jan/2025 18:16:58] "GET /boiler/state HTTP/1.1" 200 -
127.0.0.1 - - [29/Jan/2025 18:17:24] "GET /thermostat/actual HTTP/1.1" 200 -

Here is the evidence that our AI agent is not merely mimicking the interaction, but is genuinely engaging with the boiler API. 👍

Conclusion

Fantastic work - thank you for staying with me until the end! In this tutorial:

  • 🤖 We’ve built an AI agent that interacts with a mock smart thermostat API.
  • ⚙️ We’ve learned how to instruct the agent to use our code to control the heating.
  • 🔧 We’ve seen how to handle tool calls and responses in the agent code.
  • 🚀 Finally, we’ve seen how the whole system works together in a real-world…ish scenario.

Needless to say, chatting with a robot is not the most exciting or effective way to control your heating system. However, you can apply the same principles to build agents that interact with other APIs, services, or even IoT devices.

You don’t even need to think big - real-world applications of this technology are approaching quickly. Embrace it! Here are some ideas to spark your inspiration:

  1. 📰 Create an AI agent to fetch the latest articles from your favorite news site and compile your own daily digest or newsletter.

  2. 📝 Develop an AI agent to analyze your git logs and generate work log summaries for your Jira tickets. Why stop there? The agent should add the work logs to Jira as well!

  3. 👨‍🍳 Create an AI agent as a Telegram bot that provides personalized weekly meal plans with recipes and shopping lists based on your dietary preferences (just put it into the system prompt).

I hope I sparked your imagination! Feel free to share your ideas or ask questions in the comments section below. Until next time, happy crafting! 🚀
