
Run Your Own AI Assistants Locally - Part 2.

After getting our feet wet with a simple AI Assistant in the previous post, let's Continue (pun intended) with a more sophisticated solution: setting up VS Code with the Continue extension. We will explore the main features through very simple examples, and I hope you'll come away with a sense of how these tools can enhance your developer workflow.

📖 Introduction

In the previous post, we saw how to set up a simple AI Assistant that runs locally. We used Ollama as the local provider for LLMs, such as codegemma. Then, in the lightweight code editor Zed, we created a Streamlit app from scratch. Finally, we learned how to provide context to the assistant using slash commands.

Today we will use Visual Studio Code as our IDE, extending it with Continue, the “leading open-source AI code assistant”.

💾 Preparing the Models

If you followed the previous post, you should already have the codegemma model (for code generation) and the llama3.1 model (for chat). Now we need a third one - for the embeddings. So we should pull that one now. But wait…

What are embeddings? This topic deserves its own post or even a series. In short, embeddings allow us to represent words or sentences in a vector space, making it possible to capture relationships between semantically similar pieces of text. Continue will turn to the embeddings model to create vectors when you ask something related to the whole codebase, such as “create unit tests for my GreetingService” or “where should I put my code for a new endpoint?”. Those vectors are indexed in a vector database, and the assistant uses them to provide more accurate suggestions quite swiftly.

Okay, let’s pull an embeddings model. We will use the nomic-embed-text model, which is, in fact, the model recommended by the Continue team.

ollama pull nomic-embed-text

This model is relatively small, so it should be quick to download.
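If you are curious what an embedding actually looks like, here is a minimal sketch (assuming you have Python and the requests package installed) that asks Ollama's /api/embeddings endpoint for vectors and compares three sentences with cosine similarity - the sentences and the helper functions are just for illustration:

import requests  # pip install requests

def embed(text: str) -> list[float]:
    """Ask the local Ollama server to turn a piece of text into a vector."""
    response = requests.post(
        "http://localhost:11434/api/embeddings",
        json={"model": "nomic-embed-text", "prompt": text},
    )
    response.raise_for_status()
    return response.json()["embedding"]

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Score how similar two vectors are (1.0 = identical direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(x * x for x in b) ** 0.5
    return dot / (norm_a * norm_b)

greeting = embed("a service that greets the user by name")
tests = embed("unit tests for the greeting service")
weather = embed("tomorrow will be rainy in Budapest")

# Semantically related sentences score noticeably higher than unrelated ones.
print(f"greeting vs tests:   {cosine_similarity(greeting, tests):.3f}")
print(f"greeting vs weather: {cosine_similarity(greeting, weather):.3f}")

This is essentially what happens behind the scenes when Continue indexes your codebase: every chunk of code becomes such a vector, and your question is matched against the closest ones.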

🔧 Setting up VS Code & Continue

Currently, Continue supports Visual Studio Code and JetBrains IDEs. For this tutorial, we’ll focus on VS Code because it is lightweight, popular, and free to use. I assume you already have it installed and know how to use it. So let’s install the Continue extension:

  1. Open the Extensions view and search for Continue.
  2. Make sure you select the extension published by Continue.dev.
  3. Click on the Install button.

Install the Continue extension

Configure Ollama as the Provider

I’ll show you two ways to get up and running with Ollama in Continue:

1. Configure within the extension: First, open the Continue panel from the toolbar, or from the Command Palette (CMD/CTRL+SHIFT+P). In the panel, click on the dropdown menu in the input box to add or configure models at any time. Since this is your first time using the extension, you should only see the Add Model option with a plus sign.

  1. Choose Add Model.
  2. Select Start with a provider.
  3. Scroll down to Ollama and click on it.
  4. Click on Autodetect - this will populate all the models we have pulled previously.
  5. Before prompting, make sure the chat model is set to llama3.1:8b instead of the embedding model - the latter is not capable of generating text.

Configure Ollama as the provider

…And that’s it! You have a working setup with Ollama and Continue in Visual Studio Code. 🎉

2. Edit the config.json file directly: Open the Command Palette and search for the Continue: Open config.json command. Don’t worry if you feel overwhelmed; the important pieces will be discussed in detail later. For now, here is the config.json file with the same settings we set up in the first method:

{
  "models": [
    {
      "model": "AUTODETECT",
      "title": "Ollama",
      "apiBase": "http://localhost:11434",
      "provider": "ollama"
    }
  ],
  "tabAutocompleteModel": {
    "title": "Starcoder2 3b",
    "provider": "ollama",
    "model": "starcoder2:3b"
  },
  "embeddingsProvider": {
    "provider": "ollama",
    "model": "nomic-embed-text",
    "apiBase": "http://localhost:11434"
  },
  "customCommands": [
    {
      "name": "test",
      "prompt": "{{{ input }}}\n\nWrite a comprehensive set of unit tests for the selected code. It should setup, run tests that check for correctness including important edge cases, and teardown. Ensure that the tests are complete and sophisticated. Give the tests just as chat output, don't edit any file.",
      "description": "Write unit tests for highlighted code"
    }
  ],
  "contextProviders": [
    {
      "name": "code",
      "params": {}
    },
    {
      "name": "docs",
      "params": {}
    },
    {
      "name": "diff",
      "params": {}
    },
    {
      "name": "terminal",
      "params": {}
    },
    {
      "name": "problems",
      "params": {}
    },
    {
      "name": "folder",
      "params": {}
    },
    {
      "name": "codebase",
      "params": {}
    }
  ],
  "slashCommands": [
    {
      "name": "edit",
      "description": "Edit selected code"
    },
    {
      "name": "comment",
      "description": "Write comments for the selected code"
    },
    {
      "name": "share",
      "description": "Export the current chat session to markdown"
    },
    {
      "name": "cmd",
      "description": "Generate a shell command"
    },
    {
      "name": "commit",
      "description": "Generate a git commit message"
    }
  ]
}

Apply the changes and save the file, and you are all set!

✨ Working with Continue

Before we begin, could you please move the Continue panel from the left toolbar to the right side? 🙏 You’ll thank me later.

Move the Continue panel

This way, you can work with the AI Assistant even when the Explorer panel is open. Speaking of which, let’s open the ‘project’ we created in the previous post. Hit CMD/CTRL+O and navigate to the Streamlit app folder.

Commands, Context and Autocomplete

We’ve seen how to use slash commands in Zed to provide context and influence the assistant’s behavior. Needless to say, Continue has its own set of commands and other features to provide the same functionality (and even more).

For starters, let’s ask the assistant to explain the code in the app.py file:

👤 What does this script do? @app.py

Note the @app.py at the end of the message: in Continue, we use the @ symbol to provide context to the model. In this case, we want the assistant to focus on the code in the app.py file.

Here is the response I got:


This script creates a simple web application using the Streamlit framework.

When run, it displays a title “My Simple Streamlit App” at the top of the page.

Below the title, there is a text input field labeled “Enter your name:”. The user can type their name in this field.

Next to the text input field, there’s a button labeled “Submit”. When clicked, this button triggers an action.

If the button is clicked and the text input field contains some text (i.e., the user has entered their name), it displays a greeting message at the bottom of the page that says “Hello, [username]!”.

In summary, this script creates a simple web application that asks for a username through a text input field and displays a personalized greeting when the button is clicked.


I could not have explained it better myself! 👏

Harnessing the Power of Embeddings

Let’s say I want to refactor my Streamlit application by extracting the greeting widgets into a separate tab. This way, I can add a second page (tab). It’s a long shot for a Java developer, but let’s see how the assistant can help me achieve this.

I will use the Streamlit API reference as guidance. Actually, I will ask the assistant to guide me through the process. But first, I am going to help the assistant a little bit.

Continue comes with popular references and manuals pre-indexed. Unfortunately, Streamlit isn’t on the list. I’ll show you how to let Continue index any reference from the web.

  1. First, trigger the @docs context provider. As you scan through the list of pre-indexed documentations, you can see that Streamlit is missing.

  2. Go to the bottom of the list and you should find the + Add Docs option. Click on it.

  3. In the dialog, add a title to the documentation, like Streamlit.

  4. Paste the URL of the API reference: https://docs.streamlit.io/develop/api-reference.

  5. Click Submit.

  6. You will be notified when the reference has been indexed.

Add Streamlit API reference

Now we can work on the changes in the code. Give the assistant some instructions, pointing it to the Streamlit API reference (using @Streamlit):

👤 @Streamlit @app.py Please add two tabs to this Streamlit application.

  • The first tab should hold the current content

  • The second tab should be called “About Me” and contain some placeholder text so I can add content later

Before we start to examine the response, I need to emphasize two things.

Please pay attention to the Context Used section below the input box. It shows the context the assistant used to generate the response:

Context used by the assistant - the models I’ve chosen are highly tolerant of my bad English

These entries are clickable and will take you to the exact location of the subject in the Streamlit API reference (e.g. Complex layouts).

Here comes my first piece of advice: always verify the information the assistant provides. As you may know, AI models can sometimes hallucinate and provide incorrect information.

Secondly, as the complexity of your tasks increases, this rule becomes ever more important. I use very straightforward examples here on purpose: to demonstrate a feature or concept without getting lost in the details. In our daily work, we will likely have to solve much more complex tasks which, if we want to tackle them with the help of an AI Assistant, will require multiple iterations of prompting.

☝️ Always balance the time spent interacting with the assistant against the time you would spend solving the problem yourself. Remember, the AI is just a tool, but you are the expert.

Alright then, let’s see what we got:


To add two tabs to your Streamlit application, you can use the st.tabs function. Here’s an updated version of your code:

import streamlit as st

# Title of the app
st.title('My Simple Streamlit App')

# Create tabs
tab1, tab2 = st.tabs(["Home", "About Me"])

# First tab holds current contents
with tab1:
    # Text input field
    name = st.text_input('Enter your name:')

    # Button to display a message
    if st.button('Submit'):
        st.write(f'Hello, {name}!')

# Second tab contains placeholder text
with tab2:
    st.markdown('# About Me')
    st.markdown('## Placeholder text for later...')

In this code, we create two tabs using st.tabs(["Home", "About Me"]). We then use the with statement to specify which content should go into each tab. The first tab (tab1) holds the original contents, and the second tab (tab2) contains placeholder text that you can replace later with your own content.


Applying the suggestions, you should see the new layout in the Streamlit app:

Streamlit app updated with tabs

Slash Commands

In the previous post, we’ve seen how slash commands provide a powerful way to interact with the assistant. While Continue includes several built-in commands, the fun part is that you can create your own commands.

Let’s start with a built-in command and ask the assistant to replace the placeholder text with something more meaningful.

  1. Highlight the code block in the second tab:
with tab2:
    st.markdown('# About Me')
    st.markdown('## Placeholder text for later...')

Now press CMD/CTRL+L to include the code snippet in the input box.

  2. Use the /edit command to ask the assistant to replace the placeholder text:

👤 /edit Replace the placeholder text with information about me. I am Tamas Csizmadia, a Java Developer from Hungary. Include a link to my GitHub profile: https://github.com/tcsizmadia

In the code editor, the assistant will show you the diff of the changes it suggests. If you agree with the changes, you can apply them by clicking the Accept All ✅ button (or pressing CMD+SHIFT+ENTER).


with tab2:
    st.markdown('# About Me')
    st.markdown('## Hi, I am Tamas Csizmadia, a Java Developer from Hungary.')
    st.markdown('I love building software solutions that make people\'s lives easier.')
    st.markdown('You can find more about me on my GitHub profile: https://github.com/tcsizmadia')

A bit vague, but eventually I got what I asked for.

Create a Custom Command: Cleanup

Another neat feature of Continue is the ability to define custom commands. This lets you extend the assistant’s capabilities and tailor it to your workflow. You have two options:

  1. Define a command using natural language in config.json
  2. Craft more advanced logic using TypeScript in config.ts

The latter is more powerful, as you can access aspects of the Continue SDK. But for the sake of simplicity, let’s stick with the natural language approach, as it is easier to understand, yet powerful enough for most use cases.

Open the config.json file (CMD+SHIFT+P and search for Continue: Open config.json) and add the following entry to the customCommands array:

{
  "name": "cleanup",
  "description": "Removes any TODO, commented out code or debug log messages from the provided code.",
  "prompt": "{{{ input }}}\nRead carefully the provided code and remove the following unneeded elements: TODOs, commented out code, printed out messages for debug purposes, unneeded imports. Do not fix or implement anything, your task is only to remove unwanted elements from the code."
}

After saving the file, you can use the /cleanup command to ask the assistant to clean up the code. Let’s see it in action! 👏

Normally I would never ask a developer to do this, but please put some garbage into the code in app.py. Here is an example:

import streamlit as st
import json, csv

# Title of the app
# TODO: Add title
st.title('My Simple Streamlit App')

# Create tabs
tab1, tab2 = st.tabs(["Home", "About Me"])

# TODO: Implement Register page
# tab1, tab2, tab3 = st.tabs(["Home", "About Me", "Register"])

# with tab3:
#     st.text('# This is the register page.')

# First tab holds current contents
with tab1:
    print('Tab1 create')
    # Text input field
    name = st.text_input('Enter your name:')

    # Button to display a message
    if st.button('Submit'):
        st.write(f'Hello, {name}!')
        print('Button clicked')

# Second tab contains placeholder text
with tab2:
    print('Tab2 create')
    st.markdown('# About Me')
    st.markdown('## Hi, I am Tamas Csizmadia, a Java Developer from Hungary.')
    st.markdown('I love building software solutions that make people\'s lives easier.')
    st.markdown('You can find more about me on my GitHub profile: https://github.com/tcsizmadia')
    #st.markdown('TBD')

I’ve added two unneeded import statements (json and csv), some commented-out code, a few TODO comments, and some print() statements. Now, let’s ask the assistant to clean up the code:

👤 /cleanup @app.py


Here’s the updated code:

import streamlit as st

# Title of the app
st.title('My Simple Streamlit App')

# Create tabs
tab1, tab2 = st.tabs(["Home", "About Me"])

with tab1:
    # Text input field
    name = st.text_input('Enter your name:')

    # Button to display a message
    if st.button('Submit'):
        st.write(f'Hello, {name}!')

with tab2:
    st.markdown('# About Me')
    st.markdown('## Hi, I am Tamas Csizmadia, a Java Developer from Hungary.')
    st.markdown('I love building software solutions that make people\'s lives easier.')
    st.markdown('You can find more about me on my GitHub profile: https://github.com/tcsizmadia')

If you are satisfied with the result (as I am), you can apply the changes to the editor by clicking the Apply to current file button - just make sure that app.py is the active tab in VS Code.

Autocomplete

One of the greatest features of GitHub Copilot is the so-called ‘Ghost Text’. As you type, the Assistant predicts your intentions and completes the code for you. While the results aren’t always perfect, when it works, it’s incredibly satisfying and helps you get into the flow. ✨

This feature is currently in beta in Continue. To try it out, specify a compatible model in the tabAutocompleteModel entry of config.json. Let’s try it with the recommended model, starcoder2:3b:

  "tabAutocompleteModel": {
    "title": "Starcoder",
    "provider": "ollama",
    "model": "starcoder2:3b",
    "apiBase": "http://localhost:11434"
  },

An incomplete, very subjective, and somewhat biased summary of the models I’ve tried so far with Continue:

| Model | Description | Preference |
| --- | --- | --- |
| starcoder2:3b | Recommended by the Continue team. Well balanced between performance and responsiveness. | ⭐ ⭐ |
| deepseek-coder-v2:6b | Very powerful; performance and accuracy on par with ChatGPT-4o (in coding tasks). In turn, it requires more resources, and the first response can be slow. | ⭐ ⭐ ⭐ |
| deepseek-coder:1.3b | The ’light’ variant of the above model. Use it as a last resort if you’re on a modest machine, since it hallucinates a lot. | |
| codegemma:2b | My go-to model for code generation tasks. Unfortunately, it is not supported by Continue yet. 😭🤷 | |

So if you ask me…

  • I would go with deepseek-coder-v2:6b first, as it performs very well in coding tasks.
  • If you experience lagging, timeouts or autocomplete puts a heavy load on your machine, you can switch to starcoder2:3b.
  • If you’re still not satisfied with the responsiveness, you may try deepseek-coder:1.3b (and always double-check the suggested code 👮‍♂️).

Hopefully, my favorite model, codegemma, will be supported soon. 🤞

Peek Under the Hood

Before wrapping up, I’d like to invite you to explore how the assistant interacts with local models. If you started Ollama from the GUI, please stop it now. Then, open a terminal and start Ollama with the following command:

ollama serve

After a moment, Ollama will be up and running. Split the screen and put the terminal on one side and VS Code on the other - or place the two windows side by side. Now, start editing the app.py file and implement a ‘Quote of the day’ feature inside the ‘About Me’ tab. Please do not chat with the assistant; instead, leverage the ‘Ghost Text’ feature to complete the code:

with tab2:
    # ...
    # an array with 3 random quotes
    # (note: random.choice below requires "import random" at the top of app.py)
    quotes = [
        "The only way to do great work is to love what you do. - Steve Jobs",
        "The best way to predict the future is to invent it. - Alan Kay",
        "The only thing we have to fear is fear itself. - Franklin D. Roosevelt"
    ]

    # pick a random quote
    quote = random.choice(quotes)
    st.markdown(f'## Quote of the day\n\n"{quote}"')

Autocomplete - under the hood

As you type, keep an eye on the terminal window. You can observe the exchange of requests and responses between Continue and Ollama.

Continue constantly calls Ollama’s /api/generate endpoint to ask for completions. Keep your eyes peeled and you’ll notice an extra POST request to the /api/embeddings endpoint whenever you save your files.
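If you want to poke at that endpoint yourself, here is a minimal sketch of such a completion request (the endpoint and fields are Ollama's standard generate API; the prompt is just an illustration, and I'm assuming the requests package again):

import requests  # pip install requests

# A raw completion request, similar in spirit to what Continue sends.
response = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "starcoder2:3b",
        "prompt": "# Python function that returns a random quote\ndef ",
        "stream": False,  # ask for one JSON object instead of a token stream
    },
)
response.raise_for_status()

# The generated completion is in the "response" field.
print(response.json()["response"])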

💡 Continue knows your actual codebase by maintaining a real-time index using embeddings. This enables it to provide more accurate suggestions and context-aware responses tailored specifically to the project you are working on.

💼 Conclusion

We’ve learned how to set up and use a more advanced solution for running AI Assistants locally. We’ve seen how easy it is to set up and configure the VS Code editor with great, powerful tools like the Continue extension and Ollama. We’ve explored the main features through simple examples, and I hope you now have a sense of how these tools can enhance your developer workflow.

Finally, here comes the million-dollar question: is it worth it? Or should you stick to commercial solutions like GitHub Copilot? 🤔

Although Continue is still in its early stages compared to GitHub Copilot, it shows a lot of potential and can significantly help developers in their daily tasks. Let’s examine some pros and cons:

🔒 As your intellectual property (the code) never leaves your machine, you can be sure that you are safe from IP leakage. This is a huge advantage, allowing you to start using AI Assistants in your projects even if you have had privacy concerns before.

⛔ Furthermore: since it operates on your machine, this solution is likely the only one that can be used in air-gapped environments. This opens the door for using AI Assistants in highly secure environments (think about industrial, financial or military settings).

✂️ The customization options are excellent. You can define your own commands and switch between models easily. Additionally, you can fine-tune a model to meet your specific needs. Since privacy concerns are addressed, you can include your Jira tickets, MRs, or internal documents into the context to improve the assistant’s suggestions.

📝 I didn’t mention this earlier, but Continue collects data - locally ☝️ - about how you write code and work in general. This enables the assistant to adapt to your style and provide more accurate suggestions. The promise is that the more you work with the assistant, the better it will get.

💰 …and of course, using a local AI assistant could save you the subscription fee of a commercial solution.

On the other hand, there are some aspects that could be against going with a local solution:

⌛ The performance may not be as good as cloud-based solutions. Since the models are running on your machine, many factors can affect the responsiveness of the assistant. The quality of the suggestions can vary depending on the model you choose, the resources available on your machine, and the complexity of the task.

🐢 Don’t forget that hosting an LLM model locally requires a lot of resources. These resources will be taken from other tasks running on your machine. Also, it’s impossible to run the most powerful models on a personal computer, so you might not get the same quality of suggestions as you would with a cloud-based solution.

🐛 Neither local models nor the Continue extension is a turnkey solution. You get what you pay for: with commercial offerings, you can expect advanced features polished by very talented engineers at GitHub, OpenAI, or other big companies, while with a local solution, you need to be prepared to walk the extra mile yourself to get a commercial-grade experience. Although Continue is being developed rapidly, most features should be considered beta. This early stage also means the business model is still unclear: what features will be free, which will be paid, and how much we will need to pay.

As you might have noticed, I’m very excited about Continue and the possibility of running AI Assistants locally, even on a laptop. While I’ll keep my GitHub Copilot subscription for now, I’m looking forward to making good use of Continue in my everyday work.

What do you think? Would you consider using a local AI Assistant in your daily work? How could it help you with your projects? Let me know in the comments below!

Until next time, happy coding! 👋
