Scrolling through your social media feed nowadays, you’re bound to come across loads of posts about Generative AI, Prompt Engineering, and the like. It can get a bit overwhelming, but there’s no denying that this field is genuinely exciting. Thanks to things like ChatGPT, Generative AI isn’t just for techies and scientists anymore – it’s for everyone.
As a software engineer, I also find it amusing to read clickbait titles like “Generative AI: the end of programming” or “AI is going to replace programmers”. If you ask me, that is not going to happen - but I do believe that AI will change the way we work.
Whenever a new invention is introduced (let’s say the microwave oven), there is a buzz about how it will make traditional jobs obsolete. In the early days, after microwave ovens became mainstream, some believed that because they were so versatile and so easy to use, we wouldn’t need chefs anymore: everyone could be a chef with a microwave oven. That certainly wasn’t true. But the microwave oven, like every other household appliance, did make our lives easier: using them lets us spend less time on the boring stuff and more time on the things we love to do.
If you ask me, Generative AI, chatbots, and copilots are the same story. AI helps us tackle boring and repetitive tasks, allowing us to concentrate on the big picture. While there’s concern about AI potentially replacing jobs, especially those lacking added value, adapting to this change is crucial. Simply copying code from StackOverflow may no longer suffice, but at the end of the day, that’s a good thing. The goal is to ensure that everyone has a meaningful job, and perhaps AI can be our partner in making that a reality.
So, let’s dive into the world of Generative AI, and do some prompt engineering. Shall we?
What is Prompt Engineering?
Prompt engineering is a popular topic, but what does it mean exactly? Just as you need to articulate your search phrases carefully to get usable results from Google, you need to craft your prompt well to get the desired results from an AI model. The art and science of crafting the perfect prompt is what we call prompt engineering.
Consider the following:
Human: How to make a bed?
Chatbot: You need a bed, a mattress, and a blanket. First, put the mattress on the bed, then put the blanket on the mattress. Done.
Now, that’s not very helpful… But let’s admit: we weren’t very helpful either. Let’s try again and provide some context and guidance:
Human: You are a helpful and respectful hotel manager and I am a trainee. I want to learn how to make a bed as part of my job. Please tell me the exact steps I need to do to make a bed properly.
Chatbot: Absolutely, I’d be happy to guide you on making a bed properly. Here are the steps:
1. Prepare the Mattress: Ensure the mattress is clean and in good condition.
2. Fitted Sheet: Start with a fitted sheet. Pull it tightly over the mattress, ensuring a snug fit around the corners.
3. Flat Sheet (Optional): If you use a flat sheet, spread it evenly over the fitted sheet. Align the top edge with the top of the mattress, leaving some overhang to fold over later.
4. Blanket or Duvet: Place your blanket or duvet on top of the flat sheet. Make sure it’s centered and evenly distributed.
5. Fold and Tuck: Fold the top edge of the flat sheet and blanket or duvet down toward the foot of the bed. Tuck them under the mattress for a neat appearance.
6. Pillows: Arrange your sleeping pillows against the headboard, standing upright.
7. Pillowcases: If using pillowcases, slip them onto the sleeping pillows.
8. Decorative Touches (Optional): Add any decorative shams or cushions in front of the sleeping pillows.
9. Smooth and Adjust: Smooth out wrinkles and adjust the bedding for a tidy and inviting look.
Remember, attention to detail is key. Practice makes perfect, so feel free to ask if you have any questions or if there’s a specific aspect you’d like more guidance on. Happy learning!
Now, that’s a lot better. But as you will see, prompt engineering is more than context and instructions. Let’s move on and get our hands dirty.
Llama2 & Ollama
Before we start crafting our prompts, we need a decent LLM model at hand. LLM stands for Large Language Model - a type of neural network that generates text by predicting the next words in a sequence, based on the given input. The most popular LLMs are OpenAI’s GPT-3 (and GPT-4), Google’s Bard, Anthropic’s Claude, and last but not least: Meta’s Llama2.
In this article, we will use Llama2, which is a free and open-source LLM model. It’s relatively easy to get it up and running on your own laptop, which is a huge plus for me. I prefer learning by doing, and I encourage myself not to be afraid of making mistakes. Having a local LLM model is a great way to experiment and learn without the burden of generating a huge bill, or risking being banned from a service by accident.
Running your LLM models locally is a bit like using Minikube for Kubernetes: instead of deploying a cluster on AWS or Azure, you run a scaled-down version of the cluster locally. Yes, the time will come when you need to go beyond prototyping and experimenting, but until then, you can have a lot of fun with your cozy, little local LLM model.
Prepare your environment
First things first: we will use Python 3 for this experiment. I am not going to go into the details of setting up your Python environment; there are a lot of great tutorials out there. Moreover, there is a good chance that your OS already has Python 3 installed.
Come back when you can run the following command successfully:
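```bash
python3 --version
```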
It should return something like this:
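```
Python 3.11.6
```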
Great. Now, we need to have Llama2 locally available and ready to use. The easiest way to do that is to use Ollama. Unfortunately, at the time of writing Ollama is only available for Linux and macOS. This neat little tool will do the heavy lifting for us, like providing GPU acceleration (even on macOS), downloading, configuring, and serving LLM models. Also, it is supported by LangChain, which will be our weapon of choice for prompt engineering.
Installing Ollama
On a Mac, you can install Ollama with the provided installer. It’s not much more complicated on Linux either: there is an installation script, or you can follow the instructions in the documentation.
Pulling Llama2
With Ollama installed, obtaining the Llama2 model is as straightforward as pulling a Docker image. Begin by choosing the specific variant you’d like to use.
| Variant | Nr. of Parameters | Size | Minimum RAM needed |
| --- | --- | --- | --- |
| llama2:7b | 7 billion | 3.8 GB | 16 GB |
| llama2:13b | 13 billion | 7.4 GB | 32 GB |
| llama2:70b | 70 billion | 39 GB | More than 48 GB |
Personally, I wouldn’t recommend the 70 billion parameter variant as it’s overkill for prototyping. The 7b variant is more than sufficient for getting acquainted with the topic. If you’re conducting this experiment on a modest laptop, you can opt for orca-mini.
Regardless of your choice, you can retrieve the model using the following command (for orca-mini, replace `llama2` with `orca-mini`):
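```bash
ollama pull llama2
```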
Depending on your internet connection, it might take a while. Once it is done, you can start the model with the following command:
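```bash
ollama serve
```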
Please note: if you’re using the GUI application, you won’t need to run the `ollama serve` command.
So far, so good. Let’s write some Python code! 🐍
Prompt Engineering with LangChain
LangChain is a versatile library with multiple language bindings designed to assist you in creating prompts and interacting with LLM models. It has some really nice features to prototype your Generative AI projects. I will use the Python Library, but if JavaScript is more to your liking, you can opt for the Node.js Library instead.
A friendly reminder: I am not a seasoned Python developer, so don’t expect production-grade code quality in these examples. 🙏
Hello LangChain, Hello Llama2! 👋
Let’s start with something simple. We will use the `llama2:7b` model, and ask the model to tell us the author of a book, providing a title.
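Something along these lines will do (the exact import path depends on your LangChain version, and the book title is just an example):

```python
from langchain_community.llms import Ollama

# Talks to the locally running Ollama server (the one started with `ollama serve`)
llm = Ollama(model="llama2:7b", temperature=0.7)

# The book title is an arbitrary example - use any book you like
print(llm.invoke("Who is the author of the book The Hobbit?"))
```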
If you run this code, you will get something like this:
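```
The author of The Hobbit is J.R.R. Tolkien. The book was first published in 1937 and has become one of the most popular fantasy novels of all time.
```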
If you run it again, you will get a very similar, but slightly different answer. That’s how LLM models work - they are not deterministic. I encourage you to try it with a different `temperature` value. Higher `temperature` values will result in more creative answers, but you risk getting gibberish as well. Lower `temperature` values will result in more accurate answers, but they will be less creative, and less “human-like”.
Adding Personalities to the Model 👥
In the first example, we utilized a straightforward prompt that lacked proper structuring, making it less efficient for real-world applications. How can we add structure to our prompts? The solution lies in using prompt templates.
Consider this example:
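(The import paths depend on your LangChain version, and the pirate persona below is just a placeholder for the `role` variable.)

```python
from langchain_community.chat_models import ChatOllama
from langchain.prompts import (
    ChatPromptTemplate,
    SystemMessagePromptTemplate,
    HumanMessagePromptTemplate,
)

# The system message defines the persona, the human message carries the question
system_template = SystemMessagePromptTemplate.from_template(
    "You are a {role}. Stay in character and answer in one short paragraph."
)
human_template = HumanMessagePromptTemplate.from_template(
    "Who is the author of the book {book_title}?"
)
chat_prompt = ChatPromptTemplate.from_messages([system_template, human_template])

chat = ChatOllama(model="llama2:7b", temperature=0.9)

# The placeholders are filled in when the prompt is formatted;
# the persona and the book are arbitrary examples
messages = chat_prompt.format_messages(role="grumpy pirate", book_title="The Hobbit")
print(chat.invoke(messages).content)
```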
The interesting parts are the `SystemMessagePromptTemplate`, `HumanMessagePromptTemplate`, and `ChatPromptTemplate` classes. They encapsulate the different aspects of our prompt.

The system message template provides context and guidance, typically by defining a persona for the model. On the other hand, the human message template represents the user’s input, usually in the form of a question or a command. Finally, the chat prompt template is about crafting a prompt using the aforementioned templates. When formatting the chat prompt itself, values for the placeholders in the templates - here the `role` and `book_title` variables - should be provided as parameters.
Here is an example output:
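```
Arrr, matey! The Hobbit be the work of J.R.R. Tolkien, a fine wordsmith indeed. Now off with ye, I have a ship to run!
```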
Assigning awkward personalities to our model is indeed an amusing way to experiment with prompt engineering. Now, let’s discuss some clever tricks we can use to further improve our prompt engineering skills.
Zero-Shot vs. Few-Shot Prompting
Although I never mentioned it even once, all my examples have been utilizing zero-shot prompting. Wondering what that means? Well, it’s when we ask the model to answer a question without giving it any specific examples. Given that modern LLMs are trained on extensive datasets and fine-tuned for our use, they usually provide decent answers to our queries. Now, what if you are not satisfied with the results? You can give the model a few examples to help it better understand what you are looking for. This is few-shot prompting (pro tip: few can be one 😎).
Let’s stick to the book title example for a while. We would like to classify books by genre, based on their titles.
Here is an example (note the `###` separator - it indicates the end of the system prompt):
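(The example books and genres below are my own picks; what matters is the pattern they establish before the separator.)

```python
from langchain_community.chat_models import ChatOllama
from langchain.prompts import (
    ChatPromptTemplate,
    SystemMessagePromptTemplate,
    HumanMessagePromptTemplate,
)

# The examples before the ### separator show the model the expected answer pattern
system_template = SystemMessagePromptTemplate.from_template(
    "Classify the book given by the user into a genre, based on its title.\n"
    "Here are some examples:\n"
    "The Lord of the Rings: fantasy\n"
    "The Martian: science fiction\n"
    "Pride and Prejudice: romance\n"
    "###"
)
human_template = HumanMessagePromptTemplate.from_template("{book_title}")
chat_prompt = ChatPromptTemplate.from_messages([system_template, human_template])

chat = ChatOllama(model="llama2:7b", temperature=0)

print(chat.invoke(chat_prompt.format_messages(book_title="Dracula")).content)
```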
Run this code multiple times, and you’ll notice that the answers consistently follow a clear pattern:
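```
Dracula: horror
```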
… and so on, run after run. Without providing any examples, the model would probably still give a decent answer, but its format would be very inconsistent, even with a temperature of zero.
Now replace the last line, and ask about a few different books at once, for example:
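```python
# The titles here are arbitrarily chosen - any books will do
print(chat.invoke(chat_prompt.format_messages(
    book_title="Dracula, The Hobbit, Murder on the Orient Express"
)).content)
```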
… and you will get something like this:
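```
Dracula: horror
The Hobbit: fantasy
Murder on the Orient Express: mystery
```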
Now we can see the real power of few-shot prompting. Finally, I would like to show you how to instruct an LLM model to give answers in a specific format. To achieve this, we are going to use output parsers from LangChain.
Here is the code for this example:
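(The librarian persona and the author below are placeholders, and the import paths again depend on your LangChain version.)

```python
from langchain_community.chat_models import ChatOllama
from langchain.output_parsers import CommaSeparatedListOutputParser
from langchain.prompts import (
    ChatPromptTemplate,
    SystemMessagePromptTemplate,
    HumanMessagePromptTemplate,
)

output_parser = CommaSeparatedListOutputParser()
# "Your response should be a list of comma separated values..."
format_instructions = output_parser.get_format_instructions()

system_template = SystemMessagePromptTemplate.from_template(
    "You are a helpful librarian.\n{format_instructions}"
)
human_template = HumanMessagePromptTemplate.from_template(
    "List three books written by {author}."
)
chat_prompt = ChatPromptTemplate.from_messages([system_template, human_template])

chat = ChatOllama(model="llama2:7b", temperature=0)

response = chat.invoke(chat_prompt.format_messages(
    format_instructions=format_instructions,
    author="J.R.R. Tolkien",
))
# The parser turns the raw text response into a Python list
print(output_parser.parse(response.content))
```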
Here is an example output:
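```
['Sure! Here are three books written by J.R.R. Tolkien: The Hobbit', 'The Lord of the Rings', 'The Silmarillion']
```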
With the `CommaSeparatedListOutputParser`, you can effortlessly convert the model’s output into simple Python lists. This streamlines the integration of LLMs into your real-world applications. You won’t have to worry about understanding how to parse arbitrary responses - you can focus on your ideas instead. However, it’s crucial to note that neither output parsers nor prompt templates are one-size-fits-all solutions. As illustrated in my example, there are instances where you may need to invest effort in refining the model’s output. You might ask, is it possible to use other output formats? The answer is yes - you can parse the model’s response into formats like JSON or XML as well. Consult LangChain’s documentation for more information.
This was the last topic I wanted to cover in this introductory article. We have only scratched the surface of prompt engineering, and of LangChain itself - we did not cover Agents, Chains, or routing, just to name a few interesting topics. If I managed to spark your interest, I encourage you to check out some of the tutorials and examples on the internet - there are dozens of them already.
I hope you learned something new today. Until next time! 👋
P.S.: The source code for this article’s scripts is available on GitHub. Feel free to make a few tweaks and see what unfolds. ✨