Scrolling through your social media feed nowadays, you’re bound to come across loads of posts about Generative AI, Prompt Engineering, and the like. It can get a bit overwhelming, but there’s no denying that this field is genuinely exciting. Thanks to things like ChatGPT, Generative AI isn’t just for techies and scientists anymore – it’s for everyone.
As a software engineer, I also find it amusing to read clickbait titles like “Generative AI: the end of programming” or “AI is going to replace programmers”. If you ask me, that is not going to happen - but I do believe that AI will change the way we work.
Whenever a new invention is introduced (let’s say the microwave oven), there is a buzz about how it will make traditional jobs obsolete. In the early days, after microwave ovens became mainstream, some believed that because they were so versatile and so easy to use, we wouldn’t need chefs anymore: everyone could be a chef with a microwave oven. That certainly wasn’t true. But the microwave oven, like every other household appliance, did make our lives easier: using them lets us spend less time on the boring stuff and more time on the things we love to do.
If you ask me, Generative AI, chatbots, and copilots are the same story. AI helps us tackle boring and repetitive tasks, allowing us to concentrate on the big picture. While there’s concern about AI potentially replacing jobs, especially those lacking added value, adapting to this change is crucial. Simply copying code from StackOverflow may no longer suffice, but at the end of the day, that’s a good thing. The goal is to ensure that everyone has a meaningful job, and perhaps AI can be our partner in making that a reality.
So, let’s dive into the world of Generative AI, and do some prompt engineering. Shall we?
What is Prompt Engineering?
Prompt engineering is a popular topic, but what does it mean exactly? Just as you need to articulate your search phrases carefully to get usable results from Google, you need to craft your prompt well to get the desired results from an AI model. The art and science of crafting the perfect prompt is what we call prompt engineering.
Consider the following:
Human: How to make a bed?
Chatbot: You need a bed, a mattress, and a blanket. First, put the mattress on the bed, then put the blanket on the mattress. Done.
Now, that’s not very helpful… But let’s admit: we weren’t very helpful either. Let’s try again and provide some context and guidance:
Human: You are a helpful and respectful hotel manager and I am a trainee. I want to learn how to make a bed as part of my job. Please tell me the exact steps I need to do to make a bed properly.
Chatbot: Absolutely, I’d be happy to guide you on making a bed properly. Here are the steps:
1. Prepare the Mattress: Ensure the mattress is clean and in good condition.
2. Fitted Sheet: Start with a fitted sheet. Pull it tightly over the mattress, ensuring a snug fit around the corners.
3. Flat Sheet (Optional): If you use a flat sheet, spread it evenly over the fitted sheet. Align the top edge with the top of the mattress, leaving some overhang to fold over later.
4. Blanket or Duvet: Place your blanket or duvet on top of the flat sheet. Make sure it’s centered and evenly distributed.
5. Fold and Tuck: Fold the top edge of the flat sheet and blanket or duvet down toward the foot of the bed. Tuck them under the mattress for a neat appearance.
6. Pillows: Arrange your sleeping pillows against the headboard, standing upright.
7. Pillowcases: If using pillowcases, slip them onto the sleeping pillows.
8. Decorative Touches (Optional): Add any decorative shams or cushions in front of the sleeping pillows.
9. Smooth and Adjust: Smooth out wrinkles and adjust the bedding for a tidy and inviting look.
Remember, attention to detail is key. Practice makes perfect, so feel free to ask if you have any questions or if there’s a specific aspect you’d like more guidance on. Happy learning!
Now, that’s a lot better. But as you will see, prompt engineering is more than context and instructions. Let’s move on and get our hands dirty.
Llama2 & Ollama
Before we start crafting our prompts, we need a decent LLM model at hand. LLM stands for Large Language Model - a type of neural network that generates text by predicting the next words in a sequence, based on the given input. The most popular LLMs are OpenAI’s GPT-3 (and GPT-4), Google’s Bard, Anthropic’s Claude, and last but not least: Meta’s Llama2.
In this article, we will use Llama2, which is a free and open-source LLM model. It’s relatively easy to get it up and running on your own laptop, which is a huge plus for me. I prefer learning by doing, and I encourage myself not to be afraid of making mistakes. Having a local LLM model is a great way to experiment and learn without the burden of generating a huge bill, or risking being banned from a service by accident.
Running your LLM models locally is a bit like using Minikube for Kubernetes: instead of deploying a cluster on AWS or Azure, you run a scaled-down version of the cluster locally. Yes, the time will come when you need to go beyond prototyping and experimenting, but until then, you can have a lot of fun with your cozy, little local LLM model.
Prepare your environment
First things first: we will use Python 3 for this experiment. I am not going to go into the details of setting up your Python environment; there are a lot of great tutorials out there. Moreover, there is a good chance that your OS already has Python 3 installed.
Come back when you can run the following command successfully:
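```bash
python3 --version
```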
It should return something like this:
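```
Python 3.11.6
```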
Great. Now, we need to have Llama2 locally available and ready to use. The easiest way to do that is to use Ollama. Unfortunately, at the time of writing Ollama is only available for Linux and macOS. This neat little tool will do the heavy lifting for us, like providing GPU acceleration (even on macOS), downloading, configuring, and serving LLM models. Also, it is supported by LangChain, which will be our weapon of choice for prompt engineering.
Installing Ollama
On a Mac, you can install Ollama with the provided installer. It’s not much more complicated on Linux either: there is an installation script, or you can follow the instructions in the documentation.
Pulling Llama2
With Ollama installed, obtaining the Llama2 model is as straightforward as pulling a Docker image. Begin by choosing the specific variant you’d like to use.
| Variant | Nr. of Parameters | Size | Minimum RAM needed |
| --- | --- | --- | --- |
| llama2:7b | 7 billion | 3.8 GB | 16 GB |
| llama2:13b | 13 billion | 7.4 GB | 32 GB |
| llama2:70b | 70 billion | 39 GB | More than 48 GB |
Personally, I wouldn’t recommend the 70 billion parameter variant as it’s overkill for prototyping. The 7b variant is more than sufficient for getting acquainted with the topic. If you’re conducting this experiment on a modest laptop, you can opt for orca-mini.
Regardless of your choice, you can retrieve the model using the following command (for orca-mini, replace `llama2` with `orca-mini`):
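```bash
ollama pull llama2
```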
Depending on your internet connection, it might take a while. Once it is done, you can start the model with the following command:
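```bash
ollama serve
```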
Please note: if you’re using the GUI application, you won’t need to run the `ollama serve` command.
So far, so good. Let’s write some Python code! 🐍
Prompt Engineering with LangChain
LangChain is a versatile library with multiple language bindings designed to assist you in creating prompts and interacting with LLM models. It has some really nice features to prototype your Generative AI projects. I will use the Python Library, but if JavaScript is more to your liking, you can opt for the Node.js Library instead.
A friendly reminder: I am not a seasoned Python developer, so don’t expect production-grade code quality in these examples. 🙏
Hello LangChain, Hello Llama2! 👋
Let’s start with something simple. We will use the `llama2:7b` model, and ask the model to tell us the author of a book, providing a title.
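Something along these lines will do (the exact import path depends on your LangChain version, and the book title is just an example):

```python
from langchain_community.llms import Ollama

# Talks to the locally running Ollama server (the one started with `ollama serve`)
llm = Ollama(model="llama2:7b", temperature=0.7)

# The book title is an arbitrary example - use any book you like
print(llm.invoke("Who is the author of the book The Hobbit?"))
```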
If you run this code, you will get something like this:
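```
The author of The Hobbit is J.R.R. Tolkien. The book was first published in 1937 and has become one of the most popular fantasy novels of all time.
```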
If you run it again, you will get a very similar, but slightly different answer. That’s how LLM models work - they are not deterministic. I encourage you to try it with a different `temperature` value. Higher `temperature` values will result in more creative answers, but you risk getting gibberish as well. Lower `temperature` values will result in more accurate answers, but they will be less creative, and less “human-like”.
Adding Personalities to the Model 👥
In the first example, we utilized a straightforward prompt that lacked proper structuring, making it less efficient for real-world applications. How can we add structure to our prompts? The solution lies in using prompt templates.
Consider this example:
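(The import paths depend on your LangChain version, and the pirate persona below is just a placeholder for the `role` variable.)

```python
from langchain_community.chat_models import ChatOllama
from langchain.prompts import (
    ChatPromptTemplate,
    SystemMessagePromptTemplate,
    HumanMessagePromptTemplate,
)

# The system message defines the persona, the human message carries the question
system_template = SystemMessagePromptTemplate.from_template(
    "You are a {role}. Stay in character and answer in one short paragraph."
)
human_template = HumanMessagePromptTemplate.from_template(
    "Who is the author of the book {book_title}?"
)
chat_prompt = ChatPromptTemplate.from_messages([system_template, human_template])

chat = ChatOllama(model="llama2:7b", temperature=0.9)

# The placeholders are filled in when the prompt is formatted;
# the persona and the book are arbitrary examples
messages = chat_prompt.format_messages(role="grumpy pirate", book_title="The Hobbit")
print(chat.invoke(messages).content)
```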
The interesting parts are the `SystemMessagePromptTemplate`, `HumanMessagePromptTemplate`, and `ChatPromptTemplate` classes. They encapsulate the different aspects of our prompt.

The system message template provides context and guidance, typically by defining a persona for the model. On the other hand, the human message template represents the user’s input, usually in the form of a question or a command. Finally, the chat prompt template is about crafting a prompt using the aforementioned templates. When formatting the chat prompt itself, values for the placeholders in the templates - here the `role` and `book_title` variables - should be provided as parameters.
Here is an example output:
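```
Arrr, matey! The Hobbit be the work of J.R.R. Tolkien, a fine wordsmith indeed. Now off with ye, I have a ship to run!
```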
Assigning awkward personalities to our model is indeed an amusing way to experiment with prompt engineering. Now, let’s discuss some clever tricks we can use to further improve our prompt engineering skills.
Zero-Shot vs. Few-Shot Prompting
Although I never mentioned it even once, all my examples have been utilizing zero-shot prompting. Wondering what that means? Well, it’s when we ask the model to answer a question without giving it any specific examples. Given that modern LLMs are trained on extensive datasets and fine-tuned for our use, they usually provide decent answers to our queries. Now, what if you are not satisfied with the results? You can give the model a few examples to help it better understand what you are looking for. This is few-shot prompting (pro tip: few can be one 😎).
Let’s stick to the book title example for a while. We would like to classify books by genre, based on their titles.
Here is an example (note the `###` separator - it indicates the end of the system prompt):
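(The example books and genres below are my own picks; what matters is the pattern they establish before the separator.)

```python
from langchain_community.chat_models import ChatOllama
from langchain.prompts import (
    ChatPromptTemplate,
    SystemMessagePromptTemplate,
    HumanMessagePromptTemplate,
)

# The examples before the ### separator show the model the expected answer pattern
system_template = SystemMessagePromptTemplate.from_template(
    "Classify the book given by the user into a genre, based on its title.\n"
    "Here are some examples:\n"
    "The Lord of the Rings: fantasy\n"
    "The Martian: science fiction\n"
    "Pride and Prejudice: romance\n"
    "###"
)
human_template = HumanMessagePromptTemplate.from_template("{book_title}")
chat_prompt = ChatPromptTemplate.from_messages([system_template, human_template])

chat = ChatOllama(model="llama2:7b", temperature=0)

print(chat.invoke(chat_prompt.format_messages(book_title="Dracula")).content)
```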
Run this code multiple times, and you’ll notice that the answers consistently follow a clear pattern:
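```
Dracula: horror
```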
… and so on, run after run. Without providing any examples, the model would probably still give a decent answer, but its format would be very inconsistent, even with a temperature of zero.
Now replace the last line, and ask about a few different books at once, for example:
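```python
# The titles here are arbitrarily chosen - any books will do
print(chat.invoke(chat_prompt.format_messages(
    book_title="Dracula, The Hobbit, Murder on the Orient Express"
)).content)
```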
… and you will get something like this:
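```
Dracula: horror
The Hobbit: fantasy
Murder on the Orient Express: mystery
```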
Now we can see the real power of few-shot prompting. Finally, I would like to show you how to instruct an LLM model to give answers in a specific format. To achieve this, we are going to use output parsers from LangChain.
Here is the code for this example:
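(The librarian persona and the author below are placeholders, and the import paths again depend on your LangChain version.)

```python
from langchain_community.chat_models import ChatOllama
from langchain.output_parsers import CommaSeparatedListOutputParser
from langchain.prompts import (
    ChatPromptTemplate,
    SystemMessagePromptTemplate,
    HumanMessagePromptTemplate,
)

output_parser = CommaSeparatedListOutputParser()
# "Your response should be a list of comma separated values..."
format_instructions = output_parser.get_format_instructions()

system_template = SystemMessagePromptTemplate.from_template(
    "You are a helpful librarian.\n{format_instructions}"
)
human_template = HumanMessagePromptTemplate.from_template(
    "List three books written by {author}."
)
chat_prompt = ChatPromptTemplate.from_messages([system_template, human_template])

chat = ChatOllama(model="llama2:7b", temperature=0)

response = chat.invoke(chat_prompt.format_messages(
    format_instructions=format_instructions,
    author="J.R.R. Tolkien",
))
# The parser turns the raw text response into a Python list
print(output_parser.parse(response.content))
```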
Here is an example output:
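```
['Sure! Here are three books written by J.R.R. Tolkien: The Hobbit', 'The Lord of the Rings', 'The Silmarillion']
```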
With the `CommaSeparatedListOutputParser`, you can effortlessly convert the model’s output into simple Python lists. This streamlines the integration of LLMs into your real-world applications. You won’t have to worry about understanding how to parse arbitrary responses - you can focus on your ideas instead. However, it’s crucial to note that neither output parsers nor prompt templates are one-size-fits-all solutions. As illustrated in my example, there are instances where you may need to invest effort in refining the model’s output. You might ask, is it possible to use other output formats? The answer is yes - you can parse the model’s response into formats like JSON or XML as well. Consult LangChain’s documentation for more information.
This was the last topic I wanted to cover in this introductory article. We have only scratched the surface of prompt engineering, and of LangChain itself - we did not cover Agents, Chains, or routing, just to name a few interesting topics. If I managed to spark your interest, I encourage you to check out some of the tutorials and examples on the internet - there are dozens of them already.
I hope you learned something new today. Until next time! 👋
P.S.: The source code for this article’s scripts is available on GitHub. Feel free to make a few tweaks and see what unfolds. ✨