Get perfect LLM outputs every time, with JSON
Sooner or later, every engineer working on building an AI-powered app ends up needing to incorporate the outputs of an LLM into the rest of their code. Let’s say you’re building an app to let a user chat with a database, for which you need the LLM to output SQL. If you’re like me, you’ll probably start by making up some specific syntax to surround your output, telling the LLM to use that syntax, and then parsing the output in Python.
Let’s say we are dealing with the below data, and the user asks ‘how many people have the last name Doe?’
--------------------------------------------------
| ID | First   | Last    | Birth      | Purchase |
--------------------------------------------------
| 1  | John    | Doe     | 1980-01-01 | 500      |
| 2  | Jane    | Doe     | 1985-02-02 | 300      |
| 3  | Bob     | Smith   | 1990-03-03 | 100      |
| 4  | Alice   | Johnson | 1995-04-04 | 700      |
| 5  | Charlie | Brown   | 2000-05-05 | 200      |
--------------------------------------------------
In this scenario, you might tell the LLM:
Here is a user query: How many people have the last name Doe?
Translate the user query into SQL. When you output your SQL, preface it with 'OUTPUT', like so: OUTPUT <SQL>
The LLM might then respond:
Sure, I can do that! Here you go.
OUTPUT SELECT count(*) FROM data WHERE Last = "Doe"
You would then write some code that looks for OUTPUT in the LLM response, extracts the text after this token, and turns that into your SQL string. This will probably work fine. However, this approach breaks down once you start needing to get the LLM to generate more complex outputs.
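That extraction can be sketched in a few lines of Python (the sample response is hardcoded here for illustration):

```python
# Naive extraction: split on the OUTPUT marker and keep everything after it.
llm_response = 'Sure, I can do that! Here you go.\nOUTPUT SELECT count(*) FROM data WHERE Last = "Doe"'

# maxsplit=1 guards against the word OUTPUT appearing again later in the text.
sql = llm_response.split("OUTPUT", 1)[1].strip()
print(sql)  # SELECT count(*) FROM data WHERE Last = "Doe"
```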
For instance, let’s say that you want to improve your data chat app. You want the LLM to choose between a few different display options for the data — say, graph, map, or table. Additionally, you want the LLM to be able to output different rendering options for the visualizations — maybe your chart should optionally be log scale, and the map needs a center coordinate at which to focus. Finally, let’s say you want the LLM to output a title for your visualization as well.
How on earth do you go about getting the LLM to output all of this in a structured form? It’s not enough to prefix with OUTPUT anymore — now you’re dealing with multiple named outputs, nested outputs, booleans, and lists. One option would be to continue inventing your own custom output format and try to get the LLM to follow that. However, there is a simpler way:
Use JSON.
JSON is one of the most common data interchange formats in the world. It supports everything we need here (named fields, nesting, booleans, and lists), any LLM you're using will know how to write it correctly, and it's trivially easy to parse in code as well.
Let’s say we’re building the more complex data app described above. The user asks: ‘how many purchases were made by people with the last name Doe in each year?’
We could then prompt our LLM as follows:
You are a data chatbot. You take in a user query, translate it to SQL, and output several other pieces of metadata that are used to visualize the output of the SQL.
Your output must be in valid JSON. Do not output anything other than the JSON.
Here is the user query to use: How many purchases were made by people with the last name Doe in each year?
First, translate the user query to SQL, and add it to the JSON with the key "sql".
Next, create an object in the JSON to describe how you want the data to be visualized. The key for this object should be "visualization". In your visualization object, you must have a key "type", the value of which is either "chart", "map", or "table".
If you choose a chart and want to make it log scale, then set the "logscale" key of the visualization object to true.
If you choose a map, you should set the center coordinates of the map. To do so, set the "coordinates" field of the "visualization" object to a list of the two coordinates.
Finally, create a title for the visualization, and add it to the JSON with the key "title".
Ideally, our LLM would then play nice and output something like the following:
{
"sql": "SELECT YEAR(Date_of_Birth) as Year, COUNT(*) as Count FROM Customers WHERE First_Name = 'John' GROUP BY YEAR(Date_of_Birth)",
"visualization": {
"type": "chart",
"logscale": false
},
"title": "Number of People Named 'John' Born Each Year"
}
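Once you've parsed a response like this, it's worth checking that the model actually followed the schema before you try to render anything. A minimal sanity check might look like the following (the helper name is my own, not part of any library):

```python
# Validate that a parsed LLM response matches the schema described in the prompt.
def validate_response(data: dict) -> None:
    if "sql" not in data or "title" not in data:
        raise ValueError("missing 'sql' or 'title' key")
    viz = data.get("visualization")
    if not isinstance(viz, dict):
        raise ValueError("'visualization' must be an object")
    if viz.get("type") not in ("chart", "map", "table"):
        raise ValueError(f"unexpected visualization type: {viz.get('type')!r}")
    # Maps need a center to focus on, per the prompt.
    if viz["type"] == "map" and "coordinates" not in viz:
        raise ValueError("map visualizations need 'coordinates'")
```

If validation fails, you can re-prompt the LLM rather than crashing your app on malformed output.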
However, LLMs tend to be chatty, and you might get them outputting additional text before or after the JSON, telling you things like “Sure, I can do that.” Such text will completely break your attempts to parse the output as JSON. Even telling the LLM “only output JSON, nothing else” won’t always fix this. Instead, I like to append the following to my prompt:
Surround your JSON output with <result></result> tags.
With this, you should get a response like so:
Sure, here's some structured JSON output:
<result>
{
"sql": "SELECT YEAR(Date_of_Birth) as Year, COUNT(*) as Count FROM customers WHERE First_Name = 'John' GROUP BY YEAR(Date_of_Birth);",
"visualization": {
"type": "chart",
"logscale": false
},
"title": "Distribution of People Named John Born Each Year"
}
</result>
I can explain each part of the JSON if you want.
You can then trivially parse the output in Python:
import json

llm_response = ...  # the raw text returned by the LLM

# Grab everything between the <result> tags and parse it as JSON.
json_string = llm_response.split('<result>')[1].split('</result>')[0]
parsed_json = json.loads(json_string.strip())
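The split-based version above works, but it throws an unhelpful IndexError if the tags are missing. A slightly more defensive variant of the same idea is sketched below (a regex plus explicit error handling; the function name is my own):

```python
import json
import re


def extract_json(llm_response: str) -> dict:
    """Pull the JSON payload out of <result></result> tags in an LLM response."""
    # re.DOTALL lets .*? span newlines; the non-greedy quantifier stops
    # at the first closing tag.
    match = re.search(r"<result>(.*?)</result>", llm_response, re.DOTALL)
    if match is None:
        raise ValueError("no <result> tags found in LLM response")
    return json.loads(match.group(1).strip())
```

Raising a clear error here gives the caller a natural place to retry the LLM call instead of silently failing.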
This method should let you reliably parse complex, structured outputs from an LLM. I use it in my own apps all the time, with no problems.
Good luck, and let me know if you have any questions!