Efficient Few-Shot Prompting in LangChain: Output Parsers-Part 3

Jayant Pal
5 min read · Mar 3, 2024



Welcome to the third and final article in this series. I hope these articles have helped you build an understanding of prompting in LangChain. In case you missed them, here are the links to the first and second articles.

In this article, we will look into the different types of Output Parsers in LangChain, which help parse an LLM's output into a specified format, either pre-defined or customized.

Output Parsers

Output Parsers take the raw output of an LLM and transform it into a more suitable format. This is especially important when we use LLMs to generate any form of structured data.

One key advantage of output parsers is that we don't have to manually write out the output format we want from the LLM. The parser's format instructions do that job for us.
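Conceptually, every parser pairs two pieces: format instructions that get appended to the prompt, and a parse step that converts the raw model reply into structured data. A toy plain-Python sketch of that contract (an illustration only, not LangChain's actual implementation):

```python
from typing import List

class ToyListParser:
    """Toy illustration of the two-method contract LangChain parsers follow."""

    def get_format_instructions(self) -> str:
        # text appended to the prompt so the LLM replies in a parseable format
        return "Your response should be a list of comma separated values, eg: `foo, bar, baz`"

    def parse(self, text: str) -> List[str]:
        # convert the raw model reply into structured data
        return [item.strip() for item in text.split(",")]

parser = ToyListParser()
print(parser.parse("red, green, blue"))  # ['red', 'green', 'blue']
```

Every parser we look at below follows this same shape: instructions in, structured data out.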

There are many different types of Output Parsers available in LangChain, namely:

  • CSV Parser: This parser can be used to parse LLM output to a list of comma-separated items.
  • Datetime Parser: This can be used to parse the output into a datetime format.
  • Output-Fixing Parser: This acts as a wrapper around another parser. In case the first parser fails, this tries to fix the output.
  • Structured Output Parser: This is useful when we want to parse the output in a customized format.
  • Pydantic Parser: This allows us to parse the output to a Pydantic schema. Pydantic is a data validation and parsing library in Python.

Besides these, there are other parsers such as the JSON Parser, Pandas DataFrame Parser, Enum Parser, XML Parser, YAML Parser, etc., available in LangChain. We will specifically look into the parsers mentioned above.

Comma Separated List Output Parser

# creating an instance of CSV Parser class
from langchain.output_parsers import CommaSeparatedListOutputParser
output_parser = CommaSeparatedListOutputParser()
output_parser.get_format_instructions()
# 'Your response should be a list of comma separated values, eg: `foo, bar, baz`'

Now, let’s use prompt templating to send a human message along with the format instructions to the model. This is the same pattern we have used before; the only difference is that we add format_instructions to the human template.

from langchain.prompts.chat import HumanMessagePromptTemplate, ChatPromptTemplate

human_template = "{human_message}\n{format_instructions}"
human_message_prompt = HumanMessagePromptTemplate.from_template(human_template)

chat_prompt = ChatPromptTemplate.from_messages([human_message_prompt])
prompt = chat_prompt.format_prompt(human_message = "What are the 7 continents?", format_instructions = output_parser.get_format_instructions())

# this is what the final prompt looks like:
# ChatPromptValue(messages=[HumanMessage(content='What are the 7 continents?\nYour response should be a list of comma separated values, eg: `foo, bar, baz`')])

Getting the response from the model.

# `chat` is the chat model instance created in the earlier articles
response = chat(messages = prompt.messages)
response.content
# 'North America, South America, Europe, Asia, Africa, Australia, Antarctica'

# parsing the output
output_parser.parse(response.content)

# ['North America','South America', 'Europe', 'Asia', 'Africa', 'Australia', 'Antarctica']

Output-Fixing Parser

Sometimes, the parser may fail because the model's output doesn't match the format instructions. Let's look at an example.

# importing datetime parser
from langchain.output_parsers import DatetimeOutputParser
output_parser = DatetimeOutputParser()
format_instructions = output_parser.get_format_instructions()
format_instructions

#"Write a datetime string that matches the following pattern: '%Y-%m-%dT%H:%M:%S.%fZ'.\n\nExamples: 1803-06-30T22:51:22.457039Z, 0048-01-22T06:08:05.341079Z, 1633-01-03T21:27:10.518814Z\n\nReturn ONLY this string, no other words!"

Getting the response.

human_template = "{human_message}\n{format_instructions}"
human_message_prompt = HumanMessagePromptTemplate.from_template(human_template)

chat_prompt = ChatPromptTemplate.from_messages([human_message_prompt])

prompt = chat_prompt.format_prompt(human_message = 'When was christ born? ', format_instructions = format_instructions)

response = chat(messages = prompt.messages)

output_parser.parse(response.content) # will get an error

#OutputParserException: Could not parse datetime string: 0000-12-25T00:00:00.000000Z
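The error is easy to reproduce with the standard library alone: Python's datetime only supports years from 1 upwards (datetime.MINYEAR is 1), so the model's year-0000 answer can never be parsed, while an in-range year works fine:

```python
from datetime import datetime

FMT = "%Y-%m-%dT%H:%M:%S.%fZ"

# a year within datetime's supported range parses fine
ok = datetime.strptime("1803-06-30T22:51:22.457039Z", FMT)
print(ok.year)  # 1803

# year 0 is below datetime.MINYEAR (1), so parsing fails
try:
    datetime.strptime("0000-12-25T00:00:00.000000Z", FMT)
except ValueError as err:
    print("parse failed:", err)
```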

To fix this, let's use the Output-Fixing Parser, retrying a few times until it succeeds.

from langchain.output_parsers import OutputFixingParser

fixing_parser = OutputFixingParser.from_llm(parser = output_parser, llm = chat)

# retry a few times, since the fixing LLM call is non-deterministic
for chance in range(1, 10):
    try:
        fixed_output = fixing_parser.parse(response.content)
    except Exception:
        continue
    else:
        break

fixed_output

# datetime.datetime(1, 1, 1, 0, 0)
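Conceptually, the fixing parser feeds the failing output (along with the format instructions) back to the LLM and retries. A generic plain-Python sketch of that retry idea, with hypothetical toy_parse and toy_fix callables standing in for the real parser and the LLM call:

```python
def parse_with_fixing(parse, fix, raw, attempts = 3):
    """Try to parse `raw`; on failure, ask `fix` to repair it and retry."""
    for _ in range(attempts):
        try:
            return parse(raw)
        except ValueError:
            raw = fix(raw)  # in LangChain, this step is an LLM call
    return parse(raw)  # final attempt; raises if still unfixable

# toy stand-ins: reject year 0000, "fix" clamps it to year 0001
def toy_parse(text):
    if text.startswith("0000"):
        raise ValueError("year 0 is out of range")
    return text

def toy_fix(text):
    return "0001" + text[4:]

print(parse_with_fixing(toy_parse, toy_fix, "0000-12-25T00:00:00Z"))
# -> '0001-12-25T00:00:00Z'
```

The real OutputFixingParser wraps this loop around an LLM, which is why its result (year 1 above) depends on how the model chose to repair the string.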

Structured Output Parser

To use this parser, we need to define a Response Schema, which specifies the format in which we want to parse the output.

# define response schema
from langchain.output_parsers import ResponseSchema

response_schema = [
    ResponseSchema(name = 'answer', description = "answer to user's question"),
    ResponseSchema(name = 'source', description = "source used to answer to user's question, should be a website")
]

The parsed output will be a JSON-like dictionary whose keys are the schema names and whose values follow the descriptions.

Let’s define the output parser and format instructions.

# define output parser
from langchain.output_parsers import StructuredOutputParser
output_parser = StructuredOutputParser.from_response_schemas(response_schema)

format_instructions = output_parser.get_format_instructions()
format_instructions

#'The output should be a markdown code snippet formatted in the following schema, including the leading and trailing "```json" and "```":\n\n```json\n{\n\t"answer": string // answer to user\'s question\n\t"source": string // source used to answer to user\'s question, should be a website\n}\n```'
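The parse step itself is mostly mechanical: strip the fenced markdown wrapper the instructions above ask for, then feed the body to json.loads. A minimal standalone sketch (not LangChain's actual implementation; the reply content is made up):

```python
import json
import re

# a model reply in the fenced-JSON format the parser requests (made-up content)
FENCE = "`" * 3  # three backticks
raw = FENCE + 'json\n{"answer": "Blue Whale", "source": "https://example.com"}\n' + FENCE

# pull out the JSON object between the fences and parse it
match = re.search(r"\{.*\}", raw, re.DOTALL)
parsed = json.loads(match.group(0))
print(parsed["answer"])  # Blue Whale
```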

Getting the response.

# get response
human_template = "{human_message}\n{format_instructions}"
human_message_prompt = HumanMessagePromptTemplate.from_template(human_template)

chat_prompt = ChatPromptTemplate.from_messages([human_message_prompt])

prompt = chat_prompt.format_prompt(human_message = "What is the largest animal in the world? ", format_instructions = format_instructions)

response = chat(messages = prompt.messages)

# parsing the response
output_parser.parse(response.content)

# {'answer': 'Blue Whale','source': 'https://www.nationalgeographic.com/animals/mammals/b/blue-whale/'}

Pydantic Output Parser

Plain Python classes do not enforce strict data type validation. This is where Pydantic comes into the picture.

To use Pydantic, we need to define a model that inherits from Pydantic's BaseModel. This is similar to a regular Python class, but with actual type checking and coercion. We define each output field, its type, and a description, similar to a Response Schema.

from pydantic import BaseModel, Field
from typing import List

class Car(BaseModel):
    name: str = Field(description = "Name of the car")
    model_number: str = Field(description = "model number of the car")
    features: List[str] = Field(description = "List of features of the car")
    source: str = Field(description = 'Source of the answer. Should only contain website')
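Before wiring this into LangChain, the "type checking + coercion" can be seen in isolation. A standalone sketch, assuming pydantic is installed; the Book model and its values are made up for illustration:

```python
from typing import List
from pydantic import BaseModel, ValidationError

class Book(BaseModel):
    title: str
    year: int
    tags: List[str]

# the string "1999" is coerced to the declared int type
book = Book(title = "Snow Crash", year = "1999", tags = ["sci-fi"])
print(book.year)  # 1999

# data that cannot be validated raises ValidationError, unlike a plain class
try:
    Book(title = "Snow Crash", year = "not a year", tags = ["sci-fi"])
except ValidationError:
    print("validation failed")
```

This is exactly the safety net the PydanticOutputParser gives us over the LLM's raw JSON.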

Defining the parser.

# output parser
from langchain.output_parsers import PydanticOutputParser
output_parser = PydanticOutputParser(pydantic_object=Car)
output_parser.get_format_instructions()
# 'The output should be formatted as a JSON instance that conforms to the JSON schema below.\n\nAs an example, for the schema {"properties": {"foo": {"title": "Foo", "description": "a list of strings", "type": "array", "items": {"type": "string"}}}, "required": ["foo"]}\nthe object {"foo": ["bar", "baz"]} is a well-formatted instance of the schema. The object {"properties": {"foo": ["bar", "baz"]}} is not well-formatted.\n\nHere is the output schema:\n```\n{"properties": {"name": {"title": "Name", "description": "Name of the car", "type": "string"}, "model_number": {"title": "Model Number", "description": "model number of the car", "type": "string"}, "features": {"title": "Features", "description": "List of features of the car", "type": "array", "items": {"type": "string"}}, "source": {"title": "Source", "description": "Source of the answer. Should only contain website", "type": "string"}}, "required": ["name", "model_number", "features", "source"]}\n```'

Finally, getting the response from the model.

human_template = "{human_message}\n{format_instructions}"
human_message_prompt = HumanMessagePromptTemplate.from_template(human_template)
chat_prompt = ChatPromptTemplate.from_messages([human_message_prompt])
prompt = chat_prompt.format_prompt(human_message='Tell me about the most expensive car in the world',
format_instructions=output_parser.get_format_instructions())

response = chat(messages=prompt.to_messages())
output = output_parser.parse(response.content)

output
#Car(name='Bugatti La Voiture Noire', model_number='La Voiture Noire', features=['8.0-liter quad-turbo W-16 engine', '1500 horsepower', 'Top speed of 261 mph', 'Only one unit produced', 'Handcrafted bodywork', 'Luxurious interior'], source='https://www.caranddriver.com/news/a27053025/bugatti-la-voiture-noire-most-expensive-car-sold/')

That’s all about Output Parsers in LangChain. For more details, please refer to the official LangChain documentation linked in the References. I hope you enjoyed this series and learned about few-shot prompting from it.

The entire code can be found in this GitHub link. Let me know if you have any questions or suggestions.

Connect with me on LinkedIn: Jayanta Kumar Pal

Thank you for your time! Keep Learning!

References:

https://python.langchain.com/docs/modules/model_io/output_parsers

