Creating a Chatbot REST API to answer questions about your custom data using Python, LangChain, Flask, and OpenAI
If you don’t want any explanation and just want to grab the code, skip to the “Adding model to REST API” section below.
This article will guide you through the process of creating a Chatbot REST API that can answer questions about your custom data (e.g., text documents) using Python, LangChain, Flask, and OpenAI.
The article assumes that you have a basic understanding of Python, LLMs (Large Language Models), REST APIs, and OpenAI. If you are unfamiliar with these technologies, we recommend that you review the provided links before continuing with this article.
While we won’t do a deep dive into the LangChain framework, we will provide a basic understanding and demonstrate the relevant code. It’s worth noting that LangChain is a complex and powerful tool, and a comprehensive exploration of it could fill a whole book. Our goal here is to give you a simple explanation and practical examples that you can use to create your Chatbot API.
Installing dependencies
To execute any of the following code, you will first need to install the necessary dependencies. This article assumes that you already have Python and pip installed on your system.
pip install openai
pip install langchain
pip install chromadb # if Chroma is used as the Vector Store (more info later in this article)
pip install redis # if Redis is used as the Vector Store (more info later in this article)
To proceed with this article, you will also need an OPENAI_API_KEY. If you don’t already have one, you can obtain an API key by following this link. Once you have created the key, make it available to the Python scripts that follow, like so:
import os

os.environ["OPENAI_API_KEY"] = "sk-..."
Alternatively, you may run the Python scripts via your command line by executing the following command:
OPENAI_API_KEY=sk-... python app.py
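Another common option, if you prefer not to type the key on every run, is keeping it in a local .env file and loading it with the python-dotenv package. This is an extra dependency not used elsewhere in this article; a minimal sketch:

# pip install python-dotenv
from dotenv import load_dotenv

load_dotenv()  # reads OPENAI_API_KEY from a local .env file into the environment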
Exporting and loading your custom data
Since every company or individual may have a distinct use case for exporting their data, we won’t delve into the specifics of how to export your own data. Rather, we will showcase various methods to load your data into an LLM.
We will be using OpenAI’s LLMs, but you could use any other LLMs available on LangChain. Check the list of all available LLMs on LangChain here.
LangChain provides the concept of Document Loaders to import custom data into Document objects that will later be fed to the LLM powering your Chatbot. The subsequent sections show how to load your data in several different ways.
1. Loading a single CSV file
The following example demonstrates how to load a basic CSV file.
from langchain.document_loaders.csv_loader import CSVLoader
loader = CSVLoader(file_path='./data/your_own_data.csv')
document = loader.load()
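Each CSV row becomes a Document with page_content and metadata; you can inspect the first one to confirm the load worked:

print(document[0].page_content)
print(document[0].metadata)  # e.g. {'source': './data/your_own_data.csv', 'row': 0}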
2. Loading a single JSON file
The following example illustrates how to load a single JSON file.
from langchain.document_loaders import JSONLoader
# JSONLoader needs a jq expression pointing at the content to extract
# (adjust '.messages[].content' to your file's structure; requires: pip install jq)
loader = JSONLoader(file_path='./data/your_own_data.json', jq_schema='.messages[].content')
document = loader.load()
3. Loading multiple CSV files
The following example demonstrates how to load multiple CSV files.
from langchain.document_loaders import CSVLoader, DirectoryLoader
loader = DirectoryLoader('./data', glob="**/*.csv", loader_cls=CSVLoader)
documents = loader.load()
Explore all available options for loading your data here.
Storing your custom data
So far, we have only shown how to load your data into the application; now you must store it in a format that OpenAI’s LLMs can search efficiently. While querying, you could ask the language model to examine your entire custom data without any transformation, but that approach does not scale as your data grows. Instead, convert and store your data in three steps:
1. Split your documents into chunks
The first step is to split the raw documents into smaller chunks. This keeps each chunk within the model’s context window, makes retrieval faster and more precise, and ultimately can improve the accuracy of the answers.
Check the code below to split documents using LangChain:
from langchain.text_splitter import CharacterTextSplitter
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
documents = text_splitter.split_documents(documents)
Check more information about text splitting here.
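If a plain character split cuts your text at awkward places, LangChain also offers a RecursiveCharacterTextSplitter that falls back through several separators. A minimal sketch, as an alternative to the code above (the chunk_overlap of 100 is just an example value):

from langchain.text_splitter import RecursiveCharacterTextSplitter

# Tries to split on "\n\n", then "\n", then " ", so chunks break at natural boundaries
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
documents = text_splitter.split_documents(documents)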
2. Transform your split documents into embeddings
After the documents have been split into smaller chunks, the next step is to convert them into embeddings, which are numerical representations of the text. This method takes the text input and generates a numerical vector that represents the meaning of the text.
Embeddings are a much deeper subject that we won’t explain in detail here. Luckily, OpenAI offers an API that transforms text into embeddings, so we don’t need to do this ourselves.
The code below does not create the embeddings yet. The embeddings object will be used later to create the embeddings for the documents.
from langchain.embeddings.openai import OpenAIEmbeddings
embeddings = OpenAIEmbeddings(model="text-embedding-ada-002")
As shown above, OpenAI uses an AI model to transform documents into embeddings. You can see all available embedding models here.
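To get a feel for what an embedding looks like, you can embed a single string directly (note that this call hits OpenAI’s API, so it incurs a small cost):

vector = embeddings.embed_query("beach umbrella")
print(len(vector))  # text-embedding-ada-002 produces 1536-dimensional vectors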
3. Store your documents and embeddings in a Vector Store
The final step is storing the documents and their embeddings in a Vector Store, a database optimized for efficient similarity search over embeddings.
There are two primary methods for storing the documents:
3.1. Using Memory (not recommended, but easier)
Storing embeddings in memory is the simpler approach, but it is not recommended because the data is volatile: when the Python program shuts down, everything in the Vector Store is lost. However, this method is quicker to set up, as it doesn’t require any additional components running alongside the Python program.
from langchain.vectorstores import Chroma
vectorstore = Chroma.from_documents(documents, embeddings)
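Once the store is built, you can sanity-check it with a similarity search before wiring it into a chain; the query string here is just an example:

# Returns the k document chunks whose embeddings are closest to the query
results = vectorstore.similarity_search("products for the beach", k=3)
for doc in results:
    print(doc.page_content[:80])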
3.2. Using Storage (recommended, but it requires Redis Stack)
Storing embeddings in persistent storage is the recommended approach, as the data remains in the Vector Store even after the Python program shuts down. However, this method takes more effort to set up, since it requires running Redis Stack (or another supported database) alongside the Python program.
from langchain.vectorstores.redis import Redis as RedisVectorStore

index_name = "content"
redis_url = "redis://localhost:6379"
create_embeddings_index = True

if create_embeddings_index:
    # Generates embeddings via OpenAI's API and writes them to Redis
    vectorstore = RedisVectorStore.from_documents(documents, embeddings, redis_url=redis_url, index_name=index_name)
else:
    # Reuses embeddings already stored in Redis, skipping the OpenAI calls
    vectorstore = RedisVectorStore.from_existing_index(embeddings, redis_url=redis_url, index_name=index_name)
The create_embeddings_index parameter determines whether embeddings will be generated for the data, which incurs costs by calling OpenAI’s API. There is no need to recreate the embeddings every time the application starts if the documents haven’t changed. Therefore, it is recommended to start the application with create_embeddings_index=True the first time, and set create_embeddings_index=False thereafter, for as long as the documents remain the same. This avoids unnecessary expenses and speeds up application startup.
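One convenient way to flip this flag without editing code is reading it from an environment variable; a small sketch (the CREATE_EMBEDDINGS_INDEX variable name is our own invention, not part of LangChain):

import os

# Set CREATE_EMBEDDINGS_INDEX=true on the first run, omit it afterwards
create_embeddings_index = os.getenv("CREATE_EMBEDDINGS_INDEX", "false").lower() == "true"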
For more information on Vector Stores, refer to this resource.
Building your model
In order to facilitate the Chatbot’s ability to answer questions, an AI model is required. OpenAI offers a range of models with varying costs and benefits. For more information on the available models and their differences, you can refer to this resource.
To provide a comprehensive understanding, we will break down this section into separate steps:
1. Prompt Templates
LangChain uses the concept of Prompt Templates to instruct the model on how to generate responses. This allows you to define specific behaviors for the Chatbot.
Take a look at the example below, which showcases the usage of Prompt Templates:
from langchain.prompts import PromptTemplate
TEMPLATE = """
Follow these rules to answer the question:
- You must act like a store attendant providing information and suggestions.
- You must only provide information about e-commerce, such as products, categories, product suggestions, and product details.
- If you don't know the answer, just say that you don't know, don't try to make up an answer.
- You must always answer only the last question, while still taking the chat history into account.
Chat History: {chat_history}
Question: {question}
Answer:"""
prompt = PromptTemplate.from_template(TEMPLATE)
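You can preview exactly what the model will receive by formatting the template with sample values (the history and question below are purely illustrative):

print(prompt.format(
    chat_history="Human: Do you sell sunscreen?\nAI: Yes, we carry several brands.",
    question="Which one is best for children?",
))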
2. Temperature
As described in OpenAI’s documentation found here, the temperature parameter influences the randomness of the model’s output.
Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic.
Since a Chatbot typically requires coherent and relevant answers, it is recommended to use a lower temperature setting to reduce randomness.
Below is an example that initializes an OpenAI LLM using the default model for this specific use case (text-davinci-003):
from langchain.llms import OpenAI
llm = OpenAI(model_name="text-davinci-003", temperature=0)
3. Chain
The concept of a Chain, introduced by LangChain, plays a crucial role in building AI applications. Chains can be used individually or combined to construct sophisticated AI solutions.
Take a look at the following example to better understand how Chains are used:
from langchain.chains import ConversationalRetrievalChain, LLMChain
from langchain.chains.question_answering import load_qa_chain

# Rewrites the latest question into a standalone question using the chat history
question_generator = LLMChain(llm=llm, prompt=prompt)
# Answers the standalone question based on the retrieved documents
doc_chain = load_qa_chain(llm)

chain = ConversationalRetrievalChain(
    retriever=vectorstore.as_retriever(),
    question_generator=question_generator,
    combine_docs_chain=doc_chain,
)
Asking questions to your model
Before asking questions to your newly constructed model, it is recommended to provide the user’s chat history along with your questions. By doing so, the Chatbot can generate responses that take the conversation context into account.
The LangChain framework addresses the management of chat history and context through the concept of Memory.
Refer to the instructions below on how to utilize the memory feature:
from langchain.memory import ChatMessageHistory
chat_history = ChatMessageHistory()
chat_history.add_user_message(previous_question)
chat_history.add_ai_message(previous_answer)
messages = chat_history.messages
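Keep in mind that the chat history is injected into the prompt on every call, so long conversations steadily grow your token usage. A simple way to bound it is to keep only the most recent messages and pass that slice to the chain instead of chat_history.messages; a minimal sketch (the cutoff of 10 is arbitrary):

MAX_MESSAGES = 10  # arbitrary cutoff; tune to your prompt budget
trimmed_history = chat_history.messages[-MAX_MESSAGES:]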
Once you have completed the model construction and incorporated the chat history, you are ready to begin asking questions. The procedure is outlined as follows:
result = chain({
    "question": "Can you tell me more about products designed to go to the beach?",
    "chat_history": chat_history.messages,
})
answer = result["answer"]
Creating a REST API
Flask is an extremely popular Python web framework, widely recognized for how easy it makes building REST APIs. Its simple and fast development experience makes it an ideal choice for our API.
Let’s take a look at an example of how Flask can be used to create an API that responds with pong when it receives a POST request at the /ping endpoint:
from flask import Flask

app = Flask(__name__)

@app.route('/ping', methods=['POST'])
def ping():
    return 'pong'

if __name__ == '__main__':
    app.run()
To start the API above, you just need to run the following command:
python app.py
To test this simple REST API, you can run a curl command:
curl -X POST http://127.0.0.1:5000/ping
Or you can use a tool like Postman for convenience.
Adding model to REST API
(this section is also known as TLDR)
Finally, building a REST API using Flask with the recently constructed model is as straightforward as implementing the following code:
from langchain.document_loaders import CSVLoader, DirectoryLoader
from langchain.text_splitter import CharacterTextSplitter
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import Chroma
from langchain.prompts import PromptTemplate
from langchain.llms import OpenAI
from langchain.chains import ConversationalRetrievalChain, LLMChain
from langchain.chains.question_answering import load_qa_chain
from langchain.memory import ChatMessageHistory
from flask import Flask, request, jsonify
TEMPLATE = """
Follow these rules to answer the question:
- You must act like a store attendant providing information and suggestions.
- You must only provide information about e-commerce, such as products, categories, product suggestions, and product details.
- If you don't know the answer, just say that you don't know, don't try to make up an answer.
- You must always answer only the last question, while still taking the chat history into account.
Chat History: {chat_history}
Question: {question}
Answer:"""
loader = DirectoryLoader('./data', glob="**/*.csv", loader_cls=CSVLoader)
documents = loader.load()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
documents = text_splitter.split_documents(documents)
embeddings = OpenAIEmbeddings(model="text-embedding-ada-002")
vectorstore = Chroma.from_documents(documents, embeddings)
prompt = PromptTemplate.from_template(TEMPLATE)
llm = OpenAI(model_name="text-davinci-003", temperature=0)
question_generator = LLMChain(llm=llm, prompt=prompt)
doc_chain = load_qa_chain(llm)
chain = ConversationalRetrievalChain(
    retriever=vectorstore.as_retriever(),
    question_generator=question_generator,
    combine_docs_chain=doc_chain,
)
chat_history = ChatMessageHistory()
app = Flask(__name__)
@app.route('/chat', methods=['POST'])
def chat():
    data = request.json
    question = data.get("question")
    result = chain({
        "question": question,
        "chat_history": chat_history.messages,
    })
    answer = result["answer"]
    chat_history.add_user_message(question)
    chat_history.add_ai_message(answer)
    return jsonify({"answer": answer})

if __name__ == '__main__':
    app.run()
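One caveat: chat_history above is a single global object, so every client of the API shares one conversation. For anything beyond a demo, you would want one history per user; a minimal sketch, assuming clients send a session_id field (our own addition, not part of the code above):

# Hypothetical per-session histories, keyed by a client-supplied session_id
histories = {}

def get_history(session_id):
    # Create a fresh history the first time a session is seen
    if session_id not in histories:
        histories[session_id] = ChatMessageHistory()
    return histories[session_id]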
To run the application shown above, use the following command:
OPENAI_API_KEY=<OPENAI_API_KEY> python app.py
To test this application, you can run a curl command:
curl --location --request POST 'http://127.0.0.1:5000/chat' \
--header 'Content-Type: application/json' \
--data-raw '{
"question": "Can you tell me more about products designed to go to the beach?"
}'
Or you can use a tool like Postman for convenience.
Costs
If you are curious about how much it could cost to use OpenAI’s API, check an estimation of costs below:
- Creating embeddings (using the text-embedding-ada-002 model): ~$2.50 (CSV file of ~34K lines with around 70 words per line)
- Asking a question (using the text-davinci-003 model): ~$0.01 (around 120 words counting prompt and question)
You can check OpenAI’s pricing page here.
Conclusion
This article aimed to provide a clear and concise guide on creating a Chatbot REST API to answer questions regarding your custom data using specific technologies. However, it’s important to note that there are numerous alternative technologies and approaches that may better suit your specific use case.
If this article doesn’t address your particular requirements or requires further adjustments, I recommend visiting LangChain’s website. There, you can explore a wide range of AI applications and find alternative solutions to meet your needs.
Special thanks to the exceptional team at OpenAI who developed these remarkable AI technologies. ChatGPT not only assisted in creating this Chatbot, but also played a significant role in crafting this article.
Links & References
- https://www.python.org/
- https://flask.palletsprojects.com/en/2.3.x/
- https://en.wikipedia.org/wiki/Large_language_model
- https://www.ibm.com/topics/rest-apis
- https://python.langchain.com/en/latest/modules/chains/index_examples/chat_vector_db.html
- https://python.langchain.com/en/latest/modules/indexes/document_loaders.html
- https://python.langchain.com/en/latest/modules/chains/getting_started.html
- https://python.langchain.com/en/latest/
- https://python.langchain.com/en/latest/modules/memory/getting_started.html
- https://python.langchain.com/en/latest/modules/prompts/prompt_templates.html
- https://platform.openai.com/docs/api-reference/completions/create#completions/create-temperature
- https://platform.openai.com/docs/introduction
- https://platform.openai.com/docs/guides/embeddings/embedding-models