Unlike previous examples such as Web Scrape QnA and Multiple Documents QnA, querying structured data does not require a vector database. At a high level, this can be achieved with the following steps:
Create a custom function to execute the SQL query, and get the response
Return a natural response from the executed SQL response
In this example, we are going to create a QnA chatbot that can interact with a SQL database stored in SingleStore.
TL;DR
You can find the chatflow template:
1. SQL Database Schema + Example Rows
Use a Custom JS Function node to connect to SingleStore and retrieve the database schema and the top 3 rows.
The research paper recommends generating a prompt in the following example format:
CREATE TABLE samples (firstName varchar NOT NULL, lastName varchar)
SELECT * FROM samples LIMIT 3
firstName lastName
Stephen Tyler
Jack McGinnis
Steven Repici
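As a rough illustration of how the body of a Custom JS Function could assemble this format once the schema and sample rows have been fetched, here is a minimal sketch. The helper name `formatSchemaPrompt` and its input shapes (a list of column definition strings and an array of row objects) are assumptions made for illustration; connecting to SingleStore and running the queries is deliberately left out.

```javascript
// Hypothetical helper (not part of any library): builds the
// "CREATE TABLE + SELECT + sample rows" text block shown above.
// `columns` is a list of column definitions; `rows` is an array of row objects.
function formatSchemaPrompt(tableName, columns, rows, limit = 3) {
  const createTable = `CREATE TABLE ${tableName} (${columns.join(", ")})`;
  const select = `SELECT * FROM ${tableName} LIMIT ${limit}`;
  const sample = rows.slice(0, limit);
  // Column headers come from the keys of the first row (empty if there are no rows).
  const headers = sample.length > 0 ? Object.keys(sample[0]) : [];
  const lines = sample.map((row) => headers.map((h) => String(row[h])).join(" "));
  return [createTable, select, headers.join(" "), ...lines].join("\n");
}

const schemaPrompt = formatSchemaPrompt(
  "samples",
  ["firstName varchar NOT NULL", "lastName varchar"],
  [
    { firstName: "Stephen", lastName: "Tyler" },
    { firstName: "Jack", lastName: "McGinnis" },
    { firstName: "Steven", lastName: "Repici" },
  ]
);
console.log(schemaPrompt);
```

In a real function, `columns` and `rows` would come from the results of the schema and `SELECT ... LIMIT 3` queries rather than from hard-coded values.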
You can find out how to get the HOST, USER, and PASSWORD values from this guide. Once finished, click Execute:
We can now see that the correct format has been generated. The next step is to bring this into a Prompt Template.
2. Return a SQL query with few shot prompting
Create a new Chat Model + Prompt Template + LLMChain
Specify the following prompt in the Prompt Template:
Based on the provided SQL table schema and question below, return a SQL SELECT ALL query that would answer the user's question. For example: SELECT * FROM table WHERE id = '1'.
------------
SCHEMA: {schema}
------------
QUESTION: {question}
------------
SQL QUERY:
Since we are using 2 variables: {schema} and {question}, specify their values in Format Prompt Values:
You can add more examples to the prompt (i.e. few-shot prompting) to help the LLM generate better queries, or take reference from dialect-specific prompting.
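To make the substitution and the few-shot idea concrete, here is a minimal sketch of how `{schema}` and `{question}` could be filled in, with a block of example question/query pairs added to the prompt. The helpers `fillTemplate` and `formatExamples`, the extra `{examples}` placeholder, and the shortened template are assumptions for illustration only; they are not how the Prompt Template node is implemented.

```javascript
// Hypothetical helper: replaces each {name} placeholder with values[name].
// Unknown placeholders are left untouched so that mistakes stay visible.
function fillTemplate(template, values) {
  return template.replace(/\{(\w+)\}/g, (match, name) =>
    Object.prototype.hasOwnProperty.call(values, name) ? String(values[name]) : match
  );
}

// Hypothetical helper: renders few-shot examples as question/query pairs.
function formatExamples(examples) {
  return examples
    .map((ex) => `QUESTION: ${ex.question}\nSQL QUERY: ${ex.query}`)
    .join("\n");
}

// Shortened, illustrative template with an extra {examples} placeholder.
const sqlTemplate = [
  "EXAMPLES:",
  "{examples}",
  "------------",
  "SCHEMA: {schema}",
  "------------",
  "QUESTION: {question}",
  "------------",
  "SQL QUERY:",
].join("\n");

const sqlPrompt = fillTemplate(sqlTemplate, {
  examples: formatExamples([
    {
      question: "what is the address of John",
      query: "SELECT userAddress FROM samples WHERE firstName ='John'",
    },
  ]),
  schema: "CREATE TABLE samples (firstName varchar NOT NULL, lastName varchar)",
  question: "what is the last name of Jack",
});
console.log(sqlPrompt);
```

Passing a function as the replacement avoids surprises when a value contains `$` sequences, which `String.prototype.replace` would otherwise treat as special patterns.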
Sometimes the generated SQL query is invalid, and we do not want to waste resources executing it. This can happen, for example, when a user asks a general question that is irrelevant to the SQL database. We can use an If Else node to route such cases to a different path.
For instance, we can perform a basic check to see if SELECT and WHERE are included in the SQL query given by the LLM.
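A basic check of this kind might look like the sketch below. The function name `looksLikeFilteredSelect` is hypothetical, and matching the keywords as whole words, case-insensitively, is a choice made here rather than something the If Else node requires.

```javascript
// Hypothetical check: returns true only when both SELECT and WHERE
// appear as whole words (case-insensitive) in the generated query.
function looksLikeFilteredSelect(sqlQuery) {
  const hasSelect = /\bSELECT\b/i.test(sqlQuery);
  const hasWhere = /\bWHERE\b/i.test(sqlQuery);
  return hasSelect && hasWhere;
}

// A query with a WHERE clause passes and would follow the If route.
console.log(looksLikeFilteredSelect("SELECT userAddress FROM samples WHERE firstName ='John'"));
// A query without WHERE fails and would follow the Else route.
console.log(looksLikeFilteredSelect("SELECT * FROM samples LIMIT 3"));
```

This is only a heuristic: it rejects valid queries that have no WHERE clause and does not verify that the SQL is syntactically correct.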
5. Return a natural response from the executed SQL response
Create a new Chat Model + Prompt Template + LLMChain
Write the following prompt in the Prompt Template:
Based on the question, and SQL response, write a natural language response, be details as possible:
------------
QUESTION: {question}
------------
SQL RESPONSE: {sqlResponse}
------------
NATURAL LANGUAGE RESPONSE:
Specify the variables in Format Prompt Values:
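As a sketch of what this second prompt ends up looking like, the snippet below serializes the SQL result rows to JSON and places them in the template. The function name `buildAnswerPrompt` is hypothetical, and using `JSON.stringify` for `{sqlResponse}` is an assumption based on the JSON-shaped SQL response that appears in the logs.

```javascript
// Hypothetical helper: builds the prompt for the second LLMChain from the
// user's question and the rows returned by the executed SQL query.
function buildAnswerPrompt(question, rows) {
  // Serialize the rows so the LLM sees them as a compact JSON array.
  const sqlResponse = JSON.stringify(rows);
  return [
    "Based on the question, and SQL response, write a natural language response, be details as possible:",
    "------------",
    `QUESTION: ${question}`,
    "------------",
    `SQL RESPONSE: ${sqlResponse}`,
    "------------",
    "NATURAL LANGUAGE RESPONSE:",
  ].join("\n");
}

const answerPrompt = buildAnswerPrompt("what is the address of John", [
  { userAddress: "120 jefferson st." },
]);
console.log(answerPrompt);
```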
Voila! Your SQL chatbot is now ready for testing!
Query
First, let's ask something related to the database.
Looking at the logs, we can see the first LLMChain is able to give us a SQL query:
Input
Based on the provided SQL table schema and question below, return a SQL SELECT ALL query that would answer the user's question. For example: SELECT * FROM table WHERE id = '1'.\n------------\nSCHEMA: CREATE TABLE samples (id bigint(20) NOT NULL, firstName varchar(300) NOT NULL, lastName varchar(300) NOT NULL, userAddress varchar(300) NOT NULL, userState varchar(300) NOT NULL, userCode varchar(300) NOT NULL, userPostal varchar(300) NOT NULL, createdate timestamp(6) NOT NULL)\nSELECT * FROM samples LIMIT 3\nid firstName lastName userAddress userState userCode userPostal createdate\n1125899906842627 Steven Repici 14 Kingston St. Oregon NJ 5578 Thu Dec 14 2023 13:06:17 GMT+0800 (Singapore Standard Time)\n1125899906842625 John Doe 120 jefferson st. Riverside NJ 8075 Thu Dec 14 2023 13:04:32 GMT+0800 (Singapore Standard Time)\n1125899906842629 Bert Jet 9th, at Terrace plc Desert City CO 8576 Thu Dec 14 2023 13:07:11 GMT+0800 (Singapore Standard Time)\n------------\nQUESTION: what is the address of John\n------------\nSQL QUERY:
Output
SELECT userAddress FROM samples WHERE firstName ='John'
After executing the SQL query, the result is passed to the 2nd LLMChain:
Input
Based on the question, and SQL response, write a natural language response, be details as possible:\n------------\nQUESTION: what is the address of John\n------------\nSQL RESPONSE: [{\"userAddress\":\"120 jefferson st.\"}]\n------------\nNATURAL LANGUAGE RESPONSE:
Output
The address of John is 120 Jefferson St.
Now, if we ask something that is irrelevant to the SQL database, the Else route is taken.
For the first LLMChain, a SQL query is generated as below:
SELECT * FROM samples LIMIT 3
However, it fails the If Else check because it does not contain both SELECT and WHERE, so the flow takes the Else route, whose prompt says:
Politely say "I'm not able to answer query"
And the final output is:
I apologize, but I'm not able to answer your query at the moment.
Conclusion
In this example, we have successfully created a SQL chatbot that can interact with your database and can also handle questions that are irrelevant to the database. A further improvement would be adding memory to provide conversation history.