
Document Stores

Learn how to use the Tailwinds Document Stores



The Document Stores offer a versatile approach to data management, enabling you to upload, split, prepare, and upsert your datasets from a single location.

This centralized approach simplifies data handling and allows for efficient management of various data formats, making it easier to organize and access your data within the app.

Setup

In this tutorial, we will set up a system to retrieve information about the LibertyGuard Deluxe Homeowners Policy, a topic that LLMs are likely not extensively trained on.

Using the Document Stores, we'll prepare and upsert data about LibertyGuard and its set of home insurance policies. This will enable our RAG system to accurately answer user queries about LibertyGuard's home insurance offerings.

1. Add a Document Store

  • Start by adding a Document Store and naming it. In our case, "LibertyGuard Deluxe Homeowners Policy".

2. Select a Document Loader

Enter the Document Store we just created and select the Document Loader you want to use. In our case, since our dataset is in PDF format, we'll use the PDF Loader.
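
Under the hood, the PDF Loader node wraps a LangChain document loader. As a rough Python equivalent (a sketch only, assuming the langchain-community and pypdf packages are installed; the file path is a placeholder):

```python
# Rough equivalent of the PDF Loader node: read a PDF into LangChain Documents.
# The file path is a placeholder for your own policy document.
from langchain_community.document_loaders import PyPDFLoader

loader = PyPDFLoader("liberty_policy.pdf")
docs = loader.load()  # one Document per page

print(f"Loaded {len(docs)} pages")
print(docs[0].page_content[:200])
```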

3. Preparing your data

  • First, we upload our PDF file.

  • Then, we add a unique metadata key. This is optional, but a good practice as it allows us to target and filter down this same dataset later on if we need to.

  • Finally, select the Text Splitter you want to use to chunk your data. In our particular case, we will use the Recursive Character Text Splitter.

In this guide, we've added a generous Chunk Overlap size to ensure no relevant data gets missed between chunks. However, the optimal overlap size depends on the complexity of your data. You may need to adjust this value based on your specific dataset and the nature of the information you want to extract.
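
For reference, here is a minimal sketch of what this preparation step amounts to in LangChain's Python API: a document tagged with our custom metadata key, split by a recursive character splitter with the settings used in this guide. The sample text is a placeholder.

```python
# Sketch of the preparation step: tag the document with a custom metadata key,
# then chunk it with a recursive character splitter (chunk_size=1500, chunk_overlap=750).
from langchain_core.documents import Document
from langchain_text_splitters import RecursiveCharacterTextSplitter

docs = [
    Document(
        page_content="LibertyGuard Deluxe Homeowners Policy ...",  # placeholder policy text
        metadata={"company": "liberty"},  # custom key used to filter this dataset later
    )
]

splitter = RecursiveCharacterTextSplitter(chunk_size=1500, chunk_overlap=750)
chunks = splitter.split_documents(docs)  # metadata is carried over to every chunk
```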

4. Preview your data

We can now preview how our data will be chunked using our current configuration: chunk_size=1500 and chunk_overlap=750.

Note that our custom metadata company: "liberty" has been inserted into each chunk. This metadata allows us to easily filter and retrieve information from this specific dataset later on, even if we use the same vector store index for other datasets.

It's important to experiment with different Text Splitters, Chunk Sizes, and Overlap values to find the optimal configuration for your specific dataset. This preview allows you to refine the chunking process and ensure that the resulting chunks are suitable for your RAG system.
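
If you want to experiment with chunking settings outside the UI, a quick sketch like the one below prints how many chunks each configuration produces and confirms the metadata is present on every chunk. The sample text and the alternative settings are arbitrary examples, not recommendations.

```python
# Preview how different splitter settings chunk the same document.
from langchain_core.documents import Document
from langchain_text_splitters import RecursiveCharacterTextSplitter

doc = Document(
    page_content="LibertyGuard Deluxe Homeowners Policy ... " * 200,  # placeholder text
    metadata={"company": "liberty"},
)

for chunk_size, chunk_overlap in [(1500, 750), (1000, 200)]:  # settings to compare
    splitter = RecursiveCharacterTextSplitter(chunk_size=chunk_size, chunk_overlap=chunk_overlap)
    chunks = splitter.split_documents([doc])
    print(f"chunk_size={chunk_size}, chunk_overlap={chunk_overlap} -> {len(chunks)} chunks")
    print("sample metadata:", chunks[0].metadata)  # custom key is carried into every chunk
```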

5. Process your data

  • Once you are satisfied with the chunking process, it's time to process your data.

Note that once you have processed your data, you will be able to edit your chunks by deleting or adding data to them. This is beneficial if:

  • You discover inaccuracies or inconsistencies in the original data: Editing chunks allows you to correct errors and ensure the information is accurate.

  • You want to refine the content for better relevance: You can adjust chunks to emphasize specific information or remove irrelevant sections.

  • You need to tailor chunks for specific queries: By editing chunks, you can make them more targeted to the types of questions you expect to receive.

6. Add your Document Store node to your flow

Now that our dataset is ready to be upserted, it's time to go to your RAG chatflow / agentflow and add the Document Store node under the LangChain > Document Loader section.

7. Upsert your data to a Vector Store

Upsert your dataset to your Vector Store by clicking the green button in the right corner of your flow. We used the Upstash Vector Store in our implementation.
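
This tutorial upserts into the Upstash Vector store; as an illustration of the same step, the sketch below loads chunks into an in-memory FAISS index via LangChain's Python API and then runs a metadata-filtered similarity search using our company: "liberty" key. The chunk text is a placeholder, and OpenAI embeddings (with an OPENAI_API_KEY in the environment) are assumed purely for the example.

```python
# Illustration of the upsert step using an in-memory FAISS index
# (the tutorial itself uses the Upstash Vector store).
from langchain_core.documents import Document
from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings

chunks = [
    Document(page_content="Policy excerpt about dwelling coverage ...", metadata={"company": "liberty"}),
    Document(page_content="Policy excerpt about personal property coverage ...", metadata={"company": "liberty"}),
]

vector_store = FAISS.from_documents(chunks, OpenAIEmbeddings())  # embed and upsert the chunks

# Later, the metadata key lets us restrict retrieval to this dataset only.
results = vector_store.similarity_search(
    "What does the policy cover?",
    k=2,
    filter={"company": "liberty"},
)
print(results[0].page_content)
```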

8. Test your RAG

  • Finally, our Retrieval-Augmented Generation (RAG) system is operational. It's noteworthy how the LLM effectively interprets the query and successfully leverages relevant information from the chunked data to construct a comprehensive response.
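
To make the query path concrete, here is a minimal sketch of what happens at question time: embed and index the chunks, retrieve the most relevant ones, and ask a chat model to answer strictly from that context. Everything in it is illustrative: the excerpt text, model name, and prompt wording are placeholders rather than the exact components of the Tailwinds chatflow, and an OPENAI_API_KEY is assumed.

```python
# Minimal sketch of the query path: retrieve relevant chunks, then answer from them.
# Excerpt text, model name, and prompt are placeholders.
from langchain_core.documents import Document
from langchain_community.vectorstores import FAISS
from langchain_openai import ChatOpenAI, OpenAIEmbeddings

chunks = [
    Document(page_content="Policy excerpt about water damage coverage ...", metadata={"company": "liberty"}),
    Document(page_content="Policy excerpt about exclusions ...", metadata={"company": "liberty"}),
]
vector_store = FAISS.from_documents(chunks, OpenAIEmbeddings())

question = "Does the LibertyGuard Deluxe policy cover water damage from burst pipes?"
retriever = vector_store.as_retriever(search_kwargs={"k": 2, "filter": {"company": "liberty"}})
relevant_chunks = retriever.invoke(question)

context = "\n\n".join(chunk.page_content for chunk in relevant_chunks)
prompt = (
    "Answer the question using only the policy excerpts below.\n\n"
    f"Excerpts:\n{context}\n\nQuestion: {question}"
)

llm = ChatOpenAI(model="gpt-4o-mini")
print(llm.invoke(prompt).content)
```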

9. Summary

We started by creating a Document Store to organize the LibertyGuard Deluxe Homeowners Policy data. This data was then prepared by uploading, chunking, processing, and upserting it, making it ready for our RAG system.

Key benefits of using the Document Stores

  • Organization and Management: The Document Store provides a centralized location for storing, managing, and preparing our data.

  • Data Quality: The chunking process helps ensure that our data is structured in a way that facilitates accurate retrieval and analysis.

  • Flexibility: The Document Store allows us to refine and adjust our data as needed, improving the accuracy and relevance of our RAG system.
