Knowledge Base & Vector Search
Unlock accurate AI answers by feeding Sentrup your actual business files. The system indexes your documents into a secure vector space and matches them to incoming customer questions in real time.
What Document Formats Are Supported by the Knowledge Base?
The Sentrup knowledge base supports several document formats: plain text files (.txt), Markdown files (.md), and PDF documents (.pdf). You can also ingest website content directly by entering page URLs into our automated scraping module.
- PDF Documents: Manuals, product specifications, and policy sheets.
- Text & Markdown Files: TXT and MD files detailing product instructions or FAQs.
- URL Web Crawls: Provide page URLs, and Sentrup will automatically scrape and index their text content. Learn more in our Website Crawling Guide.
How Does Semantic Chunking and Vector Embedding Work?
When you upload documents to your knowledge base, Sentrup splits the raw text into overlapping semantic chunks of 500-1000 characters. These chunks are then converted into numerical vector embeddings and stored in a secure index in our vector database. Step by step, Sentrup:
- Splits the text into overlapping chunks (typically 500-1000 characters), so that context is not lost at chunk boundaries.
- Generates vector embeddings using state-of-the-art embedding models, turning each text chunk into a high-dimensional numerical vector.
- Saves the vectors in a dedicated, secure index in our vector database.
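To make the overlap idea concrete, here is a minimal sketch of a chunker in Python. It is an illustration only, not Sentrup's implementation: it cuts fixed-size character windows rather than semantic chunks, and the function name `chunk_text` and the default values for `chunk_size` and `overlap` are assumptions chosen to fall inside the 500-1000 character range mentioned above.

```python
def chunk_text(text: str, chunk_size: int = 800, overlap: int = 100) -> list[str]:
    """Split text into fixed-size character windows that overlap.

    Hypothetical helper for illustration; the sizes are assumed defaults,
    not values documented by Sentrup.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")

    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window has reached the end of the text,
        # so we do not emit a trailing chunk that is pure overlap.
        if start + chunk_size >= len(text):
            break
    return chunks
```

With these defaults, the last 100 characters of each chunk are repeated at the start of the next one, so a sentence that straddles a boundary still appears whole in at least one chunk when it is shorter than the overlap.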
How Does Vector Search Help Prevent AI Hallucinations?
To prevent AI hallucinations, Sentrup matches visitor questions with relevant document chunks using similarity calculations in the vector database. If no matching chunks exceed your set confidence threshold, the chatbot replies with a fallback message or triggers human escalation rather than fabricating answers.
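The threshold check described above can be sketched as follows. This is a simplified illustration under stated assumptions, not Sentrup's actual code: it assumes cosine similarity as the similarity measure, uses a plain Python list in place of the vector database, and the names `cosine_similarity`, `retrieve`, `answer_or_fallback`, and `FALLBACK_MESSAGE`, as well as the 0.75 threshold, are hypothetical.

```python
import math

# Assumed wording; in practice this would be the fallback message you configure.
FALLBACK_MESSAGE = "I'm not sure about that. Let me connect you with a team member."


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query_vec, index, threshold=0.75, top_k=3):
    """Return up to top_k (score, text) pairs whose score exceeds the threshold.

    `index` is a list of (chunk_text, embedding) pairs standing in for the
    vector database.
    """
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in index]
    matches = [pair for pair in scored if pair[0] > threshold]
    matches.sort(key=lambda pair: pair[0], reverse=True)
    return matches[:top_k]


def answer_or_fallback(query_vec, index, threshold=0.75):
    """Return matching chunk text as context, or the fallback if nothing qualifies."""
    matches = retrieve(query_vec, index, threshold)
    if not matches:
        # No chunk is similar enough: decline rather than guess.
        return FALLBACK_MESSAGE
    return "\n".join(text for _, text in matches)
```

In a real pipeline the returned chunks would be passed to the language model as grounding context. The point of the sketch is the branch: when no chunk clears the threshold, the model is never asked to answer from nothing.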