HOW IT WORKS
From data to answer
Documents, databases, APIs: whatever your source, the flow is the same. Chunk, embed, index, query, generate, with observability and feedback at every step.
- Ingest: PDF, SQL, Web, S3, Confluence, SharePoint
- Embed: OpenAI, Cohere, BGE, your own model
- Index: HNSW + metadata filter + hybrid
- Retrieve: Reranker + MMR + query rewriting
- Generate: Guardrails + citations + streaming
EXAMPLE QUERY
> "Why did EBITDA fall last quarter?"
→ 2.3s · 4 sources · 97% confidence
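The five stages listed above can be sketched in a few lines of plain Python. This is a toy illustration, not the product's implementation: the bag-of-words "embedding" and the brute-force cosine search stand in for a real embedding model and an HNSW index, and every function name here (`chunk`, `embed`, `build_index`, `retrieve`, `build_prompt`) is hypothetical.

```python
import math
import re
from collections import Counter

def chunk(text, size=8, overlap=2):
    """Split text into overlapping windows of `size` words."""
    words = text.split()
    step = size - overlap
    stop = max(len(words) - overlap, 1)
    return [" ".join(words[i:i + size]) for i in range(0, stop, step)]

def embed(text):
    """Toy embedding: L2-normalized word counts (assumption, not a real model)."""
    counts = Counter(re.findall(r"\w+", text.lower()))
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {w: c / norm for w, c in counts.items()} if norm else {}

def cosine(a, b):
    """Dot product of two normalized sparse vectors."""
    return sum(v * b.get(w, 0.0) for w, v in a.items())

def build_index(docs):
    """Index every chunk as (source, chunk text, vector)."""
    return [(source, piece, embed(piece))
            for source, text in docs.items()
            for piece in chunk(text)]

def retrieve(index, question, k=2):
    """Brute-force nearest neighbours; a real system would use an ANN index."""
    q = embed(question)
    ranked = sorted(index, key=lambda e: cosine(q, e[2]), reverse=True)
    return [(source, piece) for source, piece, _ in ranked[:k]]

def build_prompt(question, hits):
    """Assemble a prompt that asks the LLM to cite numbered sources."""
    context = "\n".join(f"[{i}] ({s}) {p}" for i, (s, p) in enumerate(hits, 1))
    return f"Answer using only these sources and cite them.\n{context}\nQuestion: {question}"

# Made-up sample documents, for illustration only.
DOCS = {
    "finance.pdf": "EBITDA fell last quarter because logistics costs rose sharply",
    "hr.pdf": "The onboarding handbook covers vacation policy and benefits",
}

index = build_index(DOCS)
hits = retrieve(index, "Why did EBITDA fall last quarter?")
print(build_prompt("Why did EBITDA fall last quarter?", hits))
```

The prompt produced at the end is what would be sent to the LLM in the Generate step; guardrails, reranking and streaming are left out to keep the sketch short.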
MODULES
Battle-tested for every AI architecture
RAG Pipeline
Production-grade Retrieval Augmented Generation that connects your data to LLMs. Chunking, embedding, re-ranking and hallucination control in one pipeline.
Vector Database
Query billions of embeddings in milliseconds. Native HNSW engine, fully compatible with pgvector, Pinecone and Weaviate.
Multi-LLM Routing
GPT-4, Claude, Llama, Mistral, your own models. Automatic routing by cost, latency and accuracy.
Knowledge Graph
Turn unstructured data into relational knowledge graphs. Entity extraction, linking and graph queries.
Semantic Search
Search by meaning, not keywords. Hybrid BM25 + dense retrieval with multilingual support.
Data Governance
PII masking, row-level policy, audit trail. Every prompt and response is logged.
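As an illustration of the Multi-LLM Routing module described above, here is a minimal sketch of one possible policy: pick the cheapest model that meets a latency budget and an accuracy floor. The `ModelProfile` type, the `route` function, the model names and all numbers are assumptions made up for this example; the actual routing logic is not documented here.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ModelProfile:
    name: str
    cost_per_1k_tokens: float  # USD, illustrative value
    p50_latency_ms: float      # illustrative value
    accuracy: float            # 0..1, e.g. an offline eval score

def route(models, max_latency_ms, min_accuracy):
    """Return the cheapest model that satisfies both constraints."""
    eligible = [
        m for m in models
        if m.p50_latency_ms <= max_latency_ms and m.accuracy >= min_accuracy
    ]
    if not eligible:
        raise ValueError("no model satisfies the latency and accuracy constraints")
    return min(eligible, key=lambda m: m.cost_per_1k_tokens)

# Hypothetical model catalogue.
MODELS = [
    ModelProfile("small", cost_per_1k_tokens=0.1, p50_latency_ms=200, accuracy=0.70),
    ModelProfile("medium", cost_per_1k_tokens=0.5, p50_latency_ms=600, accuracy=0.85),
    ModelProfile("large", cost_per_1k_tokens=3.0, p50_latency_ms=1500, accuracy=0.95),
]

choice = route(MODELS, max_latency_ms=1000, min_accuracy=0.8)
print(choice.name)
```

Treating latency and accuracy as hard constraints and cost as the objective keeps the policy easy to reason about; a weighted score over all three would be an equally valid alternative.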
VECTOR DATABASE
Billions of embeddings.
Millisecond queries.
HNSW + IVF-PQ + scalar quantization under the hood. Hybrid search (BM25 + dense), metadata filters and tenant isolation built-in.
p99 latency: < 8 ms
Vectors per cluster: 10B+
SLA: 99.99%
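Hybrid search has to merge a keyword (BM25) ranking with a dense-vector ranking. The text above does not say how the two are combined; the sketch below uses reciprocal rank fusion (RRF), a common technique chosen here purely as an assumption. The function name and the document IDs are hypothetical.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs.

    Each document scores 1 / (k + rank) in every list it appears in
    (rank starts at 1); the scores are summed across lists.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda d: scores[d], reverse=True)

bm25_ranking = ["a", "b", "c"]   # best keyword matches first
dense_ranking = ["b", "c", "d"]  # best embedding matches first

print(reciprocal_rank_fusion([bm25_ranking, dense_ranking]))
```

RRF only needs ranks, not raw scores, so BM25 and cosine similarity never have to be put on a common scale.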
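# Vector query example
from verihane import VectorClient

# embed() is not defined in this snippet; it stands for whatever
# function produces your query embedding.
vc = VectorClient(index="docs-prod")
results = vc.search(
    query="how to reduce churn",
    embedding=embed("how to reduce churn"),
    filters={"department": "growth"},  # metadata filter
    rerank=True,
    k=8,  # number of results to return
)
for r in results:
    print(r.score, r.text[:80])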