Semantic Search
This guide explains how to perform semantic search operations using DataBridge.
Basic Setup
First, install the DataBridge client:
pip install databridge-client
Initialize the client:
from databridge import DataBridge, AsyncDataBridge
# Synchronous client
db = DataBridge("your-uri")
# Asynchronous client
async_db = AsyncDataBridge("your-uri")
Search Operations
DataBridge provides two main search operations:
Chunk-level Search (retrieve_chunks): Returns individual text chunks with their relevance scores
Document-level Search (retrieve_docs): Returns complete documents with aggregated relevance scores
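To illustrate how the two result shapes relate, the sketch below rolls chunk-level scores up into one score per document. The `ChunkHit` type, its field names, and the aggregation rule (taking the best chunk score) are assumptions made for illustration; this guide does not state how DataBridge aggregates scores.

```python
from dataclasses import dataclass


@dataclass
class ChunkHit:
    """Hypothetical chunk-level result; field names are illustrative."""
    document_id: str
    content: str
    score: float


def aggregate_by_document(hits: list[ChunkHit]) -> list[tuple[str, float]]:
    """Collapse chunk hits into (document_id, score) pairs, best first.

    Assumption: a document's score is the maximum score of its chunks.
    """
    best: dict[str, float] = {}
    for hit in hits:
        best[hit.document_id] = max(hit.score, best.get(hit.document_id, hit.score))
    return sorted(best.items(), key=lambda item: item[1], reverse=True)


hits = [
    ChunkHit("doc-a", "intro paragraph", 0.62),
    ChunkHit("doc-b", "results table", 0.81),
    ChunkHit("doc-a", "key findings", 0.90),
]
ranked_docs = aggregate_by_document(hits)
```

Other rules, such as averaging chunk scores, would favor documents that are relevant throughout rather than documents with one strong passage.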
Chunk-level Search
Search for specific chunks of text:
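A minimal sketch of a chunk-level query follows. Only the method name `retrieve_chunks` comes from this guide; the `query`, `k`, and `min_score` parameters and the `score` attribute on each result are assumptions, so check them against the client version you have installed.

```python
from typing import Any


def search_chunks(db: Any, query: str, k: int = 4, min_score: float = 0.0) -> list:
    """Run a chunk-level search and return hits sorted by descending score.

    `db` is a DataBridge client. The keyword arguments passed to
    `retrieve_chunks` (query, k, min_score) are assumed, not confirmed.
    """
    chunks = db.retrieve_chunks(query=query, k=k, min_score=min_score)
    # Sort defensively in case the client does not guarantee ordering.
    return sorted(chunks, key=lambda chunk: chunk.score, reverse=True)
```

With a synchronous client, a call might look like `search_chunks(db, "What are the key findings?", k=4, min_score=0.7)`.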
Document-level Search
Search for complete documents:
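A minimal sketch of a document-level query that collects the IDs of the matched documents. Only the method name `retrieve_docs` comes from this guide; the `query` and `k` parameters and the `document_id` attribute on each result are assumptions for illustration.

```python
from typing import Any


def find_document_ids(db: Any, query: str, k: int = 4) -> list[str]:
    """Return the IDs of the documents matched by a document-level search.

    Assumptions: `retrieve_docs` accepts `query` and `k`, and each result
    exposes a `document_id` attribute.
    """
    docs = db.retrieve_docs(query=query, k=k)
    return [doc.document_id for doc in docs]
```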
Understanding Search Results
Two-Stage Ranking
DataBridge ranks search results in two stages:
Vector Similarity (First Stage)
Query is converted to an embedding vector
Cosine similarity is computed with document chunks
Initial ranking based on vector similarity
Fast but may miss some semantic nuances
Neural Reranking (Second Stage, Optional)
Top results from vector search are reranked
Uses a specialized neural model for scoring
More accurate but computationally intensive
Can be enabled with use_reranking=True
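The two stages can be sketched in plain Python. This is an illustration of the general technique, not DataBridge's implementation: the function names are hypothetical, the embeddings are toy two-dimensional vectors, and the `rerank` callable stands in for a neural reranking model.

```python
import math
from typing import Callable, Optional, Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def two_stage_rank(
    query_vec: Vector,
    chunks: dict[str, Vector],
    top_k: int = 3,
    rerank: Optional[Callable[[str], float]] = None,
) -> list[tuple[str, float]]:
    """Rank chunk IDs by vector similarity, then optionally rerank the top_k."""
    # Stage 1: score every chunk by cosine similarity to the query embedding.
    scored = [(cid, cosine_similarity(query_vec, vec)) for cid, vec in chunks.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    candidates = scored[:top_k]
    if rerank is None:
        return candidates
    # Stage 2: rescore only the shortlisted candidates with the slower model.
    reranked = [(cid, rerank(cid)) for cid, _ in candidates]
    reranked.sort(key=lambda pair: pair[1], reverse=True)
    return reranked


query = [1.0, 0.0]
chunks = {"a": [1.0, 0.0], "b": [1.0, 1.0], "c": [0.0, 1.0]}
toy_reranker_scores = {"a": 0.40, "b": 0.95}  # stand-in for model output

first_stage = two_stage_rank(query, chunks, top_k=2)
second_stage = two_stage_rank(query, chunks, top_k=2, rerank=toy_reranker_scores.__getitem__)
```

Because the second stage only sees the `top_k` shortlist, its extra cost stays bounded no matter how many chunks are indexed, which is the usual reason for splitting ranking this way.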
Similarity Scores
Similarity scores indicate how well a chunk or document matches your query:
Score Range: 0.0 to 1.0
1.0: Perfect match
0.0: No similarity
Typical Score Ranges:
0.9 - 1.0: Near-exact semantic match
0.8 - 0.9: Very strong semantic similarity
0.7 - 0.8: Strong semantic similarity
0.6 - 0.7: Moderate semantic similarity
< 0.6: Weak semantic similarity
Example of using similarity scores:
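A minimal sketch, assuming scores are plain floats in the documented 0.0 to 1.0 range. The function name, the labels, and the choice to treat each lower bound as inclusive are illustrative, since the ranges listed above share their endpoints.

```python
def describe_score(score: float) -> str:
    """Map a similarity score to the typical ranges listed in this guide."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be between 0.0 and 1.0, got {score}")
    if score >= 0.9:
        return "near-exact semantic match"
    if score >= 0.8:
        return "very strong semantic similarity"
    if score >= 0.7:
        return "strong semantic similarity"
    if score >= 0.6:
        return "moderate semantic similarity"
    return "weak semantic similarity"


scores = [0.93, 0.74, 0.58]
# Keep only results at or above a chosen cutoff (0.7 here is an example value).
strong_matches = [s for s in scores if s >= 0.7]
labels = [describe_score(s) for s in scores]
```

A cutoff is a tuning choice: raising it trades recall for precision, so it is worth checking against a few known queries before fixing a value.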
Document Content Types
The DocumentContent type represents either a URL or a direct content string.
When retrieving documents, the content field will be one of these two types.
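One way to handle both cases is to branch on a type tag. The class below is a hypothetical stand-in, not the library's definition: the `type` and `value` field names and the `"url"` / `"string"` tag values are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Literal


@dataclass
class DocumentContent:
    """Hypothetical stand-in: either a URL or an inline content string."""
    type: Literal["url", "string"]
    value: str


def describe_content(content: DocumentContent) -> str:
    """Summarize how the content of a retrieved document can be obtained."""
    if content.type == "url":
        # The document body must be fetched separately from this location.
        return f"download from {content.value}"
    if content.type == "string":
        # The document body is already present in the result.
        return f"inline text ({len(content.value)} characters)"
    raise ValueError(f"unknown content type: {content.type!r}")


remote = DocumentContent(type="url", value="https://example.com/report.pdf")
inline = DocumentContent(type="string", value="hello")
```

Raising on an unknown tag makes it obvious if the client ever returns a content type this code was not written for.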
Next Steps
After finding relevant documents or chunks:
Use the document IDs to retrieve full documents
Implement advanced filtering and sorting logic