Semantic Search

This guide explains how to perform semantic search operations using DataBridge.

Basic Setup

First, install the DataBridge client:

pip install databridge-client

Initialize the client:

from databridge import DataBridge, AsyncDataBridge

# Synchronous client
db = DataBridge("your-uri")

# Asynchronous client
async_db = AsyncDataBridge("your-uri")

Search Operations

DataBridge provides two main search operations:

  1. Chunk-level Search (retrieve_chunks): Returns individual text chunks with their relevance scores

  2. Document-level Search (retrieve_docs): Returns complete documents with aggregated relevance scores

Search for specific chunks of text:
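A minimal sketch of a chunk-level search. The method name `retrieve_chunks` and the `use_reranking` flag come from this guide. Passing the query as the first positional argument and the `content` and `score` attribute names on each result are assumptions, so check them against the client version you have installed. `StubClient` is a hypothetical stand-in that lets the snippet run without a DataBridge server; in real use, pass the `db` object created during setup.

```python
from dataclasses import dataclass


@dataclass
class StubChunk:
    """Hypothetical result shape; the attribute names are assumptions."""
    content: str
    score: float


class StubClient:
    """Stand-in for a DataBridge client so the sketch runs offline."""

    def retrieve_chunks(self, query, use_reranking=False):
        # A real client would embed `query` and search the index.
        return [
            StubChunk("Vector search ranks chunks by cosine similarity.", 0.91),
            StubChunk("Reranking rescores the top vector hits.", 0.78),
        ]


def print_top_chunks(client, query):
    """Run a chunk-level search and print one line per hit."""
    chunks = client.retrieve_chunks(query, use_reranking=False)
    lines = [f"{chunk.score:.2f}  {chunk.content}" for chunk in chunks]
    print("\n".join(lines))
    return lines


lines = print_top_chunks(StubClient(), "How does ranking work?")
```

The helper only relies on the client exposing a `retrieve_chunks` method, so the same function should work with either the synchronous client or a thin wrapper around the asynchronous one.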

Search for complete documents:
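A minimal sketch of a document-level search that picks the highest-scoring document. `retrieve_docs` is named in this guide; the positional query argument and the `document_id` and `score` attribute names are assumptions made for illustration. `StubClient` is a hypothetical stand-in so the snippet runs without a server.

```python
from dataclasses import dataclass


@dataclass
class StubDoc:
    """Hypothetical result shape; the attribute names are assumptions."""
    document_id: str
    score: float


class StubClient:
    """Stand-in for a DataBridge client so the sketch runs offline."""

    def retrieve_docs(self, query):
        # A real client would aggregate chunk scores per document.
        return [StubDoc("doc-a", 0.74), StubDoc("doc-b", 0.88)]


def best_document(client, query):
    """Return the document with the highest aggregated score, or None."""
    docs = client.retrieve_docs(query)
    return max(docs, key=lambda doc: doc.score, default=None)


best = best_document(StubClient(), "quarterly revenue summary")
print(best.document_id, best.score)
```

Using `max` with an explicit key avoids relying on the order in which results come back, which this guide does not specify.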

Understanding Search Results

Two-Stage Ranking

DataBridge ranks search results in two stages:

  1. Vector Similarity (First Stage)

    • Query is converted to an embedding vector

    • Cosine similarity is computed between the query vector and each document chunk's embedding

    • Initial ranking based on vector similarity

    • Fast but may miss some semantic nuances

  2. Neural Reranking (Second Stage, Optional)

    • Top results from vector search are reranked

    • Uses a specialized neural model for scoring

    • More accurate but computationally intensive

    • Enabled by passing use_reranking=True to the search call
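The two stages above can be sketched in plain Python. This illustrates the general technique and is not DataBridge's internal implementation: the function names, the `first_stage_k` parameter, and the toy reranker are all hypothetical, and a real second stage would call a neural model instead of a keyword check.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def two_stage_search(query_vec, chunks, rerank_fn=None, first_stage_k=3):
    """Rank (text, embedding) pairs; returns (text, score) pairs, best first."""
    # Stage 1: score every chunk by cosine similarity and keep a shortlist.
    scored = [(text, cosine_similarity(query_vec, emb)) for text, emb in chunks]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    candidates = scored[:first_stage_k]
    if rerank_fn is None:
        return candidates
    # Stage 2: rescore only the shortlist with the more expensive scorer.
    reranked = [(text, rerank_fn(text)) for text, _ in candidates]
    reranked.sort(key=lambda pair: pair[1], reverse=True)
    return reranked


chunks = [
    ("cats purr", [1.0, 0.0]),
    ("dogs bark", [0.0, 1.0]),
    ("cats and dogs", [1.0, 1.0]),
    ("fish swim", [-1.0, 0.0]),
]
query_vec = [1.0, 0.2]


def toy_reranker(text):
    # Toy stand-in for a neural reranker: prefers text mentioning dogs.
    return 1.0 if "dogs" in text else 0.5


vector_only = two_stage_search(query_vec, chunks, first_stage_k=2)
reranked = two_stage_search(query_vec, chunks, toy_reranker, first_stage_k=2)
print([text for text, _ in vector_only])
print([text for text, _ in reranked])
```

Because the second stage only sees the shortlist, its cost grows with `first_stage_k` rather than with the size of the whole collection, which is why reranking can afford a slower and more accurate scorer.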

Similarity Scores

Similarity scores indicate how well a chunk or document matches your query:

  • Score Range: 0.0 to 1.0

    • 1.0: Perfect match

    • 0.0: No similarity

  • Typical Score Ranges:

    • 0.9 - 1.0: Near-exact semantic match

    • 0.8 - 0.9: Very strong semantic similarity

    • 0.7 - 0.8: Strong semantic similarity

    • 0.6 - 0.7: Moderate semantic similarity

    • < 0.6: Weak semantic similarity

Example of using similarity scores:
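A minimal sketch of working with scores, using the bands listed above. The helper names and the default threshold of 0.7 are hypothetical, and treating a boundary value such as 0.8 as part of the higher band is an assumption, since the listed ranges share their endpoints.

```python
def describe_score(score):
    """Map a similarity score in [0.0, 1.0] to the bands from this guide."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score out of range: {score}")
    if score >= 0.9:
        return "near-exact semantic match"
    if score >= 0.8:
        return "very strong semantic similarity"
    if score >= 0.7:
        return "strong semantic similarity"
    if score >= 0.6:
        return "moderate semantic similarity"
    return "weak semantic similarity"


def filter_by_score(results, min_score=0.7):
    """Keep (text, score) pairs whose score is at least min_score."""
    return [(text, score) for text, score in results if score >= min_score]


results = [("intro", 0.93), ("appendix", 0.65), ("summary", 0.72)]
kept = filter_by_score(results)
for text, score in kept:
    print(f"{text}: {score:.2f} ({describe_score(score)})")
```

A fixed threshold is a starting point; the right cutoff depends on the embedding model and the data, so it is worth checking a sample of borderline results before settling on a value.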

Document Content Types

The DocumentContent type represents either a URL or a direct content string:

When retrieving documents, the content field will be one of these types:
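A sketch of how calling code might branch on the two kinds of content. The guide only states that DocumentContent is either a URL or a direct content string; the `type` and `value` field names and the `"url"` and `"string"` tags below are assumptions, modeled here with a hypothetical `DocumentContentSketch` class, not the library's own type.

```python
from dataclasses import dataclass


@dataclass
class DocumentContentSketch:
    """Hypothetical stand-in for DocumentContent; field names are assumed."""
    type: str   # assumed to be "url" or "string"
    value: str  # the URL itself, or the document text


def describe_content(content):
    """Return a short description of how the content should be consumed."""
    if content.type == "url":
        return f"download from {content.value}"
    if content.type == "string":
        return f"inline text ({len(content.value)} characters)"
    raise ValueError(f"unknown content type: {content.type!r}")


remote = DocumentContentSketch("url", "https://example.com/report.pdf")
inline = DocumentContentSketch("string", "hello")
print(describe_content(remote))
print(describe_content(inline))
```

Raising on an unknown tag keeps the caller from silently treating a new content kind as text.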

Array Operations

Existence Checks

Next Steps

After finding relevant documents or chunks:
