
# Search

## `POST` Search Document Chunks

Search your documents for the chunks of text most semantically similar to a query. Results can be narrowed with metadata filters, capped at `k` chunks, and cut off below a minimum similarity score.

**Parameters:**

| Field      | Type    | Required | Default | Description                        |
| ---------- | ------- | -------- | ------- | ---------------------------------- |
| query      | string  | Yes      | -       | The search query text              |
| filters    | object  | No       | null    | Metadata filters to apply          |
| k          | integer | No       | 4       | Number of chunks to return         |
| min\_score | float   | No       | 0.0     | Minimum similarity score threshold |

**Returns:** List of ChunkResult objects with:

* `content`: The chunk content
* `score`: Similarity score
* `document_id`: Source document ID
* `chunk_number`: Position in document
* `metadata`: Document metadata
* `content_type`: Document content type
* `filename`: Original filename
* `download_url`: URL to download source file (if applicable)
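
The field list above can be mirrored on the client side when working with raw JSON instead of the SDK. The following is a minimal sketch, not the SDK's own model: the `ChunkResult` dataclass and `parse_chunks` helper are hypothetical names, and the field types are inferred from the example response on this page.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class ChunkResult:
    """Hypothetical client-side mirror of the documented result fields."""
    content: str
    score: float
    document_id: str
    chunk_number: int
    metadata: dict[str, Any] = field(default_factory=dict)
    content_type: str = ""
    filename: Optional[str] = None       # null when there is no source file
    download_url: Optional[str] = None   # null when not applicable


def parse_chunks(body: str) -> list[ChunkResult]:
    """Decode a JSON array of chunk objects into ChunkResult instances."""
    return [ChunkResult(**item) for item in json.loads(body)]


# Sample body copied from the 200 response documented for this endpoint.
sample = '''[{"content": "Machine learning is transforming...", "score": 0.89,
"document_id": "doc_abc123", "chunk_number": 0,
"metadata": {"title": "ML Overview", "category": "tech"},
"content_type": "text/plain", "filename": null, "download_url": null}]'''

chunks = parse_chunks(sample)
best = max(chunks, key=lambda c: c.score)
print(best.document_id, best.score)
```

If the server ever adds fields, `ChunkResult(**item)` will raise a `TypeError`; filtering `item` down to known keys first would make the sketch more tolerant.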

{% tabs %}
{% tab title="Python SDK" %}

```python
from databridge import DataBridge

db = DataBridge(uri="your-databridge-uri")

# Search for relevant chunks
chunks = db.retrieve_chunks(
    query="machine learning applications",
    filters={"category": "tech"},
    k=3,
    min_score=0.7
)

for chunk in chunks:
    print(f"\nMatch (score: {chunk.score:.2f}):")
    print(chunk.content)
    print(f"From document: {chunk.document_id}")
    if chunk.download_url:
        print(f"Download URL: {chunk.download_url}")
```

{% endtab %}

{% tab title="REST API" %}

```bash
curl -X POST "http://localhost:8000/retrieve/chunks" \
  -H "Authorization: Bearer your_token" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "machine learning applications",
    "filters": {
        "category": "tech"
    },
    "k": 3,
    "min_score": 0.7
  }'
```

{% endtab %}
{% endtabs %}
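
For callers who want the REST call from Python without the SDK, the request can be assembled with the standard library. This is a sketch under assumptions: `build_search_request` is a hypothetical helper, the base URL and token are the placeholders used in the curl example, and the defaults are taken from the parameter table above. It only builds the request and does not send it.

```python
import json
import urllib.request


def build_search_request(base_url, token, query, filters=None, k=4, min_score=0.0):
    """Build (but do not send) a POST request for /retrieve/chunks."""
    if not query:
        raise ValueError("query is required")
    payload = {"query": query, "filters": filters, "k": k, "min_score": min_score}
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/retrieve/chunks",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_search_request(
    "http://localhost:8000", "your_token", "machine learning applications"
)
# To send it: urllib.request.urlopen(req), then json.load() the response body.
```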

**Response:**

{% tabs %}
{% tab title="200 Success" %}

```json
[
    {
        "content": "Machine learning is transforming...",
        "score": 0.89,
        "document_id": "doc_abc123",
        "chunk_number": 0,
        "metadata": {
            "title": "ML Overview",
            "category": "tech"
        },
        "content_type": "text/plain",
        "filename": null,
        "download_url": null
    }
]
```

{% endtab %}

{% tab title="401 Unauthorized" %}

```json
{
    "detail": "Invalid authentication credentials"
}
```

{% endtab %}

{% tab title="422 Validation Error" %}

```json
{
    "detail": [
        {
            "loc": ["body", "query"],
            "msg": "field required",
            "type": "value_error.missing"
        }
    ]
}
```

{% endtab %}
{% endtabs %}
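
The error bodies above come in two shapes: a plain string under `detail` (401) and a list of objects with `loc`, `msg`, and `type` (422). A small helper can flatten both into readable messages. This is an illustrative sketch; `format_validation_errors` is a hypothetical name, and it assumes error bodies always follow the shapes shown in the examples.

```python
import json


def format_validation_errors(body: str) -> list[str]:
    """Turn an error response body into a list of human-readable messages."""
    detail = json.loads(body).get("detail", [])
    if isinstance(detail, str):
        # 401-style bodies carry a plain string instead of a list.
        return [detail]
    messages = []
    for err in detail:
        # "loc" is a path such as ["body", "query"]; join it with dots.
        location = ".".join(str(part) for part in err.get("loc", []))
        messages.append(f"{location}: {err.get('msg', '')}")
    return messages


sample_422 = json.dumps({
    "detail": [
        {"loc": ["body", "query"], "msg": "field required", "type": "value_error.missing"}
    ]
})
for message in format_validation_errors(sample_422):
    print(message)
```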

## `GET` List Documents

List the documents you have access to, with pagination (`skip`, `limit`) and optional metadata filtering.

**Parameters:**

* `skip` (optional): Number of documents to skip (default: 0)
* `limit` (optional): Maximum documents to return (default: 100)
* `filters` (optional): JSON-encoded metadata filters

**Returns:** List of Document objects containing all document metadata and storage information.
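
Because `filters` travels in the query string as JSON, it has to be serialized and percent-encoded before it is appended to the URL. The sketch below shows one way to do that with the standard library; `build_list_url` is a hypothetical helper, and the base URL is the placeholder used elsewhere on this page.

```python
import json
from urllib.parse import urlencode


def build_list_url(base_url, skip=0, limit=100, filters=None):
    """Build the /documents URL with a JSON-encoded, percent-escaped filter."""
    params = {"skip": skip, "limit": limit}
    if filters is not None:
        # Compact separators keep the encoded query string short.
        params["filters"] = json.dumps(filters, separators=(",", ":"))
    return f"{base_url.rstrip('/')}/documents?{urlencode(params)}"


url = build_list_url("http://localhost:8000", limit=10, filters={"category": "tech"})
print(url)
```

Encoding the braces and quotes this way avoids problems with shells and HTTP clients that treat `{` and `}` specially.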

{% tabs %}
{% tab title="Python SDK" %}

```python
from databridge import DataBridge

db = DataBridge(uri="your-databridge-uri")

# List documents with filters
docs = db.list_documents(
    skip=0,
    limit=10,
    filters={"category": "tech"}
)

for doc in docs:
    print(f"Document ID: {doc.external_id}")
    print(f"Title: {doc.metadata.get('title')}")
    print(f"Created: {doc.system_metadata['created_at']}")

# Get next page
next_page = db.list_documents(
    skip=10,
    limit=10,
    filters={"category": "tech"}
)
```

{% endtab %}

{% tab title="REST API" %}

```bash
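# -G sends the URL-encoded fields as query parameters on a GET request
curl -G "http://localhost:8000/documents" \
  -H "Authorization: Bearer your_token" \
  --data-urlencode "skip=0" \
  --data-urlencode "limit=10" \
  --data-urlencode 'filters={"category":"tech"}'
```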
```

{% endtab %}
{% endtabs %}

**Response:**

{% tabs %}
{% tab title="200 Success" %}

```json
[
    {
        "external_id": "doc_abc123",
        "content_type": "text/plain",
        "filename": "example.txt",
        "metadata": {
            "title": "Document Title",
            "category": "tech"
        },
        "storage_info": {
            "bucket": "your-bucket-name",
            "key": "doc_abc123/example.txt"
        },
        "system_metadata": {
            "created_at": "2024-03-20T10:30:00Z",
            "updated_at": "2024-03-20T10:30:00Z",
            "version": 1
        },
        "access_control": {
            "readers": ["user_123"],
            "writers": ["user_123"],
            "admins": ["user_123"]
        },
        "chunk_ids": ["chunk_1", "chunk_2"]
    }
]
```

{% endtab %}

{% tab title="401 Unauthorized" %}

```json
{
    "detail": "Invalid authentication credentials"
}
```

{% endtab %}

{% tab title="422 Validation Error" %}

```json
{
    "detail": [
        {
            "loc": ["query", "limit"],
            "msg": "ensure this value is less than or equal to 100",
            "type": "value_error.number.not_le"
        }
    ]
}
```

{% endtab %}
{% endtabs %}

## `GET` Document

Get metadata for a specific document by its ID. Returns the complete document object including all metadata and storage information.

**Parameters:**

* `document_id`: The external ID of the document

**Returns:** Complete Document object with all metadata fields

{% tabs %}
{% tab title="Python SDK" %}

```python
from databridge import DataBridge

db = DataBridge(uri="your-databridge-uri")

# Get document by ID
doc = db.get_document("doc_abc123")
print(f"Title: {doc.metadata.get('title')}")
print(f"Created: {doc.system_metadata['created_at']}")
if doc.storage_info:
    print(f"Storage: {doc.storage_info['bucket']}/{doc.storage_info['key']}")
```

{% endtab %}

{% tab title="REST API" %}

```bash
curl "http://localhost:8000/documents/doc_abc123" \
  -H "Authorization: Bearer your_token"
```

{% endtab %}
{% endtabs %}

**Response:**

{% tabs %}
{% tab title="200 Success" %}

```json
{
    "external_id": "doc_abc123",
    "content_type": "text/plain",
    "filename": "example.txt",
    "metadata": {
        "title": "Document Title",
        "category": "tech"
    },
    "storage_info": {
        "bucket": "your-bucket-name",
        "key": "doc_abc123/example.txt"
    },
    "system_metadata": {
        "created_at": "2024-03-20T10:30:00Z",
        "updated_at": "2024-03-20T10:30:00Z",
        "version": 1
    },
    "access_control": {
        "readers": ["user_123"],
        "writers": ["user_123"],
        "admins": ["user_123"]
    },
    "chunk_ids": ["chunk_1", "chunk_2"]
}
```

{% endtab %}

{% tab title="401 Unauthorized" %}

```json
{
    "detail": "Invalid authentication credentials"
}
```

{% endtab %}

{% tab title="404 Not Found" %}

```json
{
    "detail": "Document not found"
}
```

{% endtab %}
{% endtabs %}
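
The three documented outcomes for this endpoint (200, 401, 404) can be handled in one place when calling the REST API directly. The following is only a sketch: `interpret_document_response` is a hypothetical helper, returning `None` for a missing document is a design choice rather than SDK behavior, and it assumes every response body is JSON.

```python
import json
from typing import Any, Optional


def interpret_document_response(status: int, body: str) -> Optional[dict[str, Any]]:
    """Map the documented status codes to a return value or an exception."""
    if status == 200:
        return json.loads(body)
    if status == 404:
        return None  # "Document not found"
    detail = json.loads(body).get("detail", "unknown error")
    if status == 401:
        raise PermissionError(detail)
    raise RuntimeError(f"unexpected status {status}: {detail}")


doc = interpret_document_response(200, '{"external_id": "doc_abc123"}')
missing = interpret_document_response(404, '{"detail": "Document not found"}')
print(doc["external_id"], missing)
```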

