
Installation


Last updated 3 months ago

This guide covers setting up a DataBridge server. If you just want to use an existing DataBridge server, see the Quick Start Guide instead.

There are two ways to set up DataBridge:

  1. Docker Installation (Recommended)

  2. Manual Installation

Docker Installation

The Docker setup is the recommended way to get started quickly with all components preconfigured.

Prerequisites

  • Docker and Docker Compose installed on your system

  • At least 10GB of free disk space (for models and data)

  • 8GB+ RAM recommended

Quick Start

  1. Clone the repository and navigate to the project directory:

git clone https://github.com/databridge-org/databridge-core.git
cd databridge-core
  2. Start all services:

docker compose up --build

This command will:

  • Build all required containers

  • Download necessary AI models (nomic-embed-text and llama3.2)

  • Initialize the PostgreSQL database with pgvector

  • Start all services

The initial setup may take 5-10 minutes depending on your internet speed.

  3. For subsequent runs:

docker compose up    # Start all services
docker compose down  # Stop all services

Using Existing Services

If you already have Ollama or PostgreSQL running on your machine, you can configure DataBridge to use these existing instances instead of starting new containers.

Using Existing Ollama

  1. Modify databridge.toml to point to your local Ollama instance:

[completion]
provider = "ollama"
model_name = "llama3.2"
base_url = "http://host.docker.internal:11434"  # Points to host machine's Ollama

[embedding]
provider = "ollama"
model_name = "nomic-embed-text"
base_url = "http://host.docker.internal:11434"  # Points to host machine's Ollama
  2. Remove the Ollama service from docker-compose.yml:

    • Delete the ollama service section

    • Remove ollama from the depends_on section of the DataBridge service

    • Remove the ollama_data volume

  3. Add host.docker.internal support (required on Linux):

services:
  databridge:
    extra_hosts:
      - "host.docker.internal:host-gateway"
  4. Start only the required services:

docker compose up postgres databridge

Make sure your local Ollama instance:

  • Is running and accessible on port 11434

  • Has the required models installed (nomic-embed-text and llama3.2)

Using Existing PostgreSQL

  1. Modify the POSTGRES_URI in your environment or docker-compose.yml:

services:
  databridge:
    environment:
      - POSTGRES_URI=postgresql+asyncpg://your_user:your_password@host.docker.internal:5432/your_db
  2. Remove the PostgreSQL service from docker-compose.yml:

    • Delete the postgres service section

    • Remove the postgres_data volume

    • Update the depends_on section of the DataBridge service

  3. Make sure your PostgreSQL instance:

    • Has pgvector extension installed

    • Is accessible from Docker containers

    • Has the necessary database and permissions set up

  4. Start only the DataBridge service:

docker compose up databridge
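Connection strings are easy to get subtly wrong. The standard-library helper below splits a POSTGRES_URI into its parts so you can eyeball the host, port, and database before bringing the service up; it is an illustrative sketch, not something DataBridge ships:

```python
from urllib.parse import urlsplit

def split_postgres_uri(uri: str) -> dict:
    """Break a SQLAlchemy-style async Postgres URI into its parts so
    host/port/database can be checked before starting the container."""
    parts = urlsplit(uri)
    if parts.scheme != "postgresql+asyncpg":
        raise ValueError("DataBridge expects the postgresql+asyncpg driver")
    return {
        "user": parts.username,
        "host": parts.hostname,
        "port": parts.port,
        "database": parts.path.lstrip("/"),
    }

uri = "postgresql+asyncpg://your_user:your_password@host.docker.internal:5432/your_db"
print(split_postgres_uri(uri))
```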

Manual Installation

This section covers setting up DataBridge manually if you prefer more control over the installation.

1. Clone the Repository

git clone https://github.com/databridge-org/databridge-core.git

2. Setup Python Environment

Python 3.12 is the supported version; other versions may work but are untested:

cd databridge-core
python -m venv .venv
source .venv/bin/activate # On Windows: .venv\Scripts\activate

3. Install Dependencies

pip install -r requirements.txt

4. Configure Environment

Copy the example environment file and create your own .env:

cp .env.example .env

Then edit the .env file with your settings:

JWT_SECRET_KEY="..."  # Required in production, optional in dev mode
POSTGRES_URI="postgresql+asyncpg://postgres:postgres@localhost:5432/databridge" # Required for PostgreSQL database
MONGODB_URI="..." # Optional: Only needed if using MongoDB

UNSTRUCTURED_API_KEY="..." # Optional: Needed for parsing via unstructured API
OPENAI_API_KEY="..." # Optional: Needed for OpenAI embeddings and completions
ASSEMBLYAI_API_KEY="..." # Optional: Needed for combined parser
ANTHROPIC_API_KEY="..." # Optional: Needed for contextual parser
AWS_ACCESS_KEY="..." # Optional: Needed for AWS S3 storage
AWS_SECRET_ACCESS_KEY="..." # Optional: Needed for AWS S3 storage

For local development, you can enable development mode in databridge.toml:

[auth]
dev_mode = true  # Set to true to disable authentication for local development

Note: Development mode should only be used for local development and testing. Always configure proper authentication in production.

5. Setup PostgreSQL (Default Database)

If running PostgreSQL locally (the commands below use Homebrew on macOS):

brew install postgresql@14
brew install pgvector
brew services start postgresql@14
createdb databridge
createuser -s postgres

6. Run Quick Setup

python quick_setup.py

This script will automatically:

  • Configure your database

  • Set up your storage

  • Create the required vector index

7. Start the Server

python start_server.py

Accessing Your DataBridge Server

Once your server is running (either through Docker or manual installation), you can access it in several ways:

1. Server Access Points

  • API: http://localhost:8000

  • API Documentation: http://localhost:8000/docs

  • Health Check: http://localhost:8000/health
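A small polling helper can confirm the server is up before you script against it. This is an illustrative sketch, not part of DataBridge; it only assumes the /health endpoint answers HTTP 200 once the server is ready:

```python
import time
import urllib.request
from urllib.error import URLError

def wait_for_health(url: str, attempts: int = 10, delay: float = 1.0) -> bool:
    """Poll the health endpoint until it returns 200, or give up
    after the given number of attempts."""
    for _ in range(attempts):
        try:
            with urllib.request.urlopen(url, timeout=2) as resp:
                if resp.status == 200:
                    return True
        except (URLError, OSError):
            pass  # server not up yet; retry after a short delay
        time.sleep(delay)
    return False

# if wait_for_health("http://localhost:8000/health"):
#     print("DataBridge is up")
```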

2. Getting Your Access URI

  1. Visit the API documentation at http://localhost:8000/docs

  2. Find and use the /local/generate_uri endpoint to generate your admin URI

  3. Save this URI - you'll need it to connect to your server

3. Ways to Use DataBridge

With your URI, you can interact with DataBridge in several ways:

Using the Shell

python shell.py <your_local_uri>

Using the Python SDK

from databridge import DataBridge
db = DataBridge("your-databridge-uri", is_local=True)

Using the UI Component

The UI provides a visual interface for prototyping and testing. To set it up:

  1. Navigate to the UI directory:

cd databridge-core/ui-component
  2. Install dependencies and start:

npm install
npm run dev

The UI will be available at http://localhost:3000. Use your generated URI to connect.

Additional Configuration

MongoDB Setup

  1. You need a MongoDB Atlas cluster with Vector Search enabled

  2. Create a database whose name matches your DATABRIDGE_DB setting

  3. The server will automatically create required collections and indexes

AWS S3 Setup

  1. Create an S3 bucket for document storage

  2. Create an IAM user with permissions for this bucket

  3. Use the access keys in your .env file
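The exact S3 permissions DataBridge needs are not spelled out here, so the policy below is only a starting point: a least-privilege sketch built as a Python dict, with a placeholder bucket name you would replace with your own:

```python
import json

# "your-databridge-bucket" is a placeholder; adjust the actions if
# DataBridge turns out to need more (or fewer) S3 permissions.
BUCKET = "your-databridge-bucket"

policy = {
    "Version": "2012-10-17",
    "Statement": [
        {   # object-level access for uploading/reading/removing documents
            "Effect": "Allow",
            "Action": ["s3:PutObject", "s3:GetObject", "s3:DeleteObject"],
            "Resource": f"arn:aws:s3:::{BUCKET}/*",
        },
        {   # bucket-level access for listing stored objects
            "Effect": "Allow",
            "Action": ["s3:ListBucket"],
            "Resource": f"arn:aws:s3:::{BUCKET}",
        },
    ],
}

print(json.dumps(policy, indent=2))
```

Attach the resulting JSON to the IAM user from step 2, then put that user's access keys in your .env file.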

API Keys

  • OpenAI API key: Required if using OpenAI for embeddings or completions

  • Unstructured API key: Required for document parsing via the Unstructured API

Next Steps

For more details on Docker setup, configuration, and troubleshooting, see our Docker Guide.

See the Quick Start Guide to begin using your server.
