
Installation


Last updated 3 months ago

This guide covers setting up a DataBridge server. If you just want to use an existing DataBridge server, see the Quick Start Guide instead.

There are two ways to set up DataBridge:

  1. Docker Installation (Recommended)

  2. Manual Installation

Docker Installation

The Docker setup is the recommended way to get started quickly with all components preconfigured.

Prerequisites

  • Docker and Docker Compose installed on your system

  • At least 10GB of free disk space (for models and data)

  • 8GB+ RAM recommended

Quick Start

  1. Clone the repository and navigate to the project directory:

git clone https://github.com/databridge-org/databridge-core.git
cd databridge-core
  2. Start all services:

docker compose up --build

This command will:

  • Build all required containers

  • Download necessary AI models (nomic-embed-text and llama3.2)

  • Initialize the PostgreSQL database with pgvector

  • Start all services

The initial setup may take 5-10 minutes depending on your internet speed.

  3. For subsequent runs:

docker compose up    # Start all services
docker compose down  # Stop all services

Using Existing Services

If you already have Ollama or PostgreSQL running on your machine, you can configure DataBridge to use these existing instances instead of starting new containers.

Using Existing Ollama

  1. Modify databridge.toml to point to your local Ollama instance:

[completion]
provider = "ollama"
model_name = "llama3.2"
base_url = "http://host.docker.internal:11434"  # Points to host machine's Ollama

[embedding]
provider = "ollama"
model_name = "nomic-embed-text"
base_url = "http://host.docker.internal:11434"  # Points to host machine's Ollama
  2. Remove the Ollama service from docker-compose.yml:

    • Delete the ollama service section

    • Remove ollama from the depends_on section of the DataBridge service

    • Remove the ollama_data volume

  3. Add host.docker.internal support (required on Linux):

services:
  databridge:
    extra_hosts:
      - "host.docker.internal:host-gateway"
  4. Start only the required services:

docker compose up postgres databridge

Make sure your local Ollama instance:

  • Is running and accessible on port 11434

  • Has the required models installed (nomic-embed-text and llama3.2)

Using Existing PostgreSQL

  1. Modify the POSTGRES_URI in your environment or docker-compose.yml:

services:
  databridge:
    environment:
      - POSTGRES_URI=postgresql+asyncpg://your_user:your_password@host.docker.internal:5432/your_db
  2. Remove the PostgreSQL service from docker-compose.yml:

    • Delete the postgres service section

    • Remove the postgres_data volume

    • Update the depends_on section of the DataBridge service

  3. Make sure your PostgreSQL instance:

    • Has pgvector extension installed

    • Is accessible from Docker containers

    • Has the necessary database and permissions set up

  4. Start only the DataBridge service:

docker compose up databridge
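Connection strings are easy to get subtly wrong. The standard-library helper below splits a POSTGRES_URI into its parts so you can eyeball the host, port, and database before bringing the service up; it is an illustrative sketch, not something DataBridge ships:

```python
from urllib.parse import urlsplit

def split_postgres_uri(uri: str) -> dict:
    """Break a SQLAlchemy-style async Postgres URI into its parts so
    host/port/database can be checked before starting the container."""
    parts = urlsplit(uri)
    if parts.scheme != "postgresql+asyncpg":
        raise ValueError("DataBridge expects the postgresql+asyncpg driver")
    return {
        "user": parts.username,
        "host": parts.hostname,
        "port": parts.port,
        "database": parts.path.lstrip("/"),
    }

uri = "postgresql+asyncpg://your_user:your_password@host.docker.internal:5432/your_db"
print(split_postgres_uri(uri))
```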

Manual Installation

This section covers setting up DataBridge manually if you prefer more control over the installation.

1. Clone the Repository

git clone https://github.com/databridge-org/databridge-core.git

2. Setup Python Environment

Python 3.12 is the supported version; other versions may work but are untested:

cd databridge-core
python -m venv .venv
source .venv/bin/activate # On Windows: .venv\Scripts\activate

3. Install Dependencies

pip install -r requirements.txt

4. Configure Environment

Copy the example environment file and create your own .env:

cp .env.example .env

Then edit the .env file with your settings:

JWT_SECRET_KEY="..."  # Required in production, optional in dev mode
POSTGRES_URI="postgresql+asyncpg://postgres:postgres@localhost:5432/databridge" # Required for PostgreSQL database
MONGODB_URI="..." # Optional: Only needed if using MongoDB

UNSTRUCTURED_API_KEY="..." # Optional: Needed for parsing via unstructured API
OPENAI_API_KEY="..." # Optional: Needed for OpenAI embeddings and completions
ASSEMBLYAI_API_KEY="..." # Optional: Needed for combined parser
ANTHROPIC_API_KEY="..." # Optional: Needed for contextual parser
AWS_ACCESS_KEY="..." # Optional: Needed for AWS S3 storage
AWS_SECRET_ACCESS_KEY="..." # Optional: Needed for AWS S3 storage

For local development, you can enable development mode in databridge.toml:

[auth]
dev_mode = true  # Set to true to disable authentication for local development

Note: Development mode should only be used for local development and testing. Always configure proper authentication in production.

5. Setup PostgreSQL (Default Database)

If running PostgreSQL locally (the commands below use Homebrew on macOS):

brew install postgresql@14
brew install pgvector
brew services start postgresql@14
createdb databridge
createuser -s postgres

6. Run Quick Setup

python quick_setup.py

This script will automatically:

  • Configure your database

  • Set up your storage

  • Create the required vector index

7. Start the Server

python start_server.py

Accessing Your DataBridge Server

Once your server is running (either through Docker or manual installation), you can access it in several ways:

1. Server Access Points

  • API: http://localhost:8000

  • API Documentation: http://localhost:8000/docs

  • Health Check: http://localhost:8000/health
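A small polling helper can confirm the server is up before you script against it. This is an illustrative sketch, not part of DataBridge; it only assumes the /health endpoint answers HTTP 200 once the server is ready:

```python
import time
import urllib.request
from urllib.error import URLError

def wait_for_health(url: str, attempts: int = 10, delay: float = 1.0) -> bool:
    """Poll the health endpoint until it returns 200, or give up
    after the given number of attempts."""
    for _ in range(attempts):
        try:
            with urllib.request.urlopen(url, timeout=2) as resp:
                if resp.status == 200:
                    return True
        except (URLError, OSError):
            pass  # server not up yet; retry after a short delay
        time.sleep(delay)
    return False

# if wait_for_health("http://localhost:8000/health"):
#     print("DataBridge is up")
```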

2. Getting Your Access URI

  1. Visit the API documentation at http://localhost:8000/docs

  2. Find and use the /local/generate_uri endpoint to generate your admin URI

  3. Save this URI - you'll need it to connect to your server

3. Ways to Use DataBridge

With your URI, you can interact with DataBridge in several ways:

Using the Shell

python shell.py <your_local_uri>

Using the Python SDK

from databridge import DataBridge
db = DataBridge("your-databridge-uri", is_local=True)

Using the UI Component

The UI provides a visual interface for prototyping and testing. To set it up:

  1. Navigate to the UI directory:

cd databridge-core/ui-component
  2. Install dependencies and start:

npm install
npm run dev

The UI will be available at http://localhost:3000. Use your generated URI to connect.

Additional Configuration

MongoDB Setup

  1. You need a MongoDB Atlas cluster with Vector Search enabled

  2. Create a database whose name matches your DATABRIDGE_DB setting

  3. The server will automatically create required collections and indexes

AWS S3 Setup

  1. Create an S3 bucket for document storage

  2. Create an IAM user with permissions for this bucket

  3. Use the access keys in your .env file
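The exact S3 permissions DataBridge needs are not spelled out here, so the policy below is only a starting point: a least-privilege sketch built as a Python dict, with a placeholder bucket name you would replace with your own:

```python
import json

# "your-databridge-bucket" is a placeholder; adjust the actions if
# DataBridge turns out to need more (or fewer) S3 permissions.
BUCKET = "your-databridge-bucket"

policy = {
    "Version": "2012-10-17",
    "Statement": [
        {   # object-level access for uploading/reading/removing documents
            "Effect": "Allow",
            "Action": ["s3:PutObject", "s3:GetObject", "s3:DeleteObject"],
            "Resource": f"arn:aws:s3:::{BUCKET}/*",
        },
        {   # bucket-level access for listing stored objects
            "Effect": "Allow",
            "Action": ["s3:ListBucket"],
            "Resource": f"arn:aws:s3:::{BUCKET}",
        },
    ],
}

print(json.dumps(policy, indent=2))
```

Attach the resulting JSON to the IAM user from step 2, then put that user's access keys in your .env file.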

API Keys

  • OpenAI API key: Required if using OpenAI for embeddings or completions

  • Unstructured API key: Required for document parsing via the Unstructured API

Next Steps

For more details on Docker setup, configuration, and troubleshooting, see our Docker Guide.

See the Quick Start Guide to begin using your server.
