Code Indexer (`cidx`)

AI-powered semantic code search for your codebase. Find code by meaning, not just keywords.

Version 9.14.1 - Changelog | Migration Guide | Architecture

Quick Navigation

Installation
Quick Start
Key Features
- Semantic Search
- Multimodal Search (NEW in v8.8)
- Full-Text Search
- SCIP Code Intelligence
- Git History Search
- Langfuse Trace Sync (NEW in v8.10)
- Inter-Repository Dependency Map (NEW in v9.0)
- Multi-Provider Embedding (NEW in v9.8)
Operating Modes
Common Commands
Documentation

What is CIDX?

CIDX combines semantic embeddings with traditional search to help you find code by meaning, not just keywords. Search your codebase with natural language queries like "authentication logic" or "database connection setup", trace symbol references with SCIP code intelligence, and explore git history semantically.

Installation

pipx (Recommended)

pipx install git+https://github.com/LightspeedDMS/code-indexer.git@v8.6.0

# Verify installation
cidx --version

pip with virtual environment

python3 -m venv code-indexer-env
source code-indexer-env/bin/activate
pip install git+https://github.com/LightspeedDMS/code-indexer.git@v8.6.0

Requirements: Python 3.9+, 4GB+ RAM, VoyageAI API key (or Cohere API key)

For detailed installation instructions including Windows, configuration, and troubleshooting, see Installation Guide.

Quick Start

# Navigate to your project
cd /path/to/your/project

# Set embedding provider API key (VoyageAI by default; Cohere also supported)
export VOYAGE_API_KEY="your-api-key"  # or export CO_API_KEY="your-key" for Cohere

# Index your codebase
cidx index

# Search semantically
cidx query "authentication logic" --limit 5

# Search with filters
cidx query "user" --language python --min-score 0.7
cidx query "save" --path-filter "*/models/*" --limit 10

For comprehensive query options and search strategies, see Query Guide.

Key Features

Semantic Search

Find code by meaning using AI embeddings powered by VoyageAI or Cohere. Ask natural language questions and get semantically relevant results ranked by similarity.

cidx query "authentication logic" --limit 10
cidx query "database connection setup" --language python

See: Query Guide

Multimodal Search (v8.8+)

Search documentation that includes diagrams, screenshots, and visual content. CIDX automatically detects markdown, HTML, and HTMX files with embedded images and indexes them using multimodal embeddings, making visual content semantically searchable.

How it works:

- Automatic detection: During indexing, CIDX identifies files containing images (PNG, JPG, WebP, GIF):
  - Markdown (.md): Parses ![alt](path) syntax
  - HTML/HTMX (.html, .htmx): Parses <img src="path"> tags
- Dual indexing: Text goes to voyage-code-3; image+text content goes to voyage-multimodal-3
- Parallel search: Queries search both indexes simultaneously, merging results by relevance
- Transparent experience: No special flags needed; multimodal search happens automatically when multimodal content exists
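
The detection and routing steps above can be sketched in a few lines of Python. This is a minimal illustration, not CIDX's actual parser: the regular expressions and the function names `find_image_refs` and `target_index` are assumptions made for the example, and only the model names and file types come from the description above.

```python
import re

# Hypothetical patterns; a real parser would handle more edge cases (titles, srcset, ...).
MD_IMAGE = re.compile(r"!\[[^\]]*\]\(([^)\s]+)")
HTML_IMAGE = re.compile(r"<img\b[^>]*?\bsrc\s*=\s*[\"']([^\"']+)[\"']", re.IGNORECASE)
IMAGE_EXTENSIONS = (".png", ".jpg", ".jpeg", ".webp", ".gif")


def find_image_refs(text: str, suffix: str) -> list[str]:
    """Return referenced image paths for a markdown or HTML/HTMX file."""
    if suffix == ".md":
        candidates = MD_IMAGE.findall(text)
    elif suffix in (".html", ".htmx"):
        candidates = HTML_IMAGE.findall(text)
    else:
        candidates = []
    # Keep only references that point at a supported image format.
    return [p for p in candidates if p.lower().endswith(IMAGE_EXTENSIONS)]


def target_index(text: str, suffix: str) -> str:
    """Route image-bearing files to the multimodal model, everything else to the code model."""
    return "voyage-multimodal-3" if find_image_refs(text, suffix) else "voyage-code-3"
```

A file with no recognized image references falls through to the text index, which matches the "no special flags" behavior described above.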

# Index your project (multimodal detection is automatic)
cidx index

# Query searches both code AND visual documentation
cidx query "database schema diagram"
cidx query "API authentication flow"

Query output shows dual-index status:

Using: voyage-code-3, voyage-multimodal-3    # Both indexes active

Query Timing:
  Parallel multi-index query          1.09s
      voyage-code-3 index (parallel)      1.09s
      voyage-multimodal-3 index (parallel)      470ms
      Merge & deduplicate         0.06ms

Supported image formats: PNG, JPG/JPEG, WebP, GIF (embedded in markdown via ![alt](path) syntax or in HTML/HTMX via <img src="path"> tags)
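
The "parallel multi-index query" and "merge & deduplicate" steps shown in the timing output can be pictured roughly as follows. This is a sketch under stated assumptions: the two `search_*` functions are stand-ins that return canned hits, and deduplicating by file path with the highest score winning is one plausible merge rule, not necessarily the one CIDX uses.

```python
from concurrent.futures import ThreadPoolExecutor


def search_code_index(query):
    # Stand-in for a real vector search against the voyage-code-3 index.
    return [{"path": "src/db.py", "score": 0.81}, {"path": "docs/schema.md", "score": 0.64}]


def search_multimodal_index(query):
    # Stand-in for a real vector search against the voyage-multimodal-3 index.
    return [{"path": "docs/schema.md", "score": 0.90}]


def parallel_query(query, limit=10):
    """Query both indexes concurrently, then merge and deduplicate by path."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        futures = [pool.submit(fn, query) for fn in (search_code_index, search_multimodal_index)]
        hits = [hit for future in futures for hit in future.result()]
    best = {}
    for hit in hits:
        # Keep the highest-scoring hit per file.
        if hit["path"] not in best or hit["score"] > best[hit["path"]]["score"]:
            best[hit["path"]] = hit
    return sorted(best.values(), key=lambda h: h["score"], reverse=True)[:limit]
```

Because both searches run concurrently, total latency is bounded by the slower index, which is consistent with the 1.09s total in the sample output.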

See: Architecture Guide

Full-Text Search (FTS)

Fast exact-text matching with fuzzy search, regex support, and case-sensitivity options. Because searches run against an index, they can be up to 50x faster than grep.

cidx query "authenticate_user" --fts
cidx query "ParseError" --fts --case-sensitive
cidx query "test_.*" --fts --regex --language python

See: Query Guide

SCIP Code Intelligence

Precise code navigation using SCIP (Source Code Intelligence Protocol). Find symbol definitions, references, dependencies, dependents, call chains, and perform impact analysis.

cidx scip generate                    # Generate SCIP indexes
cidx scip definition "UserService"    # Find definition
cidx scip references "authenticate"   # Find all usages
cidx scip callchain "main" "login"    # Trace execution path
cidx scip impact "DatabaseManager"    # Impact analysis
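
Conceptually, tracing a call chain such as "main" to "login" is a path search over a call graph. The sketch below uses breadth-first search on a hypothetical graph; the `CALLS` data and the `call_chain` function are illustrative assumptions and say nothing about how CIDX stores or queries SCIP data internally.

```python
from collections import deque


def call_chain(graph, start, target):
    """Return the shortest caller-to-callee path from start to target, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        for callee in graph.get(path[-1], []):
            if callee not in seen:
                seen.add(callee)
                queue.append(path + [callee])
    return None


# Hypothetical call graph: function name -> functions it calls.
CALLS = {
    "main": ["parse_args", "run_server"],
    "run_server": ["handle_request"],
    "handle_request": ["login", "logout"],
}
```

Running the same search over the reversed graph would give the dependents of a symbol, which is the basis of an impact analysis.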

See: SCIP Code Intelligence Guide

Git History Search (Temporal)

Search your entire commit history semantically. Find when code was added, modified, or deleted with time-range filtering, author filtering, and diff type selection.

cidx index --index-commits                # Index git history (one-time)
cidx query "JWT auth" --time-range-all    # Search all history
cidx query "bug fix" --time-range 2024-01-01..2024-12-31
cidx query "login" --time-range-all --author "john@example.com"
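
To make the START..END format of --time-range concrete, here is a small Python sketch that parses such a value and tests whether a commit date falls inside it. Treating both endpoints as inclusive is an assumption of this example, and the helper names are hypothetical.

```python
from datetime import date


def parse_time_range(spec):
    """Split 'YYYY-MM-DD..YYYY-MM-DD' into a (start, end) pair of dates."""
    start_text, sep, end_text = spec.partition("..")
    if not sep:
        raise ValueError(f"expected START..END, got {spec!r}")
    start, end = date.fromisoformat(start_text), date.fromisoformat(end_text)
    if start > end:
        raise ValueError("range start is after range end")
    return start, end


def in_range(commit_date, spec):
    """True if commit_date lies within the range (endpoints assumed inclusive)."""
    start, end = parse_time_range(spec)
    return start <= commit_date <= end
```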

See: Temporal Search Guide

Real-Time Watch Mode

Monitor file changes and automatically re-index in real-time with daemon mode. Get ~5ms cached queries versus ~1s from disk.

cidx config --daemon    # Enable daemon mode
cidx start              # Start daemon
cidx watch              # Start watch mode
cidx query "search"     # Fast cached queries

See: Operating Modes Guide

AI Integration

Connect AI assistants to CIDX for semantic search directly in conversations. Supports local CLI integration (Claude Code, Gemini, Codex) and remote MCP server integration (Claude Desktop).

# Local CLI integration
cidx teach-ai --claude --project    # Creates CLAUDE.md

# Remote MCP server for Claude Desktop
# See MCP Bridge guide for setup

See: AI Integration Guide | MCP Bridge Guide

Langfuse Trace Sync (v8.10+)

Automatically pull AI conversation traces from Langfuse and make them semantically searchable. CIDX syncs traces in the background, indexes them with the same semantic search engine used for code, and makes them available via MCP tools and CLI queries.

How it works:

Background sync: Pulls traces from configured Langfuse projects at a configurable interval (default: 5 minutes)
Smart deduplication: Overlap window + content hash strategy detects trace mutations without re-downloading unchanged data
Auto-registration: New trace folders are automatically registered as golden repos and indexed
Watch integration: File system watchers trigger incremental re-indexing as new traces arrive
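
The "overlap window + content hash" idea can be illustrated as follows. This is a sketch, not the CIDX implementation: the 10-minute overlap, the SHA-256 hash over canonical JSON, and the function names are all assumptions chosen for the example.

```python
import hashlib
import json
from datetime import datetime, timedelta


def content_hash(trace):
    """Stable hash of a trace's content; key order does not affect the result."""
    return hashlib.sha256(json.dumps(trace, sort_keys=True).encode("utf-8")).hexdigest()


def sync_window_start(last_sync, overlap=timedelta(minutes=10)):
    """Re-check traces slightly older than the last sync to catch late mutations."""
    return last_sync - overlap


def classify(traces, known_hashes):
    """Split fetched traces into new, updated, and unchanged by comparing hashes."""
    result = {"new": [], "updated": [], "unchanged": []}
    for trace in traces:
        digest = content_hash(trace)
        previous = known_hashes.get(trace["id"])
        if previous is None:
            result["new"].append(trace["id"])
        elif previous != digest:
            result["updated"].append(trace["id"])
        else:
            result["unchanged"].append(trace["id"])
        known_hashes[trace["id"]] = digest
    return result
```

Only the "new" and "updated" traces would then need to be written to disk and re-indexed; unchanged traces inside the overlap window are skipped.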

Trace storage layout:

golden-repos/
  langfuse_<project>_<userId>/
    <sessionId>/
      <traceId>.json    # Full trace + observations, chronologically ordered

Each trace file contains the user prompt (trace.input), AI response (trace.output), metadata, and all observations (tool calls) in chronological order.

Search traces via MCP:

search_code("authentication error handling", repository_alias="langfuse_*")
search_code("SQL query generation", repository_alias="langfuse_MyProject_*")

Dashboard monitoring: Real-time sync health, per-project metrics (traces checked/new/updated), storage statistics, and manual sync trigger from the admin dashboard.

Configuration: Enable via the Web UI Config Screen under Langfuse settings. Requires Langfuse project public/secret key pair.

Inter-Repository Dependency Map (v9.0+)

Pre-computed semantic dependency map that analyzes source code across all registered golden repos, identifies domain-level relationships, and produces queryable documents so MCP clients can immediately determine the relevant repo set for cross-repo tasks -- without performing exploratory searches.

How it works:

Multi-pass analysis: Claude CLI pipeline examines source code across all repos in three passes: domain synthesis, per-domain deep dive (imports, API contracts, shared types), and index generation
Domain-clustered output: Produces per-domain .md files and an _index.md with domain catalog and repo-to-domain matrix, all stored in cidx-meta/dependency-map/
Incremental delta refresh: Scheduled daemon detects changed/new/removed repos via commit hash comparison and updates only affected domain files
MCP discovery: Quick reference automatically directs MCP clients to check the dependency map first for cross-repo tasks
Human corrections: Power users can edit dependency map files directly via MCP file CRUD tools; changes are auto-reindexed
Cross-domain dependency graph: Pass 3 automatically builds a directed graph of inter-domain connections by parsing each domain file, detecting which repos are mentioned across domain boundaries, and appending the graph to _index.md with edge list and standalone domain identification
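
The cross-domain graph step can be approximated with simple set logic: if a domain file mentions a repo that belongs to a different domain, record a directed edge between the two domains. The sketch below assumes the participating repos and file bodies have already been loaded into dictionaries; the function name, the plain substring matching, and the sample data in the usage note are hypothetical.

```python
def build_domain_graph(domain_repos, domain_text):
    """Return (edges, standalone) for a set of domains.

    domain_repos: domain name -> set of repos that participate in it.
    domain_text:  domain name -> body text of that domain's file.
    """
    edges = set()
    for source, text in domain_text.items():
        for target, repos in domain_repos.items():
            if target == source:
                continue
            # A repo owned by another domain but mentioned here implies a connection.
            foreign = repos - domain_repos[source]
            if any(repo in text for repo in foreign):
                edges.add((source, target))
    connected = {domain for edge in edges for domain in edge}
    standalone = sorted(set(domain_repos) - connected)
    return sorted(edges), standalone
```

For example, if the data-pipeline file mentions a repo that only the authentication domain lists, the result contains the edge ("data-pipeline", "authentication"), and any domain with no edges in either direction is reported as standalone.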

Dependency map output structure:

cidx-meta/
  dependency-map/
    _index.md                 # Domain catalog + repo-to-domain matrix + cross-domain dependency graph
    authentication.md         # Per-domain: repo roles, subdomain deps, cross-domain connections
    data-pipeline.md          # Each domain file has YAML frontmatter with participating repos
    ...

Usage via MCP:

# Quick reference tells MCP clients about the dependency map
cidx_quick_reference         # Shows dependency map section with workflow

# Read the dependency map
get_file_content("cidx-meta-global", "dependency-map/_index.md")
get_file_content("cidx-meta-global", "dependency-map/authentication.md")

# Search dependency map semantically
search_code("authentication between repos", repository_alias="cidx-meta-global")

# Admin: trigger analysis on demand
trigger_dependency_analysis(mode="full")    # Full regeneration
trigger_dependency_analysis(mode="delta")   # Incremental update

# Power user: correct inaccuracies
edit_file(repository_alias="cidx-meta-global", path="dependency-map/authentication.md", ...)
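
The difference between the full and delta modes comes down to which repos need re-analysis. A minimal sketch of commit-hash comparison is shown below; the data shapes and the names `diff_repos` and `affected_domains` are assumptions for illustration only.

```python
def diff_repos(previous, current):
    """Compare repo -> commit hash maps from the last run and from now."""
    return {
        "new": sorted(current.keys() - previous.keys()),
        "removed": sorted(previous.keys() - current.keys()),
        "changed": sorted(
            repo for repo in current.keys() & previous.keys() if current[repo] != previous[repo]
        ),
    }


def affected_domains(delta, repo_domains):
    """Collect the domains that reference any changed, new, or removed repo."""
    touched = delta["new"] + delta["removed"] + delta["changed"]
    return sorted({domain for repo in touched for domain in repo_domains.get(repo, [])})
```

In this sketch a brand-new repo has no domain yet, so it contributes nothing to `affected_domains`; a real pipeline would still need a step that assigns it to a domain.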

Configuration (via Web UI Config Screen):

dependency_map_enabled: Enable the feature (default: off, opt-in)
dependency_map_interval_hours: Refresh interval (default: 168 hours / weekly)
dependency_map_pass_timeout_seconds: Per-pass Claude CLI timeout (default: 600s)

See: Meta-Repo Discovery Guide

Multi-Provider Embedding (v9.8+)

CIDX supports multiple embedding providers for redundancy and flexibility:

VoyageAI (default): voyage-3 (1024 dims) or voyage-3-large (1536 dims)
Cohere (new): embed-v4.0 with 2048 dimensions, embedded tokenizer (no SDK required)

Query strategies control which provider serves results:

primary_only — use configured primary provider (default)
failover — try primary, automatically fall back to secondary on failure
parallel — query both providers, fuse results (RRF, multiply, or average)
specific — explicitly target one provider with --provider
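
Two of these strategies are easy to picture in code. The sketch below shows Reciprocal Rank Fusion (RRF) for the parallel strategy and a try/except fallback for failover. The constant k=60 is the value commonly used with RRF; whether CIDX uses the same constant, and the function names themselves, are assumptions.

```python
def rrf_fuse(rankings, k=60):
    """Reciprocal Rank Fusion: score(d) = sum over lists of 1 / (k + rank of d)."""
    scores = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by name for a stable order.
    return sorted(scores, key=lambda doc: (-scores[doc], doc))


def failover(query, primary, secondary):
    """Try the primary provider; fall back to the secondary if it raises."""
    try:
        return primary(query)
    except Exception:
        return secondary(query)
```

RRF uses only rank positions, so it sidesteps the problem that similarity scores from different embedding models are not directly comparable. The multiply and average fusion modes would instead combine the raw scores.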

CLI usage:

cidx query "authentication" --strategy parallel --score-fusion rrf
cidx query "database setup" --strategy specific --provider cohere
cidx provider-health  # check provider status
cidx provider-index list  # manage per-provider indexes

Operating Modes

CIDX operates in three modes optimized for different use cases: CLI, Daemon, and Server. Server Mode can also run as a multi-node cluster (Cluster Mode).

CLI Mode (Individual Developers & Remote Access)

Direct command-line interface with two operational sub-modes:

Local Mode (default) - Direct file access with instant setup and no dependencies:

cidx init      # Create .code-indexer/ locally
cidx index     # Index codebase locally
cidx query     # Search (~1s per query from disk)

Remote Mode - Connect to CIDX server with repository linking:

# Initialize remote connection
cidx init --remote https://cidx.example.com --username user --password pass

# Query executes on remote server
cidx query "search term"  # Transparent remote execution

# Sync repositories with server
cidx sync                 # Sync current repository
cidx sync my-project      # Sync specific repository
cidx sync --all           # Sync all repositories (multi-repo support)

Remote Mode Features:

Repository linking (automatic matching between local repo and server golden repo)
Transparent remote query execution (same CLI, server-side processing)
Multi-repo support (manage and sync multiple repositories)
OAuth 2.0 authentication with server
Access team's centralized indexed repositories

Daemon Mode (Performance)

Background service with in-memory caching for faster queries (~5ms) and real-time watch mode.

cidx config --daemon    # Enable daemon
cidx start              # Start daemon
cidx query "search"     # Fast cached queries
cidx watch              # Real-time indexing

Server Mode (Team Collaboration)

Multi-user server with centralized golden repositories for team-wide semantic search. Deploy CIDX as an HTTP/HTTPS service with OAuth 2.0 authentication, REST API, MCP interface, and web UI administration.

cidx-server start       # Start multi-user server
# Users query via REST API or MCP
# Admin manages repos via web UI

Core Capabilities:

Multi-Repo Indexing: Centralized golden repositories shared across team
Multi-User Access: OAuth 2.0/OIDC authentication with role-based permissions
Advanced Caching: HNSW cache with <1ms warm queries (100-1800x speedup)
REST API: Programmatic access with full query parameter support
MCP Protocol: Claude Desktop integration for AI-assisted code search
Web Administration: Manage users, repositories, and configuration via browser
Repository Management: Add, refresh, remove golden repositories
User Management: Create users, assign roles (admin/power_user/normal_user)
Cache Monitoring: Real-time cache statistics and performance metrics

Cluster Mode (Multi-Node Server)

Run multiple CIDX Server nodes sharing a single PostgreSQL database for horizontal scaling and high availability. Cluster mode requires setting storage_mode to "postgres" in ~/.cidx-server/config.json. All application functionality (REST API, MCP, Web UI) is identical in standalone and cluster modes; the storage backends swap transparently via Protocol interfaces.

Key cluster features:

Leader election via PostgreSQL advisory lock ensures exactly one node runs schedulers at a time
Node heartbeat tracking with automatic failover detection (30-second threshold)
Distributed background job queue with orphaned job recovery and re-execution
Centralized runtime configuration in PostgreSQL with 30-second cross-node propagation
All security services cluster-aware: rate limiters, session invalidation, token blacklist, OIDC state, MFA
Activated repository metadata shared across nodes via PostgreSQL
HNSW max_elements configurable via Web UI (default 1,000,000)
Per-node metrics carousel in the admin dashboard
SQLite-to-PostgreSQL data migration tool for converting existing installations
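
Leader election with a PostgreSQL advisory lock and heartbeat-based failure detection can be sketched as follows. `pg_try_advisory_lock` is a standard PostgreSQL function; the lock key, the function names, and the use of a generic DB-API connection (for example from psycopg) are assumptions of this example, not CIDX internals.

```python
LEADER_LOCK_KEY = 815_001  # hypothetical application-chosen lock id


def try_become_leader(conn, key=LEADER_LOCK_KEY):
    """Attempt a non-blocking, session-level advisory lock; True means this node leads.

    conn is any DB-API 2.0 connection to PostgreSQL. The lock is released
    automatically when the session ends, so a crashed leader frees it for
    another node.
    """
    cur = conn.cursor()
    try:
        cur.execute("SELECT pg_try_advisory_lock(%s)", (key,))
        return bool(cur.fetchone()[0])
    finally:
        cur.close()


def is_node_stale(last_heartbeat, now, threshold_seconds=30):
    """A node whose heartbeat is older than the threshold is treated as failed."""
    return (now - last_heartbeat) > threshold_seconds
```

A node that gets True from `try_become_leader` would start the schedulers; the others would retry periodically and take over once the lock becomes free.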

For architecture details see Cluster Architecture Guide. For setup and operations see Cluster Setup Guide.

Claude Delegation (v8.5+):

Protected Repository Analysis: AI agents analyze code without exposing source to clients
Delegation Functions: Pre-defined AI workflows for code review, analysis, and transformation
Collaborative Mode (v9.7+): DAG-based orchestrated multi-step delegation with per-step engines and repos
Competitive Mode (v9.7+): Decompose-compete-judge pipeline with multiple AI engines
Acting Users (v9.7+): Scoped repository access via acting_users parameter for multi-tenant delegation
Group-Based Access: Control which users can execute which delegation functions
Callback-Based Completion: Efficient job polling with server-side callbacks and cross-node tracking

Security & Observability (v8.5+):

Group-Based Security: Fine-grained access control using group membership
OTEL Telemetry: OpenTelemetry integration for traces, metrics, and observability
Auto-Discovery: Automatic repository discovery from GitHub organizations or local paths
Auto-Update: Job-aware server updates with graceful drain mode

Self-Monitoring (v8.8.2+):

Claude-Powered Log Analysis: Scheduled background scans analyze server logs using Claude CLI
Automatic Issue Creation: Detected bugs automatically create GitHub issues with reproduction steps
Actionable Focus: Filters configuration noise, reports only development-actionable bugs
Manual Trigger: On-demand scans via admin API endpoint
Debug Memory Snapshot (v9.5.7+): Localhost-only endpoints for diagnosing memory leaks without restarting the server. GET /debug/memory-snapshot returns object counts and sizes by type (top 100), with module-qualified names and self-monitoring overhead. GET /debug/memory-compare?baseline={timestamp} diffs against a prior snapshot. Secured by network restriction (127.0.0.1/::1 only, no auth required).

Authentication & Authorization:

OAuth 2.0 and OIDC (OpenID Connect) support with SSO
TOTP multi-factor authentication (MFA) with QR setup and recovery codes
Password expiry enforcement for non-SSO accounts
Login rate limiting with automatic account lockout
Configurable admin session timeout
Three role levels: admin (full access), power_user (activate repos), normal_user (query only)
Group-based repository and function access control
Secure token-based API access with cluster-wide JWT revocation

Performance:

Cold query: ~277ms (first access, loads from disk)
Warm query: <1ms (cached, 100-1800x faster)
Configurable cache TTL (default 10 minutes)
Per-repository cache isolation
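
The gap between cold and warm queries comes from keeping loaded indexes in memory. The sketch below shows a per-repository cache with a time-to-live, using the 10-minute default mentioned above; the `RepoCache` class and its `loader` callback are hypothetical, and a real HNSW cache would hold index objects rather than strings.

```python
import time


class RepoCache:
    """Per-repository cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds=600, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._entries = {}  # repo alias -> (loaded_at, value)

    def get(self, repo, loader):
        now = self.clock()
        entry = self._entries.get(repo)
        if entry is not None and now - entry[0] < self.ttl:
            return entry[1]      # warm: served from memory
        value = loader(repo)     # cold: load from disk
        self._entries[repo] = (now, value)
        return value
```

Keying entries by repository alias means that an expired entry for one repo never forces a reload of another repo's index.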

For detailed setup, deployment, and configuration, see Operating Modes Guide.

Common Commands

Indexing

cidx init                    # Create .code-indexer/ config
cidx index                   # Semantic indexing (default)
cidx index --fts             # Add full-text search
cidx index --index-commits   # Add git history indexing
cidx scip generate           # Generate SCIP indexes

Querying

# Semantic search
cidx query "search term" --limit 10

# Full-text search
cidx query "exact text" --fts

# Regex pattern matching
cidx query "pattern" --fts --regex

# Git history search
cidx query "term" --time-range-all --quiet

# SCIP code intelligence
cidx scip definition "Symbol"
cidx scip references "function_name"

Filtering

--language python           # Filter by language
--path-filter "*/tests/*"   # Filter by path pattern
--exclude-path "*/vendor/*" # Exclude paths
--min-score 0.8             # Minimum similarity score
--limit 20                  # Max results
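
To show how these filters combine, here is a Python sketch that applies the same five options to a list of hits. It assumes that path patterns behave like shell-style globs (as in Python's `fnmatch`) and that all filters are combined with AND; both points are assumptions about the CLI, and `apply_filters` is a hypothetical helper.

```python
from fnmatch import fnmatch


def apply_filters(hits, language=None, path_filter=None, exclude_path=None,
                  min_score=None, limit=None):
    """Apply CLI-style filters to a list of hit dicts, preserving order."""
    kept = []
    for hit in hits:
        if language and hit["language"] != language:
            continue
        if path_filter and not fnmatch(hit["path"], path_filter):
            continue
        if exclude_path and fnmatch(hit["path"], exclude_path):
            continue
        if min_score is not None and hit["score"] < min_score:
            continue
        kept.append(hit)
    return kept[:limit] if limit is not None else kept
```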

Daemon Mode

cidx config --daemon        # Enable daemon
cidx start                  # Start daemon
cidx stop                   # Stop daemon
cidx status                 # Check status
cidx watch                  # Start watch mode
cidx watch-stop             # Stop watch mode

Configuration

CIDX requires minimal configuration. An embedding provider API key is the only required setting: VoyageAI by default, or Cohere via CO_API_KEY.

VoyageAI API Key (Required)

# Add to shell profile (~/.bashrc or ~/.zshrc)
export VOYAGE_API_KEY="your-api-key-here"
source ~/.bashrc    # or: source ~/.zshrc

Project Configuration

CIDX auto-creates .code-indexer/config.json on first run with sensible defaults. You can customize:

file_extensions - File types to index
exclude_dirs - Directories to skip
max_file_size - Maximum file size (default 1MB)
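
For scripts that need to inspect these settings, the sketch below reads .code-indexer/config.json and merges it over a set of defaults. The three key names come from the list above; the default values for file_extensions and exclude_dirs, the assumption that max_file_size is stored in bytes, and the function name are illustrative only.

```python
import json
from pathlib import Path

# Assumed defaults; only the 1MB max_file_size default comes from the text above.
DEFAULTS = {
    "file_extensions": ["py", "js"],
    "exclude_dirs": ["node_modules", ".git"],
    "max_file_size": 1_048_576,  # 1MB, assuming the value is stored in bytes
}


def load_project_config(project_root):
    """Merge .code-indexer/config.json (if present) over the defaults."""
    path = Path(project_root) / ".code-indexer" / "config.json"
    overrides = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    return {**DEFAULTS, **overrides}
```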

For complete configuration reference including environment variables, daemon settings, and watch mode options, see Configuration Guide.

Documentation

Getting Started

Installation Guide - Complete installation for all platforms
Query Guide - All 23 query parameters and search strategies
Configuration Guide - VoyageAI setup, config options, environment variables

Features

SCIP Code Intelligence - Symbol navigation, dependencies, call chains
Temporal Search - Git history search with time-range filtering
Operating Modes - CLI, Daemon, Server modes explained

AI Integration

AI Integration Guide - Connect AI assistants to CIDX
MCP Bridge Guide - Claude Desktop integration via MCP
Guardrails Repository Convention - Custom safety guardrails for open delegation jobs

Server Administration

Auto-Update Guide - Job-aware auto-update with graceful drain mode
Cluster Architecture Guide - Multi-node cluster design, storage abstraction, leader election, and services
Cluster Setup Guide - Install and operate a CIDX Server cluster with PostgreSQL

Advanced

Architecture Guide - System design and storage architecture
Migration Guide - Upgrading from v7.x to v8.x
Changelog - Version history and release notes

Contributing

Contributions welcome! We appreciate bug reports, feature suggestions, and code contributions.

For Developers

Quick Setup:

# 1. Clone your fork
git clone https://github.com/YOUR_USERNAME/code-indexer.git
cd code-indexer

# 2. Initialize submodule (required for custom hnswlib build)
git submodule update --init --recursive

# 3. Install in editable mode
pip install -e ".[dev]"

# 4. Install pre-commit hooks (REQUIRED)
pre-commit install

# 5. Run tests
./fast-automation.sh

Pre-commit Hooks: All commits are automatically checked for linting, formatting, and type errors. Hooks auto-fix most issues.

See CONTRIBUTING.md for complete development setup, testing guidelines, and code quality standards.

Reporting Issues

Bugs: GitHub Issues
Features: GitHub Issues
Questions: GitHub Discussions

License

MIT License - See repository for full license text.

Support: GitHub Issues
Repository: https://github.com/LightspeedDMS/code-indexer
