AI-powered semantic code search for your codebase. Find code by meaning, not just keywords.
Version 9.14.1 - Changelog | Migration Guide | Architecture
- Installation
- Quick Start
- Key Features
- Semantic Search
- Multimodal Search (NEW in v8.8)
- Full-Text Search
- SCIP Code Intelligence
- Git History Search
- Langfuse Trace Sync (NEW in v8.10)
- Inter-Repository Dependency Map (NEW in v9.0)
- Multi-Provider Embedding (NEW in v9.8)
- Operating Modes
- Common Commands
- Documentation
CIDX combines semantic embeddings with traditional search to help you find code by meaning, not just keywords. Search your codebase with natural language queries like "authentication logic" or "database connection setup", trace symbol references with SCIP code intelligence, and explore git history semantically.
pipx install git+https://github.com/LightspeedDMS/code-indexer.git@v8.6.0
# Verify installation
cidx --version
python3 -m venv code-indexer-env
source code-indexer-env/bin/activate
pip install git+https://github.com/LightspeedDMS/code-indexer.git@v8.6.0
Requirements: Python 3.9+, 4GB+ RAM, VoyageAI API key (or Cohere API key)
For detailed installation instructions including Windows, configuration, and troubleshooting, see Installation Guide.
# Navigate to your project
cd /path/to/your/project
# Set embedding provider API key (VoyageAI by default; Cohere also supported)
export VOYAGE_API_KEY="your-api-key" # or export CO_API_KEY="your-key" for Cohere
# Index your codebase
cidx index
# Search semantically
cidx query "authentication logic" --limit 5
# Search with filters
cidx query "user" --language python --min-score 0.7
cidx query "save" --path-filter "*/models/*" --limit 10
For comprehensive query options and search strategies, see Query Guide.
Find code by meaning using AI embeddings powered by VoyageAI or Cohere. Ask natural language questions and get semantically relevant results ranked by similarity.
cidx query "authentication logic" --limit 10
cidx query "database connection setup" --language python
See: Query Guide
Search documentation that includes diagrams, screenshots, and visual content. CIDX automatically detects markdown, HTML, and HTMX files with embedded images and indexes them using multimodal embeddings, making visual content semantically searchable.
How it works:
- Automatic detection: During indexing, CIDX identifies files containing images (PNG, JPG, WebP, GIF):
  - Markdown (.md): Parses ![alt](path) syntax
  - HTML/HTMX (.html, .htmx): Parses <img src="path"> tags
- Dual indexing: Text goes to voyage-code-3, image+text content goes to voyage-multimodal-3
- Parallel search: Queries search both indexes simultaneously, merging results by relevance
- Transparent experience: No special flags needed - multimodal search happens automatically when multimodal content exists
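The detection step above can be sketched in a few lines of Python. This is a minimal illustration, not CIDX's actual parser: the regular expressions, the function names `find_image_refs` and `pick_index`, and the all-or-nothing routing rule are assumptions made for the example.

```python
import re

# Markdown image syntax: ![alt text](path)
MD_IMAGE = re.compile(r'!\[[^\]]*\]\(\s*([^)\s]+)')
# HTML/HTMX image tags: <img ... src="path" ...>
HTML_IMAGE = re.compile(r'<img\b[^>]*?\bsrc\s*=\s*["\']([^"\']+)["\']', re.IGNORECASE)
IMAGE_EXTENSIONS = (".png", ".jpg", ".jpeg", ".webp", ".gif")


def find_image_refs(filename: str, text: str) -> list[str]:
    """Return image paths referenced by a markdown or HTML/HTMX file."""
    lower = filename.lower()
    if lower.endswith(".md"):
        candidates = MD_IMAGE.findall(text)
    elif lower.endswith((".html", ".htmx")):
        candidates = HTML_IMAGE.findall(text)
    else:
        return []
    # Keep only references to the supported image formats.
    return [path for path in candidates if path.lower().endswith(IMAGE_EXTENSIONS)]


def pick_index(filename: str, text: str) -> str:
    """Route a file to the multimodal index only if it references images (assumed rule)."""
    return "voyage-multimodal-3" if find_image_refs(filename, text) else "voyage-code-3"
```

For instance, `pick_index("README.md", "See ![schema](docs/schema.png)")` would select the multimodal index, while a markdown file without images stays in the text index.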
# Index your project (multimodal detection is automatic)
cidx index
# Query searches both code AND visual documentation
cidx query "database schema diagram"
cidx query "API authentication flow"
Query output shows dual-index status:
Using: voyage-code-3, voyage-multimodal-3 # Both indexes active
Query Timing:
Parallel multi-index query 1.09s
voyage-code-3 index (parallel) 1.09s
voyage-multimodal-3 index (parallel) 470ms
Merge & deduplicate 0.06ms
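The timing output above suggests that both indexes are queried concurrently and the results merged afterwards, so total latency tracks the slower index. A rough sketch of that pattern follows. The two searcher functions are stand-ins that return fixed data, and keeping the highest score per file is one plausible deduplication rule, not necessarily the one CIDX applies.

```python
from concurrent.futures import ThreadPoolExecutor


def search_code_index(query):
    # Stand-in for a vector search against the voyage-code-3 index.
    return [("src/auth.py", 0.82), ("docs/auth.md", 0.64)]


def search_multimodal_index(query):
    # Stand-in for a vector search against the voyage-multimodal-3 index.
    return [("docs/auth.md", 0.71), ("docs/flow.html", 0.58)]


def parallel_query(query, searchers, limit=10):
    """Run every searcher concurrently, then merge and deduplicate by path."""
    with ThreadPoolExecutor(max_workers=len(searchers)) as pool:
        result_lists = list(pool.map(lambda search: search(query), searchers))
    best = {}
    for results in result_lists:
        for path, score in results:
            # Keep the highest score when both indexes return the same file.
            if score > best.get(path, float("-inf")):
                best[path] = score
    return sorted(best.items(), key=lambda item: item[1], reverse=True)[:limit]


merged = parallel_query("API authentication flow",
                        [search_code_index, search_multimodal_index])
```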
Supported image formats: PNG, JPG/JPEG, WebP, GIF (embedded in markdown via ![alt](path) syntax)
See: Architecture Guide
Fast exact text matching with fuzzy search, regex support, and case sensitivity options. Up to 50x faster than grep with indexed searching.
cidx query "authenticate_user" --fts
cidx query "ParseError" --fts --case-sensitive
cidx query "test_.*" --fts --regex --language python
See: Query Guide
Precise code navigation using SCIP (Source Code Intelligence Protocol). Find symbol definitions, references, dependencies, dependents, call chains, and perform impact analysis.
cidx scip generate # Generate SCIP indexes
cidx scip definition "UserService" # Find definition
cidx scip references "authenticate" # Find all usages
cidx scip callchain "main" "login" # Trace execution path
cidx scip impact "DatabaseManager" # Impact analysis
See: SCIP Code Intelligence Guide
Search your entire commit history semantically. Find when code was added, modified, or deleted with time-range filtering, author filtering, and diff type selection.
cidx index --index-commits # Index git history (one-time)
cidx query "JWT auth" --time-range-all # Search all history
cidx query "bug fix" --time-range 2024-01-01..2024-12-31
cidx query "login" --time-range-all --author "john@example.com"
Monitor file changes and automatically re-index in real time with daemon mode. Get ~5ms cached queries versus ~1s from disk.
cidx config --daemon # Enable daemon mode
cidx start # Start daemon
cidx watch # Start watch mode
cidx query "search" # Fast cached queries
Connect AI assistants to CIDX for semantic search directly in conversations. Supports local CLI integration (Claude Code, Gemini, Codex) and remote MCP server integration (Claude Desktop).
# Local CLI integration
cidx teach-ai --claude --project # Creates CLAUDE.md
# Remote MCP server for Claude Desktop
# See MCP Bridge guide for setup
See: AI Integration Guide | MCP Bridge Guide
Automatically pull AI conversation traces from Langfuse and make them semantically searchable. CIDX syncs traces in the background, indexes them with the same semantic search engine used for code, and makes them available via MCP tools and CLI queries.
How it works:
- Background sync: Pulls traces from configured Langfuse projects at a configurable interval (default: 5 minutes)
- Smart deduplication: Overlap window + content hash strategy detects trace mutations without re-downloading unchanged data
- Auto-registration: New trace folders are automatically registered as golden repos and indexed
- Watch integration: File system watchers trigger incremental re-indexing as new traces arrive
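The overlap window and content hash strategy can be illustrated with a short Python sketch. It assumes a 10-minute overlap and a SHA-256 hash over canonical JSON; both are hypothetical choices for the example, as are the function names and the trace dictionary shape with an `id` key.

```python
import hashlib
import json
from datetime import datetime, timedelta, timezone

OVERLAP = timedelta(minutes=10)  # hypothetical overlap window size


def content_hash(trace: dict) -> str:
    """Stable hash of a trace; key order must not affect the result."""
    payload = json.dumps(trace, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def fetch_window_start(last_sync: datetime) -> datetime:
    """Re-check traces slightly older than the last sync to catch late updates."""
    return last_sync - OVERLAP


def classify(traces: list[dict], known_hashes: dict[str, str]) -> dict[str, list[str]]:
    """Split fetched traces into new, updated, and unchanged by comparing hashes."""
    outcome = {"new": [], "updated": [], "unchanged": []}
    for trace in traces:
        digest = content_hash(trace)
        previous = known_hashes.get(trace["id"])
        if previous is None:
            outcome["new"].append(trace["id"])
        elif previous != digest:
            outcome["updated"].append(trace["id"])
        else:
            outcome["unchanged"].append(trace["id"])
        known_hashes[trace["id"]] = digest  # remember the latest hash
    return outcome
```

Only traces classified as new or updated would need to be rewritten to disk and re-indexed; unchanged traces inside the overlap window are skipped.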
Trace storage layout:
golden-repos/
langfuse_<project>_<userId>/
<sessionId>/
<traceId>.json # Full trace + observations, chronologically ordered
Each trace file contains the user prompt (trace.input), AI response (trace.output), metadata, and all observations (tool calls) in chronological order.
Search traces via MCP:
search_code("authentication error handling", repository_alias="langfuse_*")
search_code("SQL query generation", repository_alias="langfuse_MyProject_*")
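To make the naming convention concrete, the following sketch builds a trace file path from the layout shown earlier and selects repository aliases with a wildcard. The helper names are hypothetical, and treating `repository_alias` patterns as shell-style globs is an assumption about the matching semantics.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath


def trace_file_path(project, user_id, session_id, trace_id):
    """Relative path of one trace file, following the documented layout."""
    folder = f"langfuse_{project}_{user_id}"
    return PurePosixPath("golden-repos") / folder / session_id / f"{trace_id}.json"


def matching_repos(pattern, aliases):
    """Select repository aliases with a shell-style wildcard (assumed semantics)."""
    return [alias for alias in aliases if fnmatchcase(alias, pattern)]
```

Under these assumptions, `langfuse_MyProject_*` would select every per-user trace folder of one Langfuse project, while `langfuse_*` would span all projects.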
Dashboard monitoring: Real-time sync health, per-project metrics (traces checked/new/updated), storage statistics, and manual sync trigger from the admin dashboard.
Configuration: Enable via the Web UI Config Screen under Langfuse settings. Requires Langfuse project public/secret key pair.
Pre-computed semantic dependency map that analyzes source code across all registered golden repos, identifies domain-level relationships, and produces queryable documents so MCP clients can immediately determine the relevant repo set for cross-repo tasks -- without performing exploratory searches.
How it works:
- Multi-pass analysis: Claude CLI pipeline examines source code across all repos in three passes: domain synthesis, per-domain deep dive (imports, API contracts, shared types), and index generation
- Domain-clustered output: Produces per-domain .md files and an _index.md with domain catalog and repo-to-domain matrix, all stored in cidx-meta/dependency-map/
- Incremental delta refresh: Scheduled daemon detects changed/new/removed repos via commit hash comparison and updates only affected domain files
- MCP discovery: Quick reference automatically directs MCP clients to check the dependency map first for cross-repo tasks
- Human corrections: Power users can edit dependency map files directly via MCP file CRUD tools; changes are auto-reindexed
- Cross-domain dependency graph: Pass 3 automatically builds a directed graph of inter-domain connections by parsing each domain file, detecting which repos are mentioned across domain boundaries, and appending the graph to _index.md with edge list and standalone domain identification
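The incremental delta refresh described above comes down to comparing two snapshots of commit hashes. A minimal sketch, assuming each snapshot is a mapping from repo alias to commit hash and that a separate (hypothetical) `repo_domains` mapping records which domains each repo participates in:

```python
def detect_repo_changes(previous: dict[str, str], current: dict[str, str]) -> dict[str, list[str]]:
    """Compare alias -> commit hash maps from the last run and from now."""
    return {
        "new": sorted(current.keys() - previous.keys()),
        "removed": sorted(previous.keys() - current.keys()),
        "changed": sorted(
            alias for alias in current.keys() & previous.keys()
            if current[alias] != previous[alias]
        ),
    }


def affected_domains(changes: dict[str, list[str]], repo_domains: dict[str, list[str]]) -> list[str]:
    """Domains whose files need regeneration because a member repo changed."""
    touched = changes["new"] + changes["removed"] + changes["changed"]
    return sorted({domain for alias in touched for domain in repo_domains.get(alias, [])})
```

A brand-new repo has no domain assignment yet, so in this sketch it would have to be placed by a later analysis pass rather than by the delta check itself.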
Dependency map output structure:
cidx-meta/
dependency-map/
_index.md # Domain catalog + repo-to-domain matrix + cross-domain dependency graph
authentication.md # Per-domain: repo roles, subdomain deps, cross-domain connections
data-pipeline.md # Each domain file has YAML frontmatter with participating repos
...
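One way to picture the cross-domain dependency graph stored in _index.md is as a set of directed edges between domains. The sketch below is illustrative only: it assumes an edge A -> B exists when domain A's document mentions a repo that participates in B but not in A, and the input dictionaries are hypothetical stand-ins for the parsed YAML frontmatter and file bodies.

```python
import re


def build_domain_graph(domain_repos: dict[str, set[str]],
                       domain_text: dict[str, str]) -> list[tuple[str, str]]:
    """Return directed edges (source_domain, target_domain)."""
    edges = set()
    for source, text in domain_text.items():
        own_repos = domain_repos.get(source, set())
        for target, repos in domain_repos.items():
            if target == source:
                continue
            # Repos that belong to the target domain but not to the source.
            foreign = repos - own_repos
            if any(re.search(rf"\b{re.escape(repo)}\b", text) for repo in foreign):
                edges.add((source, target))
    return sorted(edges)


def standalone_domains(domain_repos: dict[str, set[str]],
                       edges: list[tuple[str, str]]) -> list[str]:
    """Domains that appear in no edge at all."""
    connected = {domain for edge in edges for domain in edge}
    return sorted(set(domain_repos) - connected)
```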
Usage via MCP:
# Quick reference tells MCP clients about the dependency map
cidx_quick_reference # Shows dependency map section with workflow
# Read the dependency map
get_file_content("cidx-meta-global", "dependency-map/_index.md")
get_file_content("cidx-meta-global", "dependency-map/authentication.md")
# Search dependency map semantically
search_code("authentication between repos", repository_alias="cidx-meta-global")
# Admin: trigger analysis on demand
trigger_dependency_analysis(mode="full") # Full regeneration
trigger_dependency_analysis(mode="delta") # Incremental update
# Power user: correct inaccuracies
edit_file(repository_alias="cidx-meta-global", path="dependency-map/authentication.md", ...)
Configuration (via Web UI Config Screen):
- dependency_map_enabled: Enable the feature (default: off, opt-in)
- dependency_map_interval_hours: Refresh interval (default: 168 hours / weekly)
- dependency_map_pass_timeout_seconds: Per-pass Claude CLI timeout (default: 600s)
See: Meta-Repo Discovery Guide
CIDX supports multiple embedding providers for redundancy and flexibility:
- VoyageAI (default): voyage-3 (1024 dims) or voyage-3-large (1536 dims)
- Cohere (new): embed-v4.0 with 2048 dimensions, embedded tokenizer (no SDK required)
Query strategies control which provider serves results:
- primary_only: use the configured primary provider (default)
- failover: try the primary, automatically fall back to the secondary on failure
- parallel: query both providers and fuse results (RRF, multiply, or average)
- specific: explicitly target one provider with --provider
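For readers unfamiliar with RRF (reciprocal rank fusion), the sketch below shows the standard formulation: each result scores the sum of 1 / (k + rank) over the lists it appears in. The constant k = 60 is the conventional default from the literature, and the function name and sample file names are made up for the example; CIDX's own parameters may differ.

```python
from collections import defaultdict


def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse ranked result lists with reciprocal rank fusion (rank starts at 1)."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by name for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


voyage_results = ["auth.py", "login.py", "session.py"]
cohere_results = ["login.py", "tokens.py", "auth.py"]
fused = rrf_fuse([voyage_results, cohere_results])
```

Because RRF uses only ranks, it sidesteps the problem that similarity scores from different embedding models are not directly comparable, which is a concern for the multiply and average fusions.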
CLI usage:
cidx query "authentication" --strategy parallel --score-fusion rrf
cidx query "database setup" --strategy specific --provider cohere
cidx provider-health # check provider status
cidx provider-index list # manage per-provider indexes
CIDX operates in three modes optimized for different use cases:
Direct command-line interface with two operational sub-modes:
Local Mode (default) - Direct file access with instant setup and no dependencies:
cidx init # Create .code-indexer/ locally
cidx index # Index codebase locally
cidx query # Search (~1s per query from disk)
Remote Mode - Connect to CIDX server with repository linking:
# Initialize remote connection
cidx init --remote https://cidx.example.com --username user --password pass
# Query executes on remote server
cidx query "search term" # Transparent remote execution
# Sync repositories with server
cidx sync # Sync current repository
cidx sync my-project # Sync specific repository
cidx sync --all # Sync all repositories (multi-repo support)
Remote Mode Features:
- Repository linking (automatic matching between local repo and server golden repo)
- Transparent remote query execution (same CLI, server-side processing)
- Multi-repo support (manage and sync multiple repositories)
- OAuth 2.0 authentication with server
- Access team's centralized indexed repositories
Background service with in-memory caching for faster queries (~5ms) and real-time watch mode.
cidx config --daemon # Enable daemon
cidx start # Start daemon
cidx query "search" # Fast cached queries
cidx watch # Real-time indexing
Multi-user server with centralized golden repositories for team-wide semantic search. Deploy CIDX as an HTTP/HTTPS service with OAuth 2.0 authentication, REST API, MCP interface, and web UI administration.
cidx-server start # Start multi-user server
# Users query via REST API or MCP
# Admin manages repos via web UI
Core Capabilities:
- Multi-Repo Indexing: Centralized golden repositories shared across team
- Multi-User Access: OAuth 2.0/OIDC authentication with role-based permissions
- Advanced Caching: HNSW cache with <1ms warm queries (100-1800x speedup)
- REST API: Programmatic access with full query parameter support
- MCP Protocol: Claude Desktop integration for AI-assisted code search
- Web Administration: Manage users, repositories, and configuration via browser
- Repository Management: Add, refresh, remove golden repositories
- User Management: Create users, assign roles (admin/power_user/normal_user)
- Cache Monitoring: Real-time cache statistics and performance metrics
Run multiple CIDX Server nodes sharing a single PostgreSQL database for horizontal scaling and high availability. Cluster mode requires setting storage_mode to "postgres" in ~/.cidx-server/config.json. All application functionality (REST API, MCP, Web UI) is identical in standalone and cluster modes; the storage backends swap transparently via Protocol interfaces.
Key cluster features:
- Leader election via PostgreSQL advisory lock ensures exactly one node runs schedulers at a time
- Node heartbeat tracking with automatic failover detection (30-second threshold)
- Distributed background job queue with orphaned job recovery and re-execution
- Centralized runtime configuration in PostgreSQL with 30-second cross-node propagation
- All security services cluster-aware: rate limiters, session invalidation, token blacklist, OIDC state, MFA
- Activated repository metadata shared across nodes via PostgreSQL
- HNSW max_elements configurable via Web UI (default 1,000,000)
- Per-node metrics carousel in the admin dashboard
- SQLite-to-PostgreSQL data migration tool for converting existing installations
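The heartbeat-based failover detection in the list above can be sketched as a staleness check. This is a simplified illustration: in a real cluster the heartbeats would live in PostgreSQL, whereas here they are a plain dictionary of node id to last-seen Unix timestamp, and the function name is hypothetical.

```python
import time
from typing import Optional

FAILOVER_THRESHOLD_SECONDS = 30  # threshold stated in the feature list


def stale_nodes(heartbeats: dict[str, float],
                now: Optional[float] = None,
                threshold: float = FAILOVER_THRESHOLD_SECONDS) -> list[str]:
    """Return node ids whose last heartbeat is older than the threshold."""
    current = time.time() if now is None else now
    return sorted(node for node, seen in heartbeats.items()
                  if current - seen > threshold)
```

Jobs owned by a node reported as stale could then be treated as orphaned and handed back to the distributed queue for re-execution.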
For architecture details see Cluster Architecture Guide. For setup and operations see Cluster Setup Guide.
Claude Delegation (v8.5+):
- Protected Repository Analysis: AI agents analyze code without exposing source to clients
- Delegation Functions: Pre-defined AI workflows for code review, analysis, and transformation
- Collaborative Mode (v9.7+): DAG-based orchestrated multi-step delegation with per-step engines and repos
- Competitive Mode (v9.7+): Decompose-compete-judge pipeline with multiple AI engines
- Acting Users (v9.7+): Scoped repository access via acting_users parameter for multi-tenant delegation
- Group-Based Access: Control which users can execute which delegation functions
- Callback-Based Completion: Efficient job polling with server-side callbacks and cross-node tracking
Security & Observability (v8.5+):
- Group-Based Security: Fine-grained access control using group membership
- OTEL Telemetry: OpenTelemetry integration for traces, metrics, and observability
- Auto-Discovery: Automatic repository discovery from GitHub organizations or local paths
- Auto-Update: Job-aware server updates with graceful drain mode
Self-Monitoring (v8.8.2+):
- Claude-Powered Log Analysis: Scheduled background scans analyze server logs using Claude CLI
- Automatic Issue Creation: Detected bugs automatically create GitHub issues with reproduction steps
- Actionable Focus: Filters configuration noise, reports only development-actionable bugs
- Manual Trigger: On-demand scans via admin API endpoint
- Debug Memory Snapshot (v9.5.7+): Localhost-only endpoints for diagnosing memory leaks without restarting the server. GET /debug/memory-snapshot returns object counts and sizes by type (top 100), with module-qualified names and self-monitoring overhead. GET /debug/memory-compare?baseline={timestamp} diffs against a prior snapshot. Secured by network restriction (127.0.0.1/::1 only, no auth required).
Authentication & Authorization:
- OAuth 2.0 and OIDC (OpenID Connect) support with SSO
- TOTP multi-factor authentication (MFA) with QR setup and recovery codes
- Password expiry enforcement for non-SSO accounts
- Login rate limiting with automatic account lockout
- Configurable admin session timeout
- Three role levels: admin (full access), power_user (activate repos), normal_user (query only)
- Group-based repository and function access control
- Secure token-based API access with cluster-wide JWT revocation
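The three role levels can be modeled as an ordered hierarchy. The sketch below assumes the roles are cumulative (a power_user can also query, an admin can do everything); the action names and the `REQUIRED_ROLE` table are hypothetical and only illustrate the idea.

```python
from enum import IntEnum


class Role(IntEnum):
    NORMAL_USER = 1   # query only
    POWER_USER = 2    # may also activate repos
    ADMIN = 3         # full access


# Hypothetical action names mapped to the minimum role that may perform them.
REQUIRED_ROLE = {
    "query": Role.NORMAL_USER,
    "activate_repo": Role.POWER_USER,
    "manage_users": Role.ADMIN,
}


def is_allowed(role: Role, action: str) -> bool:
    """Unknown actions are denied; known actions need at least the required role."""
    required = REQUIRED_ROLE.get(action)
    return required is not None and role >= required
```

Group-based access control would sit on top of a check like this, narrowing which repositories and delegation functions a given user can reach.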
Performance:
- Cold query: ~277ms (first access, loads from disk)
- Warm query: <1ms (cached, 100-1800x faster)
- Configurable cache TTL (default 10 minutes)
- Per-repository cache isolation
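The cold versus warm distinction above is typical of a per-repository cache with a time-to-live. The following sketch shows the general pattern, not CIDX's cache: the class name, the injected `loader` and `clock` parameters, and the choice to measure the TTL from load time rather than from last access are all assumptions.

```python
import time
from typing import Any, Callable

DEFAULT_TTL_SECONDS = 600  # 10 minutes, the default stated above


class RepoIndexCache:
    """Keeps one loaded index per repository and reloads it after a TTL."""

    def __init__(self, loader: Callable[[str], Any],
                 ttl: float = DEFAULT_TTL_SECONDS,
                 clock: Callable[[], float] = time.monotonic):
        self._loader = loader
        self._ttl = ttl
        self._clock = clock
        # Each repository has its own entry, so one repo never evicts another.
        self._entries: dict[str, tuple[float, Any]] = {}

    def get(self, repo: str) -> Any:
        now = self._clock()
        entry = self._entries.get(repo)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]            # warm hit: served from memory
        index = self._loader(repo)     # cold path: load from disk
        self._entries[repo] = (now, index)
        return index
```

Injecting the clock keeps the expiry logic testable without sleeping; production code would simply use the default.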
For detailed setup, deployment, and configuration, see Operating Modes Guide.
cidx init # Create .code-indexer/ config
cidx index # Semantic indexing (default)
cidx index --fts # Add full-text search
cidx index --index-commits # Add git history indexing
cidx scip generate # Generate SCIP indexes
# Semantic search
cidx query "search term" --limit 10
# Full-text search
cidx query "exact text" --fts
# Regex pattern matching
cidx query "pattern" --fts --regex
# Git history search
cidx query "term" --time-range-all --quiet
# SCIP code intelligence
cidx scip definition "Symbol"
cidx scip references "function_name"
--language python # Filter by language
--path-filter "*/tests/*" # Filter by path pattern
--exclude-path "*/vendor/*" # Exclude paths
--min-score 0.8 # Minimum similarity score
--limit 20 # Max results
cidx config --daemon # Enable daemon
cidx start # Start daemon
cidx stop # Stop daemon
cidx status # Check status
cidx watch # Start watch mode
cidx watch-stop # Stop watch mode
CIDX requires minimal configuration. An embedding provider API key (VoyageAI by default) is the only required setting.
# Add to shell profile (~/.bashrc or ~/.zshrc)
export VOYAGE_API_KEY="your-api-key-here"
source ~/.bashrc
CIDX auto-creates .code-indexer/config.json on first run with sensible defaults. You can customize:
- file_extensions - File types to index
- exclude_dirs - Directories to skip
- max_file_size - Maximum file size (default 1MB)
For complete configuration reference including environment variables, daemon settings, and watch mode options, see Configuration Guide.
- Installation Guide - Complete installation for all platforms
- Query Guide - All 23 query parameters and search strategies
- Configuration Guide - VoyageAI setup, config options, environment variables
- SCIP Code Intelligence - Symbol navigation, dependencies, call chains
- Temporal Search - Git history search with time-range filtering
- Operating Modes - CLI, Daemon, Server modes explained
- AI Integration Guide - Connect AI assistants to CIDX
- MCP Bridge Guide - Claude Desktop integration via MCP
- Guardrails Repository Convention - Custom safety guardrails for open delegation jobs
- Auto-Update Guide - Job-aware auto-update with graceful drain mode
- Cluster Architecture Guide - Multi-node cluster design, storage abstraction, leader election, and services
- Cluster Setup Guide - Install and operate a CIDX Server cluster with PostgreSQL
- Architecture Guide - System design and storage architecture
- Migration Guide - Upgrading from v7.x to v8.x
- Changelog - Version history and release notes
Contributions welcome! We appreciate bug reports, feature suggestions, and code contributions.
Quick Setup:
# 1. Clone and install
git clone https://github.com/YOUR_USERNAME/code-indexer.git
cd code-indexer
# 2. Initialize submodule (required for custom hnswlib build)
git submodule update --init --recursive
# 3. Install in editable mode
pip install -e ".[dev]"
# 4. Install pre-commit hooks (REQUIRED)
pre-commit install
# 5. Run tests
./fast-automation.sh
Pre-commit Hooks: All commits are automatically checked for linting, formatting, and type errors. Hooks auto-fix most issues.
See CONTRIBUTING.md for complete development setup, testing guidelines, and code quality standards.
- Bugs: GitHub Issues
- Features: GitHub Issues
- Questions: GitHub Discussions
MIT License - See repository for full license text.
Support: GitHub Issues
Repository: https://github.com/LightspeedDMS/code-indexer