Ash Vardanian ashvardanian

Hey, I'm Ash — I Love Building Infrastructure

Building Unum Cloud since 2015.
Computer Science & AI researcher (unpublished, by choice).
Twice an Astrophysics dropout, lifelong Bioinformatics fan.
Investing in deep-tech, cloud, & semiconductors.
Fluent in English, Russian & Armenian.
Lived in 🇺🇸🇬🇧🇷🇺🇦🇲 & 🇲🇽🇵🇦🇦🇷🇩🇪🇦🇪🇹🇭🇲🇾🇻🇳🇮🇩.
Frequent host of "Systems" meetups in Armenia and beyond.

For ~20 years, I've been coding in C++, CUDA, and Python, and optimizing Assembly on x86 & ARM. I prefer spaces over tabs, write east-const, and favor procedural code over OOP or functional abstractions.
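"East-const" means writing `const` to the right of the type it qualifies. The C++17 snippet below is a minimal illustration of that convention. It is not code from any of the projects listed here, and the names `c_string`, `greeting`, and `count_chars` are hypothetical. It shows that both spellings name the same type, and it shows where east-const reads more honestly: with type aliases.

```cpp
#include <cstddef>
#include <type_traits>

// West-const and east-const spell the same type; only the position of `const` differs.
static_assert(std::is_same_v<const char *, char const *>);

// With an alias, `const` applies to the alias as a whole, so the pointer itself
// becomes const (not the characters). Writing `const` on the right makes that obvious.
using c_string = char *; // hypothetical alias for illustration
static_assert(std::is_same_v<const c_string, char *const>);
static_assert(std::is_same_v<c_string const, char *const>);

// Read right to left: "a const pointer to const char".
// `constexpr` on a variable also implies top-level const.
constexpr char const *greeting = "hello";

// Parameters follow the same convention: a pointer to immutable characters.
constexpr std::size_t count_chars(char const *text) noexcept {
    std::size_t count = 0;
    while (text[count] != '\0') ++count;
    return count;
}
```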

Want to chat? I'm @ashvardanian on GitHub, LinkedIn, Twitter, Facebook, and YouTube. For venture inquiries, reach me at ash@aal.vc 🤗

Repositories

USearch - one of the world's most popular search engines, used in databases, AI labs, and large-scale Natural Science experiments. Compact C++ core with 10+ language bindings — 10–100× faster than Meta FAISS for vector search and far beyond Apache Lucene.
StringZilla - one of the fastest string/text processing libraries, leveraging SIMD, SWAR, and CUDA-accelerated algorithms for search, matching, hashing, and sorting of Web-scale Unicode UTF-8 text and Petabyte-scale Bioinformatics data. Hundreds of hand-tuned kernels with manual multi-versioning, exposed to C, C++, Rust, Python, Swift, and JavaScript; up to 10× faster on CPUs and 100× faster on GPUs.
NumKong - one of the largest & most portable mixed-precision numerics projects for C, C++23, Python, Rust, and JavaScript. Designed for linear algebra, scientific computing, statistics, information retrieval, and image processing, it delivers consistent SIMD speedups over BLAS and NumPy on x86, ARM, RISC-V, PowerPC, LoongArch, and in-browser WASM environments, bringing obscure 6-, 8-, and 16-bit GPU-only float formats to every modern chip.
ForkUnion - ultra-low-latency parallelism library for Rust and C++. Avoids allocations, mutexes, and even Compare-And-Swap atomics — achieving up to 10× speedups over Rayon and TaskFlow.

Some of those are used in open-source databases like ClickHouse, DuckDB, TiDB, ScyllaDB, YugabyteDB, DragonflyDB, Memgraph, Vald, Turso, and Chroma; in LLM toolchains like LangChain, LlamaIndex, Microsoft Semantic Kernel, Nomic AI GPT4All, and Surf; and in many less "open" systems, such as the backend infrastructure of major AI labs, government intelligence agencies, hyperscale cloud companies, and Fortune 500 firms, as well as iOS and Android apps with 100M–1B MAU.

And more projects, benchmarks, tutorials, and fun hackathon experiments:

UCall - a kernel-bypass web server backend for C and Python built on io_uring. Achieves 70× higher throughput and 50× lower latency than FastAPI for real-time workloads, including serving compact AI models.
UForm - tiny multimodal AI models with state-of-the-art parameter and data efficiency. Compatible with Python, JS, and Swift, serving as a lightweight alternative to OpenAI CLIP for on-device and server inference.
less_slow.cpp - teaches a performance-oriented mindset for C++, CUDA, PTX, and ASM
- less_slow.rs - Rust adaptation with a focus on higher-level abstractions
- less_slow.py - Python adaptation with a focus on scripting & data-management
SpaceV - 1 billion vectors from Microsoft SpaceV extended for usability
USearchMolecules - 28 billion fingerprints for drug discovery, published with AWS
SwiftSemanticSearch - example of on-device real-time AI using UForm and USearch on iOS
RetriEval - Billion-scale Vector Search benchmarks for USearch, FAISS, cuVS, Weaviate, Qdrant, etc.
NumWars - micro-benchmarking NumKong against the best Rust & Python BLAS libraries
StringWars - micro-benchmarking StringZilla against the best Rust & Python string libraries
HashEvals - testing avalanche effect & differential patterns of string hash functions
ParallelReductionsBenchmark - GPGPU benchmarks for SYCL, CUDA, OpenCL, Vulkan, etc.
AffineGaps - "less wrong" local and global Gotoh sequence alignments in one Numba Python file
FasterFASTA - CLI tool to parse, sort, dedup, and translate DNA, RNA, & protein sequences
StringTape - Apache Arrow compatible tapes for space-efficient string arrays
LibSee - non-intrusively profiling LibC calls with LD_PRELOAD tricks
ScalingElections - parallel combinatorial voting in CUDA and Mojo for H100 GPUs
UStore - multimodal embedded database for C, C++, and Python designed around key-value stores
TinySemVer - semantic versioning GitHub CI tool that doesn't take 300K lines of JavaScript

Materials

Cherry picks:
