☕ Less Slow

Hey, I'm Ash - I Love Building Infrastructure

  • Building Unum Cloud since 2015.
  • Computer Science & AI researcher (unpublished, by choice).
  • Twice an Astrophysics dropout, lifelong Bioinformatics fan.
  • Investing in deep-tech, cloud, & semiconductors.
  • Fluent in English, Russian & Armenian.
  • Lived in 🇺🇸🇬🇧🇷🇺🇦🇲 & 🇲🇽🇵🇦🇦🇷🇩🇪🇦🇪🇹🇭🇲🇾🇻🇳🇮🇩.
  • Frequent host of "Systems" meetups in Armenia and beyond.

For ~20 years, I've been coding in C++, CUDA, and Python, and optimizing Assembly on x86 & ARM. I prefer spaces over tabs, and use east-const and procedural code over OOP or functional abstractions.

Want to chat? I'm @ashvardanian on GitHub, LinkedIn, Twitter, Facebook, and YouTube. For venture inquiries, reach me at ash@aal.vc 🤗


Repositories

  • USearch - one of the world's most popular search engines, used in databases, AI labs, and large-scale Natural Science experiments. Compact C++ core with 10+ language bindings, 10–100× faster than Meta FAISS for vector search and far beyond Apache Lucene.
  • StringZilla - one of the fastest string/text processing libraries, leveraging SIMD, SWAR, and CUDA-accelerated algorithms for search, matching, hashing, and sorting of Web-scale Unicode UTF-8 text and Petabyte-scale Bioinformatics data. Hundreds of hand-tuned kernels with manual multi-versioning, exposed to C, C++, Rust, Python, Swift, and JavaScript, up to 10× faster on CPUs and 100× faster on GPUs.
  • NumKong - one of the largest & most-portable mixed-precision numerics projects for C, C++23, Python, Rust, and JavaScript. Designed for linear algebra, scientific computing, statistics, information retrieval, and image processing, delivering consistent SIMD speedups over BLAS and NumPy on x86, ARM, RISC-V, PowerPC, LoongArch, and in-browser WASM environments, bringing 6-, 8-, and 16-bit obscure GPU-only floats to every modern chip.
  • ForkUnion - ultra-low-latency parallelism library for Rust and C++. Avoids allocations, mutexes, and even Compare-And-Swap atomics, achieving up to 10× speedups over Rayon and TaskFlow.

Some of these are used in open-source databases like ClickHouse, DuckDB, TiDB, ScyllaDB, YugabyteDB, DragonflyDB, MemGraph, Vald, Turso, and Chroma; in LLM toolchains like LangChain, LlamaIndex, Microsoft Semantic Kernel, Nomic AI GPT4All, and Surf; and in many less "open" systems, such as the backend infrastructure of major AI labs, government intelligence agencies, hyperscale cloud companies, Fortune 500 companies, and iOS and Android apps with 100M-1B MAU.

And more projects, benchmarks, tutorials, and fun hackathon experiments:

  • UCall - a kernel-bypass web server backend for C and Python built on io_uring. Achieves 70× higher throughput and 50× lower latency than FastAPI for real-time workloads, including serving compact AI models.
  • UForm - tiny multimodal AI models with state-of-the-art parameter and data efficiency. Compatible with Python, JS, and Swift, serving as a lightweight alternative to OpenAI CLIP for on-device and server inference.
  • less_slow.cpp - teaches a performance-oriented mindset for C++, CUDA, PTX, and ASM
    • less_slow.rs - Rust adaptation with a focus on higher-level abstractions
    • less_slow.py - Python adaptation with a focus on scripting & data-management
  • SpaceV - 1 billion vectors from Microsoft SpaceV extended for usability
  • USearchMolecules - 28 billion fingerprints for drug discovery, published with AWS
  • SwiftSemanticSearch - example of on-device real-time AI using UForm and USearch on iOS
  • RetriEval - Billion-scale Vector Search benchmarks for USearch, FAISS, cuVS, Weaviate, Qdrant, etc.
  • NumWars - micro-benchmarking NumKong against the best Rust & Python BLAS libraries
  • StringWars - micro-benchmarking StringZilla against the best Rust & Python string libraries
  • HashEvals - testing avalanche effect & differential patterns of string hash functions
  • ParallelReductionsBenchmark - GPGPU benchmarks for SYCL, CUDA, OpenCL, Vulkan, etc.
  • AffineGaps - "less wrong" local and global Gotoh sequence alignments in one Numba Python file
  • FasterFASTA - CLI tool to parse, sort, dedup, and translate DNA, RNA, & protein sequences
  • StringTape - Apache Arrow compatible tapes for space-efficient string arrays
  • LibSee - non-intrusively profiling LibC calls with LD_PRELOAD tricks
  • ScalingElections - parallel combinatorial voting in CUDA and Mojo for H100 GPUs
  • UStore - multimodal embedded database for C, C++, and Python designed around key-value stores
  • TinySemVer - semantic versioning GitHub CI tool that doesn't take 300K lines of JavaScript

Materials

Cherry picks:

Pinned

  1. unum-cloud/USearch

    Fast Open-Source Search & Clustering engine × for Vectors & Arbitrary Objects × in C++, C, Python, JavaScript, Rust, Java, Objective-C, Swift, C#, GoLang, and Wolfram 🔍

    C++ · 4k stars · 308 forks

  2. StringZilla

    Up to 100x faster strings for C, C++, CUDA, Python, Rust, Swift, JS, & Go, leveraging NEON, AVX2, AVX-512, SVE, GPGPU, & SWAR to accelerate search, hashing, sorting, edit distances, sketches, and m…

    C · 3.4k stars · 122 forks

  3. unum-cloud/UCall

    Web Serving and Remote Procedure Calls at 50x lower latency and 70x higher bandwidth than FastAPI, implementing JSON-RPC & REST over io_uring ☎️

    C · 1.3k stars · 54 forks

  4. NumKong

    SIMD-accelerated distances, dot products, matrix ops, geospatial & geometric kernels for 16 numeric types (from 6-bit floats to 64-bit complex) across x86, Arm, RISC-V, and WASM, with bindings fo…

    C · 1.8k stars · 117 forks

  5. unum-cloud/UForm

    Pocket-Sized Multimodal AI for content understanding and generation across multilingual texts, images, and 🔜 video, up to 5x faster than OpenAI CLIP and LLaVA 🖼️ & 🖋️

    Python · 1.2k stars · 79 forks

  6. less_slow.cpp

    Playing around with "Less Slow" coding practices in C++20, C, CUDA, PTX, & Assembly, from numerics & SIMD to coroutines, ranges, exception handling, networking, and user-space IO

    C++ · 1.9k stars · 81 forks