Senior AI/ML Systems Engineer · on-device GenAI, ML compilers, and NPU deployment
Qualcomm · IIT Madras (MTech, Industrial AI) · NIT Rourkela (BTech, CSE)
ML systems engineer working at the intersection of deep learning compilers, runtime systems, and hardware-aware optimization for edge ML accelerators. My day-to-day work sits closer to the runtime than to the notebook, at the point where model graphs, execution providers, and silicon meet.
My background blends production deployment engineering on Windows-on-Snapdragon with graduate-level study in Industrial AI at IIT Madras.
- On-device and edge AI deployment for transformer and diffusion workloads
- Execution providers and runtime behavior across ONNX Runtime, QNN EP, DirectML, and WinML (a minimal session-setup sketch follows this list)
- Graph-level optimization, fusion, layout, and dtype legality on ONNX graphs
- Kernel-level optimization along Conv / GEMM / attention / activation paths
- Quantization and calibration (INT8 / INT4) for transformer-class models
- Olive + WinML enablement pathways for on-device model delivery
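To ground the execution-provider item above, here is a minimal sketch of standing up an ONNX Runtime session against the QNN EP with CPU fallback. The model path, input, and backend option values are illustrative assumptions, not a fixed recipe.

```python
import onnxruntime as ort

# Ask for the QNN EP first (Hexagon/HTP on Windows-on-Snapdragon); ONNX Runtime
# falls back to CPU for any nodes the NPU backend cannot take.
# "model.onnx" and the backend_path value are placeholders for illustration.
providers = [
    ("QNNExecutionProvider", {"backend_path": "QnnHtp.dll"}),
    "CPUExecutionProvider",
]
session = ort.InferenceSession("model.onnx", providers=providers)

# Graph partitioning can split a model across EPs, so it is worth checking
# which providers the session actually registered.
print(session.get_providers())
```

In practice the interesting work starts when partitioning splits the graph: which ops fell back to CPU, and why, usually points straight at the fusion, layout, and dtype-legality questions listed above.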
- Contributed to enabling Stable Diffusion v1.5 on Windows-on-Snapdragon, the publicly announced Qualcomm + Microsoft collaboration bringing on-device generative AI to the NPU.
- Work on on-device GenAI and NPU-facing model enablement across Windows platforms.
- Execution-provider and runtime integration across the ONNX Runtime / QNN EP / DirectML / WinML ecosystem.
- Graph-level optimization, kernel tuning, and quantization practice applied to transformer and diffusion workloads (see the quantization sketch after this list).
- Young Technocrat Award (external recognition).
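A minimal sketch of the INT8 calibration flow referenced in the list above, using onnxruntime's static quantization API. The data reader below feeds random tensors purely for illustration; the file names, input name, and shape are assumptions.

```python
import numpy as np
from onnxruntime.quantization import CalibrationDataReader, QuantType, quantize_static

class RandomCalibrationReader(CalibrationDataReader):
    """Stand-in calibration source; a real flow would feed representative data."""
    def __init__(self, input_name, shape, n_batches=8):
        self._batches = iter(
            {input_name: np.random.rand(*shape).astype(np.float32)}
            for _ in range(n_batches)
        )

    def get_next(self):
        # Returning None tells the calibrator it has consumed all batches.
        return next(self._batches, None)

# "fp32.onnx", "int8.onnx", and the input name/shape are placeholders.
quantize_static(
    "fp32.onnx",
    "int8.onnx",
    RandomCalibrationReader("input", (1, 3, 512, 512)),
    activation_type=QuantType.QUInt8,
    weight_type=QuantType.QInt8,
)
```

Calibration quality dominates INT8 accuracy for transformer-class models, which is why the choice of representative data matters far more than the API call itself.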
Compilers & runtimes — ONNX Runtime, QNN EP, DirectML, WinML, Olive, ONNX, IR-level graph transformations
Hardware targets — Qualcomm NPUs (Hexagon / HTP), Snapdragon X, ARM64, x64
Models & frameworks — PyTorch, Hugging Face Transformers, diffusion models, quantization toolchains (INT8 / INT4)
Languages — C++, Python, C
Perf & debugging — Profiling, tracing, kernel-level analysis, hardware-in-the-loop benchmarking (see the profiling sketch below)
Platforms — Windows-on-Snapdragon, Linux
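For the perf and debugging line above, a small sketch of ONNX Runtime's built-in operator-level profiler, usually the first stop before lower-level tracing or kernel analysis. The model path and input name are placeholders.

```python
import numpy as np
import onnxruntime as ort

opts = ort.SessionOptions()
opts.enable_profiling = True  # emit a per-node JSON trace for this session

session = ort.InferenceSession(
    "model.onnx", sess_options=opts, providers=["CPUExecutionProvider"]
)
session.run(None, {"input": np.zeros((1, 3, 224, 224), dtype=np.float32)})

# end_profiling() returns the path to a Chrome-trace-format JSON file with
# per-operator timings; load it in a trace viewer to find hot nodes.
print(session.end_profiling())
```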
- Deployment-time optimization for transformer and diffusion workloads on edge accelerators
- Compile-time vs. runtime tradeoffs across ONNX Runtime execution providers
- Mixed-precision (INT4 / INT8 / FP16) scheduling for transformer blocks
- Reproducible hardware-in-the-loop benchmarking (a bare-bones harness sketch follows this list)
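As referenced in the last item, a sketch of what I mean by a reproducible latency harness: fixed inputs, an explicit warmup so caches and DVFS settle, then percentile stats rather than a single mean. All names here are placeholders.

```python
import time
import numpy as np
import onnxruntime as ort

def bench(session, feeds, warmup=20, runs=200):
    """Warm up the target, then collect wall-clock latencies in milliseconds."""
    for _ in range(warmup):
        session.run(None, feeds)
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        session.run(None, feeds)
        samples.append((time.perf_counter() - start) * 1e3)
    samples.sort()
    return {"p50_ms": samples[len(samples) // 2],
            "p90_ms": samples[int(len(samples) * 0.9)]}

# Placeholder model and input; on device the providers list would name the NPU EP.
session = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])
print(bench(session, {"input": np.zeros((1, 3, 224, 224), dtype=np.float32)}))
```

Percentiles over a mean, because on shared mobile silicon the tail is where thermal and scheduling effects show up.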
Most of my production work is proprietary and lives in internal repositories. Public artifacts, writeups, and benchmark harnesses will land here as they become shareable.