Aatish Rana

Senior AI/ML Systems Engineer  ·  on-device GenAI, ML compilers, and NPU deployment

Qualcomm  ·  IIT Madras (MTech, Industrial AI)  ·  NIT Rourkela (BTech, CSE)

Email   LinkedIn   GitHub


About

ML systems engineer working at the intersection of deep learning compilers, runtime systems, and hardware-aware optimization for edge ML accelerators. My day-to-day sits closer to the runtime than to the notebook — where model graphs, execution providers, and silicon meet.

My background blends production deployment engineering on Windows-on-Snapdragon with graduate study in Industrial AI at IIT Madras.

What I work on

  • On-device and edge AI deployment for transformer and diffusion workloads
  • Execution providers and runtime behavior across ONNX Runtime, QNN EP, DirectML, and WinML
  • Graph-level optimization, fusion, layout, and dtype legality on ONNX graphs
  • Kernel-level optimization along Conv / GEMM / attention / activation paths
  • Quantization and calibration (INT8 / INT4) for transformer-class models
  • Olive + WinML enablement pathways for on-device model delivery
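As a flavor of the quantization side of this work, here is a minimal, framework-free sketch of symmetric per-tensor INT8 quantization. The function names and representation are illustrative only, not any specific toolchain's API.

```python
# Toy sketch of symmetric per-tensor INT8 quantization, the scheme
# commonly used for weights in transformer-class models.
# Pure Python, no framework dependencies; illustrative only.

def quantize_int8(values):
    """Map floats to int8 codes with a single symmetric scale."""
    amax = max(abs(v) for v in values)
    scale = amax / 127.0 if amax > 0 else 1.0
    q = [max(-128, min(127, round(v / scale))) for v in values]
    return q, scale

def dequantize_int8(q, scale):
    """Recover approximate floats from int8 codes."""
    return [v * scale for v in q]

weights = [0.5, -1.27, 0.0, 1.27]
q, scale = quantize_int8(weights)
recon = dequantize_int8(q, scale)
# Round-trip error for in-range values is bounded by scale / 2.
err = max(abs(a - b) for a, b in zip(weights, recon))
```

Per-channel scales and calibration-derived clipping ranges refine this basic scheme in practice; INT4 follows the same idea with a [-8, 7] code range.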

Selected work

  • Contributed to enabling Stable Diffusion v1.5 on Windows-on-Snapdragon — the publicly announced Qualcomm + Microsoft collaboration bringing on-device generative AI to the NPU.
  • Work on on-device GenAI and NPU-facing model enablement across Windows platforms.
  • Execution-provider and runtime integration across the ONNX Runtime / QNN EP / DirectML / WinML ecosystem.
  • Graph-level optimization, kernel tuning, and quantization practice applied to transformer and diffusion workloads.
  • Young Technocrat Award — external recognition.
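To make the graph-level fusion work above concrete, here is a toy pattern-rewrite pass that collapses Conv → Relu pairs into a single fused node, the kind of rewrite a graph optimizer applies before kernel dispatch. The node/graph representation is a made-up sketch, not ONNX IR or any real optimizer's data structures.

```python
# Illustrative graph-level fusion pass: merge each Conv whose output
# feeds exactly one Relu into a single ConvRelu node.
# Nodes are (op, name, inputs) tuples in topological order; a node's
# name doubles as its output tensor name. Toy representation only.

def fuse_conv_relu(nodes):
    fused, skip = [], set()
    consumers = {}
    for op, name, inputs in nodes:
        for i in inputs:
            consumers.setdefault(i, []).append((op, name))
    for op, name, inputs in nodes:
        if name in skip:
            continue  # Relu already absorbed into a fused node
        users = consumers.get(name, [])
        # Fuse only when the Conv output has a single Relu consumer,
        # so no other node loses access to the pre-activation tensor.
        if op == "Conv" and len(users) == 1 and users[0][0] == "Relu":
            relu_name = users[0][1]
            fused.append(("ConvRelu", relu_name, inputs))
            skip.add(relu_name)
        else:
            fused.append((op, name, inputs))
    return fused
```

The single-consumer check is the part that matters: fusing a Conv whose output is also read elsewhere would silently change graph semantics.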

Systems stack

Compilers & runtimes — ONNX Runtime, QNN EP, DirectML, WinML, Olive, ONNX, IR-level graph transformations
Hardware targets — Qualcomm NPUs (Hexagon / HTP), Snapdragon X, ARM64, x64
Models & frameworks — PyTorch, Hugging Face Transformers, diffusion models, quantization toolchains (INT8 / INT4)
Languages — C++, Python, C
Perf & debugging — Profiling, tracing, kernel-level analysis, hardware-in-the-loop benchmarking
Platforms — Windows-on-Snapdragon, Linux

Current focus

  • Deployment-time optimization for transformer and diffusion workloads on edge accelerators
  • Compile-time vs. runtime tradeoffs across ONNX Runtime execution providers
  • Mixed-precision (INT4 / INT8 / FP16) scheduling for transformer blocks
  • Reproducible hardware-in-the-loop benchmarking
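The mixed-precision scheduling item above can be sketched as a greedy dtype assignment: every block starts at INT4, and the most sensitive blocks are promoted (INT4 → INT8 → FP16) while a total bit budget allows. The sensitivity scores and budget model here are illustrative assumptions, not a production scheduler.

```python
# Toy mixed-precision scheduler for transformer blocks.
# sensitivity: {block: score}, e.g. quantization-induced loss delta;
# params_per_block: parameter counts; bit_budget: total weight bits.
# Greedy sketch under made-up inputs; illustrative only.

BITS = {"int4": 4, "int8": 8, "fp16": 16}

def schedule(sensitivity, params_per_block, bit_budget):
    plan = {b: "int4" for b in sensitivity}
    used = sum(BITS[plan[b]] * params_per_block[b] for b in plan)
    # Promote most-sensitive blocks first.
    order = sorted(sensitivity, key=sensitivity.get, reverse=True)
    for dtype in ("int8", "fp16"):
        for b in order:
            step = (BITS[dtype] - BITS[plan[b]]) * params_per_block[b]
            if BITS[dtype] > BITS[plan[b]] and used + step <= bit_budget:
                plan[b] = dtype
                used += step
    return plan
```

A real scheduler would also weigh per-dtype kernel availability and accumulation precision on the target NPU, not just weight-memory footprint.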

Featured work

Most of my production work is proprietary and lives in internal repositories. Public artifacts, writeups, and benchmark harnesses will land here as they become shareable.

Contact

aatishrana495@gmail.com  ·  LinkedIn  ·  GitHub

Pinned repositories

  1. AUV-Simulator-Unity (C#) — forked from lafith/AUV-Simulator-Unity. Simulator for an Autonomous Underwater Vehicle, developed using Unity3D.
  2. object_detection_yolov3 (Python)
  3. object_detection_yolov3_pytorch (Python)
  4. sgbro/HealthCare-EPITECT (Python)
  5. hydrophones_gui (C++)