Applied AI Engineer • AI Infrastructure & Generative AI
San Francisco Bay Area
About
Applied AI Engineer with 20+ years of experience specializing in large-scale AI infrastructure, Generative AI applications, and model evaluation, bridging the gap between cutting-edge AI research and enterprise-scale production. Expertise ranges from designing distributed training and high-performance inference workloads across massive GPU and TPU clusters to engineering AI agents and evaluation harnesses for startups (including my own) and Fortune 500 companies.
Highlights
- Hyperscale AI Impact: Architected and validated AI deployments on massive GPU and TPU clusters running Slurm and Kubernetes for startups and Fortune 500 clients.
- Open Source Contributor: Active contributor to and maintainer of Google Cloud repositories, LangChain, and LlamaIndex cloud integrations.
- Technical Author: Authored official ML engineering playbooks and prompt engineering guides for Google Cloud Vertex AI and AI Hypercomputer.
- Featured Speaker: Delivered technical sessions and workshops at major industry events, including Cloud Next, Google I/O, NVIDIA GTC, PyTorch Developers Conference, and Databricks Summit.
Work Experience
EnsureCare 2026 — Present
- Building an AI-powered care adherence platform for healthcare operations to improve patient outcomes and optimize hospital utilization.
Google 2019 — 2026
- [Large Scale AI Infrastructure]
  - Led the infrastructure setup, technical validation, and capacity unlocking for large-scale B200 and H100 GPU clusters on managed training services.
  - Engineered robust distributed checkpointing and optimized bare-metal workloads, transforming a transactional support relationship into a strategic multi-year cluster partnership with a top-tier enterprise.
- [Training & Inference on AI Hypercomputer with Open Models]
  - Systematically benchmarked workloads using roofline analysis to maximize TFLOPS utilization.
  - Published optimized recipes for multi-host disaggregated inference pipelines, delivering the first cloud deployment of DeepSeek R1 using SGLang, vLLM, NVIDIA Dynamo, Cloud Pathways, and JetStream.
  - Developed custom distributed training recipes for H100/H200/B200 GPUs, resolving precision conversion issues and overcoming guidance gaps to unblock significant recurring revenue for AI startups.
- [Generative AI Applications & Agentic Workflows]
  - Spearheaded a leading cybersecurity firm’s production rollout of three GenAI copilots with Gemini, raising retrieval efficacy through advanced RAG optimizations.
  - Led an AutoDev initiative for a major collaboration software provider, achieving 77% accuracy on SWE-bench with Gemini 3.x for code generation.
  - Automated document understanding for a top fintech company, processing ~500M images/month.
- [Model Evaluation & Quality Tooling]
  - Designed a unified model quality pipeline using AI agents to automate bug triage across Google DeepMind and Google Cloud. Processed 300+ first-party (1P) model bugs with 75% efficiency to enable a major foundation model’s early access launch.
  - Engineered an optimized evaluation harness for coding agents, surpassing the SWE-bench Verified results published by Google DeepMind.
- [Product Leadership & Go-To-Market]
  - Contributed technical strategy for the Vertex AI Training Clusters launch, resolving pre-GA friction points. Outperformed competitive bare-metal benchmarks by 15% to secure technical wins with enterprise research divisions.
- [Open Source Ecosystem Orchestration]
  - Drove ~100K API requests/month by leading Day 1 ecosystem readiness and integration (LiteLLM, Pydantic-AI) for new foundation models.
  - Collaborated with research divisions to productionize the TimesFM model for a high-profile developer conference keynote demo.
- [Large-Scale ML Inference & ML IaaS]
  - Served as technical lead for NVIDIA Triton integration, contributing to PRDs and official documentation to enable production deployments at major retail, financial, and tech enterprises.
  - Developed a custom GKE TPU operator and a JAX-to-FasterTransformer inference pipeline adopted by internal engineering teams.
- [Technical Enablement & Evangelism]
  - Co-developed and delivered advanced training on AI infrastructure and Generative AI to 1,000+ field engineers globally.
Amazon Web Services (AWS) 2017 — 2019
- [Professional Services] Delivered on-site technical engagements with partners and customers, including pre-sales visits, requirements gathering, consulting proposals, and packaged Big Data, Analytics, and Machine Learning service offerings.
- [Machine Learning Optimization] Trained deep learning convolutional neural networks with TensorFlow on Amazon SageMaker for Amazon.com customers’ packaging logistics, driving estimated multimillion-dollar annual savings by selecting right-sized shipping material.
- [Data Platform Engineering] Designed a multi-service data analytics platform for an Industrial IoT customer utilizing EMR, Kinesis, Athena, Aurora, and SageMaker, processing 15B data points daily.
Kaiser Permanente 2014 — 2017
- [Healthcare Analytics Platform] Designed SAS/R and Tableau-based analytics platforms to forecast member risk scores and built Hadoop-based data lakes.
JPMorgan Chase 2004 — 2014
- [Data Engineering & Analytics Architecture] Served as Lead Architect for JPMC’s industry-first Credit Risk processing platform. Built a 30TB Credit Risk data warehouse and a rapid exposure drill application that identified tens of millions in monthly risk exposure.
Education
- University of California, Berkeley: Master of Information & Data Science
- Osmania University: Bachelor of Engineering in Electronics & Communication
Skills
AI Infrastructure, Generative AI, GPU & TPU Clusters, Distributed Training, High-Performance Inference, Foundation Models, Agents, Evaluation, vLLM & SGLang, Vertex AI, Amazon SageMaker, JAX & PyTorch, Slurm & Kubernetes, Data Architecture