Rajesh Thallam

Applied AI Engineer • AI Infrastructure & Generative AI

San Francisco Bay Area

About

Applied AI Engineer with 20+ years of experience specializing in large-scale AI infrastructure, Generative AI applications, and model evaluation, bridging the gap between cutting-edge AI research and enterprise-scale production. Expertise ranges from designing distributed training and high-performance inference workloads across massive GPU and TPU clusters to engineering AI agents and evaluation harnesses for startups (including my own) and Fortune 500 companies.

Highlights

Hyperscale AI Impact: Architecting and validating AI deployments on massive GPU and TPU clusters orchestrated with Slurm and Kubernetes for startups and Fortune 500 clients.
Open Source Contributor: Active contributor to and maintainer of Google Cloud repositories and of the LangChain and LlamaIndex cloud integrations.
Technical Author: Authored official ML engineering playbooks and prompt engineering guides for Google Cloud Vertex AI and AI Hypercomputer.
Featured Speaker: Delivered technical sessions and workshops at major industry events, including Cloud Next, Google I/O, NVIDIA GTC, PyTorch Developer Conference, and Databricks Summit.

Work Experience

EnsureCare Healthcare AI Agents 2026 — Present

Co-Founder & CTO

Building an AI-powered care adherence platform for healthcare operations to improve patient outcomes and optimize hospital utilization.

Google GPUs/TPUs Vertex AI Gemini vLLM/SGLang Agents GKE/Slurm 2019 — 2026

Senior Staff Software Engineer / ML Solutions Architect

[Large Scale AI Infrastructure]
- Led the infrastructure setup, technical validation, and capacity unlocking for large-scale B200 and H100 GPU clusters on managed training services.
- Engineered robust distributed checkpointing and optimized bare-metal workloads, transforming a transactional support relationship into a strategic multi-year cluster partnership with a top-tier enterprise.
[Training & Inference on AI Hypercomputer with Open Models]
- Systematically benchmarked workloads using roofline analysis to maximize achieved TFLOPS.
- Published optimized recipes for multi-host disaggregated inference pipelines, delivering the first cloud deployment of DeepSeek R1 using SGLang, vLLM, NVIDIA Dynamo, Cloud Pathways, and JetStream.
- Developed custom distributed training recipes for H100/H200/B200 GPUs, resolving precision-conversion issues and filling gaps in existing guidance to unblock significant recurring revenue for AI startups.
[Generative AI Applications & Agentic Workflows]
- Spearheaded a leading cybersecurity firm’s production rollout of three GenAI copilots built on Gemini, improving retrieval quality through advanced RAG optimizations.
- Led an AutoDev code-generation initiative for a major collaboration software provider, achieving 77% accuracy on SWE-bench with Gemini 3.x.
- Automated document understanding for a top fintech company, processing ~500M images/month.
[Model Evaluation & Quality Tooling]
- Designed a unified model quality pipeline that uses AI agents to automate bug triage across Google DeepMind and Google Cloud. Processed 300+ first-party (1P) model bugs with 75% efficiency, enabling a major foundation model’s early-access launch.
- Engineered an optimized evaluation harness for coding agents that surpassed Google DeepMind’s published SWE-bench Verified results.
[Product Leadership & Go-To-Market]
- Contributed technical strategy for the Vertex AI Training Clusters launch, resolving pre-GA friction points. Outperformed competitive bare-metal benchmarks by 15% to secure technical wins with enterprise research divisions.
[Open Source Ecosystem Orchestration]
- Drove ~100K API requests/month by leading Day 1 ecosystem readiness and integration (LiteLLM, Pydantic-AI) for new foundation models.
- Collaborated with research divisions to productionize the TimesFM model for a high-profile developer conference keynote demo.
[Large-Scale ML Inference & ML IaaS]
- Served as technical lead for NVIDIA Triton integration, contributing to PRDs and official documentation to enable production deployments at major retail, financial, and tech enterprises.
- Developed a custom GKE TPU operator and a JAX-to-FasterTransformer inference pipeline adopted by internal engineering teams.
[Technical Enablement & Evangelism]
- Co-developed and delivered advanced training on AI infrastructure and Generative AI to 1000+ field engineers globally.

Amazon Web Services (AWS) SageMaker TensorFlow EMR Athena Kinesis 2017 — 2019

Big Data Architect / Data Scientist

[Professional Services] Delivered on-site technical engagements with partners and customers, including pre-sales visits, requirements gathering, and consulting proposals, and created packaged Big Data, Analytics, and Machine Learning service offerings.
[Machine Learning Optimization] Trained deep convolutional neural networks with TensorFlow on Amazon SageMaker to optimize packaging logistics for Amazon.com customer shipments, driving an estimated multi-million-dollar annual savings by selecting right-sized shipping materials.
[Data Platform Engineering] Designed a multi-service data analytics platform for an Industrial IoT customer using EMR, Kinesis, Athena, Aurora, and SageMaker, processing 15B data points daily.

Kaiser Permanente Hadoop SAS/R Tableau 2014 — 2017

Technical Lead and Architect (consulting with Cognizant)

[Healthcare Analytics Platform] Designed SAS/R and Tableau-based analytics platforms to forecast member risk scores and built Hadoop-based data lakes.

JPMorgan Chase Credit Risk Data Warehouse Analytics 2004 — 2014

Technical Lead, Developer, and Architect (consulting with Cognizant)

[Data Engineering & Analytics Architecture] Served as Lead Architect for JPMC’s industry-first Credit Risk processing platform. Built a 30 TB Credit Risk data warehouse and a rapid exposure drill application that identified tens of millions in monthly risk exposure.

Education

University of California, Berkeley

Master of Information & Data Science

Osmania University

Bachelor of Engineering in Electronics & Communication

Skills

AI Infrastructure • Generative AI • GPU & TPU Clusters • Distributed Training • High-Performance Inference • Foundation Models • Agents • Evaluation • vLLM & SGLang • Vertex AI • Amazon SageMaker • JAX & PyTorch • Slurm & Kubernetes • Data Architecture