Rajesh Thallam

Applied AI Engineer • AI Infrastructure & Generative AI

San Francisco Bay Area

About

Applied AI Engineer with 20+ years of experience specializing in large-scale AI infrastructure, Generative AI applications, and model evaluation, bridging cutting-edge AI research and enterprise-scale production. Expertise ranges from designing distributed training and high-performance inference workloads on massive GPU and TPU clusters to engineering AI agents and evaluation harnesses for startups (including my own) and Fortune 500 companies.

Highlights

  • Hyperscale AI Impact: Architecting and validating AI deployments on massive GPU and TPU clusters running Slurm and Kubernetes for startups and Fortune 500 clients.
  • Open Source Contributor: Active contributor to and maintainer of Google Cloud repositories and LangChain and LlamaIndex cloud integrations.
  • Technical Author: Authored official ML engineering playbooks and prompt engineering guides for Google Cloud Vertex AI and AI Hypercomputer.
  • Featured Speaker: Delivered technical sessions and workshops at major industry events, including Cloud Next, Google I/O, NVIDIA GTC, PyTorch Developers Conference, and Databricks Summit.

Work Experience

EnsureCare • Healthcare AI Agents • 2026–Present

Co-Founder & CTO
  • Building an AI-powered care adherence platform for healthcare operations to improve patient outcomes and optimize hospital utilization.

Google • GPUs/TPUs, Vertex AI, Gemini, vLLM/SGLang, Agents, GKE/Slurm • 2019–2026

Senior Staff Software Engineer / ML Solutions Architect
  • [Large Scale AI Infrastructure]
    • Led the infrastructure setup, technical validation, and capacity unlocking for large-scale B200 and H100 GPU clusters on managed training services.
    • Engineered robust distributed checkpointing and optimized bare-metal workloads, transforming a transactional support relationship into a strategic multi-year cluster partnership with a top-tier enterprise.
  • [Training & Inference on AI Hypercomputer with Open Models]
    • Systematically benchmarked workloads using roofline analysis to maximize TFLOPS utilization.
    • Published optimized recipes for multi-host disaggregated inference pipelines, delivering the first cloud deployment of DeepSeek R1 using SGLang, vLLM, NVIDIA Dynamo, Cloud Pathways, and JetStream.
    • Developed custom distributed training recipes for H100/H200/B200 GPUs, resolving precision conversion issues and overcoming guidance gaps to unblock significant recurring revenue for AI startups.
  • [Generative AI Applications & Agentic Workflows]
    • Spearheaded a leading cybersecurity firm’s production rollout of three GenAI copilots with Gemini, raising retrieval efficacy through advanced RAG optimizations.
    • Led an AutoDev initiative for a major collaboration software provider, achieving 77% on SWE-bench with Gemini 3.x for code generation.
    • Automated document understanding for a top fintech company, processing ~500M images/month.
  • [Model Evaluation & Quality Tooling]
    • Designed a unified model quality pipeline using AI agents to automate bug triage across Google DeepMind and Google Cloud. Processed 300+ first-party (1P) model bugs with 75% efficiency, enabling a major foundation model’s early access launch.
    • Engineered an optimized evaluation harness for coding agents, surpassing the SWE-bench Verified results published by Google DeepMind.
  • [Product Leadership & Go-To-Market]
    • Contributed technical strategy for the Vertex AI Training Clusters launch, resolving pre-GA friction points. Outperformed competing bare-metal benchmarks by 15%, securing technical wins with enterprise research divisions.
  • [Open Source Ecosystem Orchestration]
    • Led Day 1 ecosystem readiness and integrations (LiteLLM, Pydantic AI) for new foundation models, driving ~100K API requests/month.
    • Collaborated with research divisions to productionize the TimesFM model for a high-profile developer conference keynote demo.
  • [Large-Scale ML Inference & ML IaaS]
    • Served as technical lead for NVIDIA Triton integration, contributing to PRDs and official documentation to enable production deployments at major retail, financial, and tech enterprises.
    • Developed a custom GKE TPU operator and a JAX-to-FasterTransformer inference pipeline adopted by internal engineering teams.
  • [Technical Enablement & Evangelism]
    • Co-developed and delivered advanced training on AI infrastructure and Generative AI to 1000+ field engineers globally.

Amazon Web Services (AWS) • SageMaker, TensorFlow, EMR, Athena, Kinesis • 2017–2019

Big Data Architect / Data Scientist
  • [Professional Services] Delivered on-site technical engagements with partners and customers, including pre-sales visits, requirements gathering, consulting proposals, and packaged Big Data, Analytics, and Machine Learning service offerings.
  • [Machine Learning Optimization] Trained convolutional neural networks with TensorFlow on Amazon SageMaker for Amazon.com customers’ packaging logistics, selecting right-sized shipping material and driving an estimated multi-million-dollar annual savings.
  • [Data Platform Engineering] Designed a multi-service data analytics platform for an Industrial IoT customer utilizing EMR, Kinesis, Athena, Aurora, and SageMaker, processing 15B data points daily.

Kaiser Permanente • Hadoop, SAS/R, Tableau • 2014–2017

Technical Lead and Architect (consulting with Cognizant)
  • [Healthcare Analytics Platform] Designed SAS/R and Tableau-based analytics platforms to forecast member risk scores and built Hadoop-based data lakes.

JPMorgan Chase • Credit Risk, Data Warehouse, Analytics • 2004–2014

Technical Lead, Developer, and Architect (consulting with Cognizant)
  • [Data Engineering & Analytics Architecture] Served as Lead Architect for JPMC’s industry-first Credit Risk processing platform. Built a 30TB Credit Risk data warehouse and a rapid exposure drill application that identified tens of millions in monthly risk exposure.

Education

University of California, Berkeley • Master of Information & Data Science
Osmania University • Bachelor of Engineering in Electronics & Communication

Skills

AI Infrastructure • Generative AI • GPU & TPU Clusters • Distributed Training • High-Performance Inference • Foundation Models • Agents • Evaluation • vLLM & SGLang • Vertex AI • Amazon SageMaker • JAX & PyTorch • Slurm & Kubernetes • Data Architecture