About

Hi, I’m Rajesh Thallam (Raj).

I am building EnsureCare, an agentic platform for healthcare operations. My work focuses on architecting large-scale AI infrastructure for training and inference, and building Generative AI agents.

Previously, I was a Senior Staff Software Engineer at Google, where I built large-scale AI infrastructure and ran distributed training and inference workloads across massive GPU and TPU clusters on Slurm and Kubernetes. I solved engineering challenges at scale for foundation models and enterprise clients, including resilient system design, distributed checkpointing, roofline analysis for benchmarking, profiling to maximize TFLOPs utilization, and disaggregated serving.

Beyond my current work, I am an active open-source maintainer. My background also spans engineering roles at AWS, JPMorgan Chase, and Kaiser Permanente.

Resume/CV

Feel free to connect with me through any of the platforms below: