We are looking for a Technical Program Manager (TPM) to drive cross-team execution for the inference platform as it scales in usage, regions, and complexity.
What You’ll Do
As a TPM for the AI Studio inference platform, you will own end-to-end delivery of complex, cross-functional initiatives that span infrastructure, platform engineering, hardware, and customer-facing teams.
You will:
Drive large, cross-team programs related to platform scaling, reliability, performance, and cost efficiency
Coordinate work across AI Studio engineering, Cloud Platform, and Observability teams
Translate product and customer requirements (latency, throughput, SLAs, cost) into executable technical plans
Define clear scope, milestones, dependencies, and success metrics for multi-quarter initiatives
Unblock teams by driving decisions on architecture trade-offs, rollout strategies, and operational processes
Track and communicate risks, incidents, and dependencies to stakeholders at both engineering and leadership levels
Introduce and scale repeatable processes for launches, capacity planning, incident reviews, and platform changes
Support execution around model rollouts, autoscaling changes, GPU capacity expansion, and regional launches
What We Expect
5+ years of experience as a TPM (or equivalent role) leading cross-team technical programs
Strong technical foundation in cloud platforms, distributed systems, and production infrastructure
Practical understanding of Kubernetes-based platforms, service reliability, and observability (metrics, logs, traces)
Experience driving execution where you influence without formal authority
Ability to reason about system-level trade-offs (latency vs cost, reliability vs utilization)
Strong written and verbal communication skills; comfortable working with engineers and senior stakeholders
Analytical mindset with hands-on experience using data (SQL, Python, or scripting) to track progress and inform decisions
Nice to Have / Ways to Stand Out
Prior background as a Software Engineer, SRE, or Systems Engineer
Experience working with GPU-based workloads or high-throughput inference systems
Familiarity with LLM serving stacks (e.g., vLLM, TRTLLM) or ML platform environments
Experience running programs tied to capacity planning, autoscaling, or multi-region deployments
Exposure to environments operating under strict SLOs/SLAs and fast incident response loops
What We Offer
Competitive salary and comprehensive benefits package.
Opportunities for professional growth within the company.
Flexible working arrangements.
A dynamic and collaborative work environment that values initiative and innovation.
We’re growing and expanding our products every day. If you’re up to the challenge and as excited about AI and ML as we are, join us!