Job Description
Reports To: CTO
About the Role
This organisation operates GPU cloud infrastructure across bare metal, managed cloud, and serverless inference. The platform engineering team you will lead is responsible for building the next generation of the platform stack: starting from the existing cloud platform as the foundation and extending it to support IaaS, PaaS, and SaaS capabilities over time.
You will report to the CTO and work closely with the product organisation, translating roadmap requirements into shipped, production‑grade platform capabilities. A counterpart team lead role is planned for the Bay Area office; you and that lead will coordinate regularly on architecture, roadmap, and cross‑team delivery. The role requires someone equally comfortable setting technical direction, writing production code, and managing a team.
What You'll Be Doing
Lead the platform engineering team: technical direction, design reviews, code review, and day‑to‑day delivery.
Own the platform roadmap together with product. Scope and sequence work, communicate trade‑offs, and keep delivery on track.
Design and build the platform layers that underpin IaaS, and progressively extend them to support PaaS and SaaS offerings.
Coordinate regularly with the Bay Area platform lead on shared architecture, joint initiatives, and cross‑timezone delivery.
Own reliability, observability, and security standards for the services your team ships.
Stay hands‑on: take on complex or high‑risk engineering work directly alongside the team.
Run post‑mortems after incidents and fix the underlying structural gaps, not just the immediate issue.
Manage the team: 1:1s, performance conversations, hiring interviews, and onboarding.
Must‑Have Requirements
5+ years in cloud platform or infrastructure engineering, with at least 1 year leading an engineering team.
Kubernetes at depth: cluster operations, CNI networking, scheduling, and workload management in production.
GPU infrastructure experience: provisioning, resource allocation, and surfacing capacity through a platform or API layer.
Solid networking fundamentals: L2/L3, VLANs, overlay networks, and multi‑tenant isolation in cloud environments.
Service architecture skills: designing and operating production‑grade control‑plane services and APIs across IaaS and above.
Security awareness: you think about trust boundaries, least privilege, and secure defaults when building platform services.
People management track record: 1:1s, performance feedback, and accountability for a team's delivery and wellbeing.
Nice‑to‑Have (But Not Essential)
Experience building SaaS or PaaS layers on top of an IaaS platform.
Familiarity with RDMA, InfiniBand, or RoCE networking in GPU or HPC clusters.
Background with OpenStack or other IaaS orchestration platforms.
Experience working distributed across timezones, particularly with a US West Coast counterpart.
Exposure to serverless or inference serving infrastructure.