Storage Engineering Manager
💰 $170,000 – $240,000/yr
Job Description
About Lambda
Lambda, The Superintelligence Cloud, is a leader in AI cloud infrastructure serving tens of thousands of customers worldwide. Our customers range from AI researchers to enterprises and hyperscalers. Lambda's mission is to make compute as ubiquitous as electricity and give everyone the power of superintelligence. We're building the world's best AI cloud infrastructure.
About the Role
We are seeking a seasoned Storage Engineering Manager with extensive experience in the specification, evaluation, deployment, and management of HPC storage solutions across multiple datacenters. You will hire and guide a team of storage engineers building the storage infrastructure behind our AI/ML products, ensuring seamless deployment and operational excellence across both physical and logical storage layers—including proprietary and open source solutions.
Your role transcends people management; you will serve as the ultimate technical and operational authority for our high-performance, petabyte-scale storage solutions. Your leadership will be pivotal in ensuring our systems are reliable, scalable, and manageable as we grow toward exascale capabilities.
What You'll Own
Engineering at Lambda is responsible for building and scaling our cloud offering across our website, cloud APIs, systems, and internal tooling for deployment, management, and maintenance. The Lambda Infrastructure Engineering organization forges the foundation of high-performance AI clusters by integrating the latest in AI storage, networking, GPU, and CPU hardware.
In distributed AI infrastructure, raw GPU and CPU horsepower is only part of the equation. High-performance networking and storage are critical components that enable these systems, making groundbreaking AI training and inference possible. Your expertise will span:
- High-Performance Distributed Storage Solutions and Protocols: Engineer protocols and systems that serve massive datasets at the speeds demanded by modern clustered GPUs
- Dynamic Networking: Design advanced networks providing multi-tenant security and intelligent routing without compromising performance, using the latest AI networking hardware
- Compute Virtualization: Enable cutting-edge virtualization and clustering that allows AI researchers to focus on workloads, not infrastructure
Location & Work Arrangement
This position requires presence at our San Francisco, San Jose, or Bellevue office 4 days per week; Lambda's designated work-from-home day is currently Tuesday. This is a unique opportunity to work at the intersection of large-scale distributed systems and the rapidly evolving field of artificial intelligence infrastructure.