Storage Engineering Manager
Lambda
Lambda, The Superintelligence Cloud, is a leader in AI cloud infrastructure serving tens of thousands of customers. Our customers range from AI researchers to enterprises and hyperscalers. Lambda's mission is to make compute as ubiquitous as electricity and give everyone the power of superintelligence. One person, one GPU.
If you'd like to build the world's best AI cloud, join us.
*Note: This position requires presence in our San Francisco, San Jose, or Bellevue office 4 days per week; Lambda's designated work-from-home day is currently Tuesday.
Engineering at Lambda is responsible for building and scaling our cloud offering. Our scope includes the Lambda website, cloud APIs and systems, as well as internal tooling for system deployment, management, and maintenance.
In the world of distributed AI, raw GPU and CPU horsepower is just a part of the story. High-performance networking and storage are the critical components that enable and unite these systems, making groundbreaking AI training and inference possible.
The Lambda Infrastructure Engineering organization forges the foundation of high-performance AI clusters by welding together the latest in AI storage, networking, GPU and CPU hardware.
Our expertise lies at the intersection of:
- High-Performance Distributed Storage Solutions and Protocols: We engineer the protocols and systems that serve massive datasets at the speeds demanded by modern clustered GPUs.
- Dynamic Networking: We design advanced networks that provide multi-tenant security and intelligent routing without compromising performance, using the latest in AI networking hardware.
- Compute Virtualization: We enable cutting-edge virtualization and clustering that allows AI researchers and engineers to focus on AI workloads, not AI infrastructure, unleashing the full compute bandwidth of clustered GPUs.
About the Role:
We are seeking a seasoned Storage Engineering Manager, experienced in the specification, evaluation, deployment, and management of HPC storage solutions across multiple datacenters, to build out a world-class team. You will hire and guide a team of storage engineers in building storage infrastructure that serves our AI/ML infrastructure products, ensuring the seamless deployment and operational excellence of both the physical and logical storage infrastructure (including proprietary and open-source solutions).
Your role is not just to manage people, but to serve as the ultimate technical and operational authority for our high-performance, petabyte-scale storage solutions. Your leadership will be pivotal in ensuring our systems are not just high-performing, but also reliable, scalable, and manageable as we grow toward exascale.
This is a unique opportunity to work at the intersection of large-scale distributed systems and the rapidly evolving field of artificial intelligence infrastructure, and to have a significant impact on the future of AI. You will be building the foundational