Senior Production Engineer
Remote, United States
About this Position:
Are you passionate about automation, cloud infrastructure, Kubernetes, and reliability engineering? As a Senior Production Engineer (SRE) at Legion, you will build and operate a secure, highly scalable, and cost-effective AWS/Kubernetes-based cloud platform. You will work across infrastructure automation, CI/CD pipelines, observability, and production reliability. Simply put, the SRE team ensures Legion's platform is reliable, scalable, and continuously improving for our customers.
This role includes participation in an on-call rotation.
Role Responsibilities:
- Support and operate Legion's AWS-based cloud platform and Kubernetes (EKS) environments.
- Leverage GenAI tools (e.g., Claude Code, Codex, or similar) to accelerate infrastructure development, automation, and auto-remediation of common production issues.
- Build and maintain infrastructure-as-code using Terraform.
- Develop automation and internal tooling using Go or Python.
- Improve CI/CD pipelines to increase deployment safety and velocity.
- Define and improve monitoring, alerting, and observability systems.
- Respond to production incidents, conduct root cause analysis, and implement systemic improvements.
- Develop and automate operational runbooks and remediation workflows.
- Support production deployments, including during off-hours as needed.
Basic Qualifications:
- 5-8+ years of experience in SRE, DevOps, or SaaS production operations.
- 3+ years of hands-on experience operating production workloads in AWS.
- Strong experience with Terraform and infrastructure-as-code practices.
- 3+ years of experience with containerized environments using Docker and Kubernetes (EKS preferred); familiarity with Helm.
- Proficiency in Go or Python (or similar programming language).
- Experience building and maintaining CI/CD systems (Git-based workflows, Argo, Jenkins, or similar).
- Strong Linux/Unix systems experience.
- Bachelor's degree in Computer Science or equivalent practical experience.
Other Qualifications:
- Experience with observability tools such as Datadog, CloudWatch, ELK stack, Prometheus, or similar.
- Experience managing AWS RDS and/or Aurora MySQL, including slow-query analysis, replication, and upgrade operations.