Senior Site Reliability Engineer

Customer.io
18 days ago
Full-time
Remote
Worldwide
Remote Engineering

About Customer.io

Over 8,000 companies, from scrappy startups to global brands, use our platform to send billions of emails, push notifications, in-app messages, and SMS messages every day. Customer.io powers automated communication that people actually want to receive.

We help teams send smarter, more relevant messages using real-time behavioral data. Under the hood, Go, React, Ember, and AI help us ship fast and scale with confidence.

We’re looking for a Site Reliability Engineer to help us scale our infrastructure, reduce operational toil, and increase reliability as we grow. If you’ve worked on high-scale systems and love making platforms better for developers and customers alike, we’d love to meet you.

What We Value

Ownership
You own problems end to end. You move fast, act like an owner, and thrive in ambiguity. You've led complex projects before, whether officially or not, and you're ready to do it again.

Engineers with product taste
You think like a user, not just an engineer. You care about performance, reliability, and how systems affect the customer experience.

A healthy skepticism for “the way things are done”
You bring rigor and creativity. Best practices matter, but never more than forward motion.

What You’ll Do

  • Build and scale infrastructure to support billions of messages per day and real-time events
  • Automate deployments, alerting, and incident response
  • Improve our on-call experience: clear alerts, solid documentation, and faster resolution
  • Tune the performance of MySQL and other datastores, and improve reliability across distributed systems
  • Collaborate across teams to debug, ship, and support systems in production
  • Share knowledge and raise the bar by sharing your progress publicly through short videos and thoughtful writing, and through mentorship
  • Leverage AI tools to prototype, move faster, and make better decisions

What We’re Looking For

  • 7+ years in SRE or infrastructure roles, improving production systems at scale
  • Deep MySQL experience: schema design, performance tuning, and operational tooling
  • Fluency in cloud-native tech (GCP a plus) and Terraform
  • Proficiency in Go and Bash for scripting and systems programming
  • Skill in observability, incident response, and debugging distributed systems
  • A preference for action over