Senior Site Reliability Engineer (SRE & Platform Reliability)

Affirm

10 days ago

Full-time

Remote

Worldwide

Remote Engineering

Affirm is reinventing credit to make it more honest and friendly, giving consumers the flexibility to buy now and pay later without any hidden fees or compounding interest.

Site Reliability Engineering at Affirm is a small yet crucial team that helps our Engineering partners “Operate What They Own” with excellence and protect their customers’ experience. SRE accomplishes this by defining frameworks and best practices for operating applications, building tooling, and providing training and consulting. SRE responsibilities include:

Providing data and visibility to teams and leadership on application performance
Guiding the development of SLOs
Driving the Incident Management and Analysis process
Steering the implementation of Change Management and Deployment practices
Engaging in service and architectural conversations
Recommending observability and alerting configurations

The SRE team benefits from experience across many domains including:

infrastructure, platform, and distributed systems
capacity management, load and chaos testing
automation, observability, and configuration management
development and product experience

The SRE team is seeking motivated software and systems engineers with the experience to build, iterate on, and expand incident lifecycle, reliability, and resilience practices throughout Affirm's Engineering organization and beyond.

What You'll Do:

You will be responsible for owning and delivering quarterly goals for your team, leading engineers on your team through ambiguity to solve open-ended problems, and ensuring that everyone is supported throughout delivery.
You will support your peers and stakeholders in the product development lifecycle by collaborating with infrastructure, product management, developer experience, and analytics: participating in ideation, articulating technical constraints, and partnering on decisions that properly consider risks and trade-offs.
You will proactively identify technical solutions and operational processes that strengthen incident readiness, response, and post-incident analysis.
You will support the operations and availability of your team’s artifacts by creating and monitoring metrics, escalating when needed, and supporting “keep the lights on” & on-call efforts.
You will foster a culture of quality and ownership on your team by setting or improving code review and design standards for your team, and advocating f
