Software Development Engineer - SRE

9025 CVS Shared Services Resources LLC
Full-time
Remote friendly (Work At Home-Texas United States of America)
Worldwide
Remote Engineering

We’re building a world of health around every individual, shaping a more connected, convenient and compassionate health experience. At CVS Health®, you’ll be surrounded by passionate colleagues who care deeply, innovate with purpose, hold themselves accountable and prioritize safety and quality in everything we do. Join us and be part of something bigger, helping to simplify health care one person, one family and one community at a time.

Position Summary

We are seeking a Site Reliability Engineer with a strong focus on observability to design, implement, and operate monitoring and alerting solutions for mission-critical enterprise applications. This role will be responsible for building proactive, actionable observability across services, batch workloads, infrastructure, databases, and logs using tools such as Grafana, Prometheus, Loki, and Tempo. The ideal candidate is passionate about reliability engineering, signal-to-noise optimization, and enabling teams to detect and resolve issues before they impact customers.

Key Responsibilities

· Design and maintain a comprehensive observability platform using Grafana, Prometheus, Loki, and Tempo.

· Implement proactive monitoring and alerting for:

    · Microservices and APIs (latency, error rates, availability)

    · Batch jobs, scheduled workloads, and ETL/data pipelines (success/failure, duration, SLA adherence)

    · Server and container health (CPU, memory, disk, network, capacity trends)

    · Database health and performance (availability, replication, query latency, resource utilization)

    · Application and infrastructure logging, including centralized log ingestion, indexing, and search

· Build actionable alerts with clear runbooks, ownership, and escalation paths to minimize mean time to detect (MTTD) and mean time to resolve (MTTR).

· Partner with application, platform, and DevOps teams to instrument services with metrics, traces, and structured logs.

· Continuously improve signal quality by reducing alert noise, eliminating false positives, and optimizing thresholds based on historical trends.

· Create and maintain dashboards for real-time operational visibility and executive-level health reporting.

· Support incident response and post-incident reviews by providing high-fidelity telemetry and contributing to root cause analysis.

Required Qualifications

· 5+ years of experience in Site Reliability Engineering, DevOps, or Production Operations.

· Hands-on expertise with Prometheus, Grafana, Loki, and Tempo in large-scale, production environments.

· Strong understanding of monitoring distributed systems that span both on-premises and cloud environments (GCP, Azure).

· Experience defining SLOs/SLIs and building alerting strategies based on reliability engineering best practices.

· Exceptional attention to detail with the ability to think through complex systems end-to-end, anticipate edge cases, failure modes, and cascading impacts, and proactively design monitoring and alerting to cover both common and rare operational scenarios.

Education

Bachelor’s degree or equivalent experience (high school diploma plus 4 years of relevant experience)

Anticipated Weekly Hours

40

Time Type

Full time

Pay Range

The typical pay range for this role is:

$72,100.00 - $144,200.00

This pay range represents the base hourly rate or base annual full-time salary for all positions in the job grade within which this position falls. The actual base salary offer will depend on a variety of factors including experience, education, geography and other relevant factors. This position is eligible for a CVS Health bonus, commission or short-term incentive program in addition to the base pay range listed above.

Our people fuel our future. Our teams reflect the customers, patients, members and communities we serve, and we are committed to fostering a workplace where every colleague feels valued and has a sense of belonging.

Great benefits for great people

We take pride in offering a comprehensive and competitive mix of pay and benefits that reflects our commitment to our colleagues and their families.

This full‑time position is eligible for a comprehensive benefits package designed to support the physical, emotional, and financial well‑being of colleagues and their families. The benefits for this position include medical, dental, and vision coverage, paid time off, retirement savings options, wellness programs, and other resources, based on eligibility.

Additional details about available benefits are provided during the application process and on Benefits Moments.

We anticipate the application window for this opening will close on: 04/28/2026

Qualified applicants with arrest or conviction records will be considered for employment in accordance with all federal, state and local laws.