Senior Site Reliability Engineer, AI Research

Algolia
1 day ago
Full-time
Remote
Worldwide
Remote Engineering

At Algolia, we’re proud to be a pioneer and market leader in AI Search, empowering 17,000+ businesses to deliver blazing-fast, predictive search and browse experiences at internet scale. Every week, we power over 30 billion search requests, four times more than Microsoft Bing, Yahoo, Baidu, Yandex, and DuckDuckGo combined.

In 2021, we raised $150 million in Series D funding, quadrupling our valuation to $2.25 billion. This strong foundation enables us to keep investing in our market-leading platform and serving incredible customers like Under Armour, PetSmart, Stripe, Gymshark, and Walgreens.

About the AI Research Team

The AI Research team at Algolia combines fundamental research with product engineering to deliver customer-facing AI-powered features.

The team is highly cross-functional, made up of PhD researchers, full-stack engineers, and infrastructure specialists working together to explore new ideas, validate impact, and bring successful research outcomes into production. While the work is research-driven, the output is real, customer-facing systems.

The Opportunity

We are looking for an embedded Senior Site Reliability Engineer to join the AI Research team as a full member of the group. In this role, you will support both the research and product-engineering aspects of the team by ensuring the stability, scalability, and operability of the infrastructure that enables this work.

This is a classic SRE role focused on cloud-first, service-oriented architectures running on Google Cloud Platform. While the team builds AI-powered systems, AI or ML experience is not required for this role. Our priority is strong SRE fundamentals, experience operating production services, and comfort working in an environment with ambiguity and high ownership.

You will play an important role in day-to-day execution as well as in longer-term (12-month) planning, helping shape how the team builds and operates its platforms over time.

What You’ll Work On

Platform Reliability & Enablement

  • Support and evolve the reliability of platforms used by the AI Research team. Examples of our infrastructure work to date include:
    • A production inference service (embedding model serving API)
    • AI data feature store
    • Internal tools used for novel research and experimentation