At Sully.ai, we’re building cutting-edge AI-native infrastructure to power real-time, intelligent healthcare applications. Our team operates at the intersection of high-performance computing, ML systems, and cloud infrastructure — optimizing inference pipelines to support next-generation multimodal AI agents. We're looking for a deeply technical engineer who thrives at the systems level and loves building performant, scalable infrastructure.
As a senior-level engineer, you'll lead efforts to deploy and optimize large language models on high-end GPU hardware and build the infrastructure that supports them. You'll work across the stack — from C++ and CUDA kernels to Python APIs — while also shaping our DevOps practices for scalable, multi-cloud deployments. This role blends systems performance, ML inference, and infrastructure-as-code to deliver low-latency, production-grade AI services.
LLM Inference Optimization: Develop and optimize inference pipelines using quantization, attention caching, speculative decoding, and memory-efficient serving.
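To give a flavor of this kind of work (an illustrative toy, not Sully.ai code): speculative decoding uses a cheap draft model to propose several tokens that the expensive target model then verifies, emitting the accepted prefix. The sketch below uses stand-in greedy "models" (plain Python callables) and calls the target once per verified token; in a real serving stack the k verifications would be a single batched forward pass.

```python
from typing import Callable, List

def speculative_decode(
    target_next: Callable[[List[int]], int],  # expensive model: greedy next token
    draft_next: Callable[[List[int]], int],   # cheap draft model: greedy next token
    prompt: List[int],
    n_tokens: int,
    k: int = 4,
) -> List[int]:
    """Toy greedy speculative decoding.

    The draft proposes k tokens; the target checks each one. On the first
    disagreement the target's own token is emitted and a fresh draft round
    starts. The output always matches the target's greedy decoding.
    """
    out = list(prompt)
    while len(out) - len(prompt) < n_tokens:
        # Draft proposes k tokens autoregressively (cheap).
        proposal: List[int] = []
        ctx = list(out)
        for _ in range(k):
            t = draft_next(ctx)
            proposal.append(t)
            ctx.append(t)
        # Target verifies the proposal; in production this is one batched pass.
        for t in proposal:
            expected = target_next(out)
            out.append(expected)
            if expected != t:
                break  # draft diverged; re-draft from the corrected context
            if len(out) - len(prompt) >= n_tokens:
                break
    return out[len(prompt): len(prompt) + n_tokens]
```

A key property worth noting: because the target's token is always the one emitted, the output is identical to plain greedy decoding from the target model; the draft only changes how many tokens are accepted per verification round.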
Systems Programming: Build and maintain low-level modules in C++/CUDA/NCCL to squeeze the most out of GPUs and high-throughput architectures.
DevOps & Infrastructure Engineering: Stand up and manage multi-cloud environments using modern IaC frameworks such as Pulumi or Terraform. Automate infrastructure provisioning, deployment pipelines, and GPU fleet orchestration.
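As a rough illustration of the IaC side of the role (a hedged sketch with placeholder names, not our actual configuration), provisioning a single GPU node on GCP with Terraform looks roughly like this:

```hcl
resource "google_compute_instance" "gpu_node" {
  name         = "inference-node-0"   # placeholder name
  machine_type = "n1-standard-8"      # placeholder machine type
  zone         = "us-central1-a"      # placeholder zone

  boot_disk {
    initialize_params {
      image = "projects/ml-images/global/images/family/common-gpu"
    }
  }

  guest_accelerator {
    type  = "nvidia-tesla-t4"         # attach one T4 GPU
    count = 1
  }

  scheduling {
    on_host_maintenance = "TERMINATE" # GCP requires this for GPU instances
  }

  network_interface {
    network = "default"
  }
}
```

In practice, fleet orchestration means templating resources like this across zones and providers, wiring them into deployment pipelines, and managing their lifecycle through code review rather than console clicks.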
Real-Time Architectures: Design low-latency streaming and decision-support systems leveraging embedding models, VRAM token caches, and fast interconnects.
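One piece of the token-cache bookkeeping can be sketched in a few lines (a toy, not our implementation): a prefix cache lets requests that share a prompt prefix reuse already-computed KV state instead of recomputing it. Here the "KV tensors" are opaque Python values standing in for VRAM-resident buffers, so the sketch shows only the lookup/eviction logic.

```python
from collections import OrderedDict
from typing import Optional, Tuple

class PrefixKVCache:
    """Toy LRU cache mapping token prefixes to stand-in KV state.

    In a real server the values would be VRAM-resident KV tensors; here
    they are arbitrary Python objects, illustrating the bookkeeping only.
    """

    def __init__(self, capacity: int):
        self.capacity = capacity
        self._store: "OrderedDict[Tuple[int, ...], object]" = OrderedDict()

    def longest_prefix(self, tokens: Tuple[int, ...]) -> Optional[Tuple[int, ...]]:
        """Return the longest cached prefix of `tokens`, refreshing its LRU slot."""
        for end in range(len(tokens), 0, -1):
            key = tokens[:end]
            if key in self._store:
                self._store.move_to_end(key)  # mark as recently used
                return key
        return None

    def put(self, tokens: Tuple[int, ...], kv: object) -> None:
        """Insert (or refresh) a prefix, evicting the least recently used entry."""
        self._store[tokens] = kv
        self._store.move_to_end(tokens)
        if len(self._store) > self.capacity:
            self._store.popitem(last=False)
```

Production systems (e.g. vLLM-style paged attention) manage this at block granularity with reference counting, but the hit/evict trade-off is the same.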
Developer Enablement: Build robust tooling, interfaces, and sandbox environments so that other engineers can contribute safely to the ML systems layer.
Proficiency in C++, CUDA, and Python with experience in systems or ML infrastructure engineering.
Deep understanding of GPU architectures, inference optimization, and large model serving techniques.
Hands-on experience with multi-cloud environments (GCP, AWS, etc.) and infrastructure-as-code tools such as Pulumi, Terraform, or similar.
Familiarity with ML deployment frameworks (TensorRT, vLLM, DeepSpeed, Hugging Face Transformers, etc.).
Comfortable with DevOps workflows, containerization (Docker), CI/CD, and distributed system debugging.
(Bonus) Experience with streaming embeddings, semantic search, or hybrid retrieval architectures.
(Bonus) Interest in building tools that democratize high-performance systems for broader engineering teams.
Collaborate with a highly technical team solving hard problems at the edge of AI and healthcare.
Work with bleeding-edge GPU infrastructure and build systems that push what's possible.
Be a foundational part of shaping AI-native infrastructure for real-time, mission-critical applications.
Help accelerate a meaningful product that improves how clinicians work and patients are cared for.
Sully.ai is an equal opportunity employer. In addition to EEO being the law, it is a policy that is fully consistent with our principles. All qualified applicants will receive consideration for employment without regard to status as a protected veteran or a qualified individual with a disability, or other protected status such as race, religion, color, national origin, sex, sexual orientation, gender identity, genetic information, pregnancy or age. Sully.ai prohibits any form of workplace harassment.