Remote Senior DevOps Engineer
Oracle
Description
Key Responsibilities
- Design, implement, and automate ML lifecycle workflows using tools like MLflow, Kubeflow, Airflow, and OCI Data Science Pipelines.
- Build and maintain CI/CD pipelines for model training, validation, and deployment using GitHub Actions, Jenkins, or Argo Workflows.
- Collaborate with data engineers to deploy models within modern data lakehouse architectures (e.g., Apache Iceberg, Delta Lake, Apache Hudi).
- Integrate machine learning frameworks such as TensorFlow, PyTorch, and Scikit-learn into distributed environments like Apache Spark, Ray, or Dask.
- Operationalize model tracking, versioning, and drift detection using DVC, model registries, and ML metadata stores.
- Manage infrastructure as code (IaC) using tools like Terraform, Helm, or Ansible to support dynamic GPU/CPU training clusters.
- Configure real-time and batch data ingestion and feature transformation pipelines using Kafka, Oracle GoldenGate, and OCI Streaming.
- Collaborate with DevOps and platform teams to implement robust monitoring, observability, and alerting with tools like Prometheus, Grafana, and the ELK Stack.
- Support AI governance by enabling model explainability, audit logging, and compliance mechanisms aligned with enterprise data and security policies.
Required Qualifications
- Bachelor’s or Master’s degree in Computer Science, Data Science, or a related technical discipline.
- 5–8 years of experience in ML engineering, DevOps, or data platform engineering, with at least 2 years in MLOps or model operations.
- Proficiency in Python, particularly for automation, data processing, and ML model development.
- Solid experience with SQL and distributed query engines (e.g., Trino, Spark SQL).
- Deep expertise in Docker, Kubernetes, and cloud-native container orchestration tools (e.g., OCI Container Engine, EKS, GKE).
- Working knowledge of open-source data lakehouse frameworks and data versioning tools (e.g., Delta Lake, Apache Iceberg, DVC).
- Familiarity with model deployment strategies, including batch, real-time inference, and edge deployments.
- Experience with CI/CD pipelines (GitHub Actions, GitLab CI, Jenkins) and MLOps frameworks (Kubeflow, MLflow, Seldon Core).
- Competence in implementing monitoring and logging systems (e.g., Prometheus, ELK Stack, Datadog) for ML applications.
- Strong understanding of cloud platforms (OCI, AWS, GCP) and IaC tools (Terraform, CloudFormation).
Preferred Qualifications
- Experience integrating AI workflows with Oracle Data Lakehouse, Databricks, or Snowflake.
- Hands-on experience with orchestration tools like Apache Airflow, Prefect, or Dagster.
- Exposure to real-time ML systems using Kafka or Oracle Stream Analytics.
- Understanding of vector databases (e.g., Oracle 23ai Vector Search).
- Knowledge of AI governance, including model explainability, auditability, and reproducibility frameworks.
Soft Skills
- Strong problem-solving skills and an automation-first mindset.
- Excellent cross-functional communication, especially when collaborating with data scientists, DevOps, and platform engineering teams.
- A collaborative and knowledge-sharing attitude, with good documentation habits.
- Passion for continuous learning, especially in the areas of AI/ML tooling, open-source platforms, and data engineering innovation.
Qualifications
Career Level - IC4