Remote Senior DevOps Engineer
Oracle
Description
Key Responsibilities
- Design, implement, and automate ML lifecycle workflows using tools like MLflow, Kubeflow, Airflow, and OCI Data Science Pipelines.
- Build and maintain CI/CD pipelines for model training, validation, and deployment using GitHub Actions, Jenkins, or Argo Workflows.
- Collaborate with data engineers to deploy models within modern data lakehouse architectures (e.g., Apache Iceberg, Delta Lake, Apache Hudi).
- Integrate machine learning frameworks such as TensorFlow, PyTorch, and Scikit-learn into distributed environments like Apache Spark, Ray, or Dask.
- Operationalize model tracking, versioning, and drift detection using DVC, model registries, and ML metadata stores.
- Manage infrastructure as code (IaC) using tools like Terraform, Helm, or Ansible to support dynamic GPU/CPU training clusters.
- Configure real-time and batch data ingestion and feature transformation pipelines using Kafka, Oracle GoldenGate, and OCI Streaming.
- Collaborate with DevOps and platform teams to implement robust monitoring, observability, and alerting with tools like Prometheus, Grafana, and the ELK Stack.
- Support AI governance by enabling model explainability, audit logging, and compliance mechanisms aligned with enterprise data and security policies.
Required Qualifications
- Bachelor’s or Master’s degree in Computer Science, Data Science, or a related technical discipline.
- 5–8 years of experience in ML engineering, DevOps, or data platform engineering, with at least 2 years in MLOps or model operations.
- Proficiency in Python, particularly for automation, data processing, and ML model development.
- Solid experience with SQL and distributed query engines (e.g., Trino, Spark SQL).
- Deep expertise in Docker, Kubernetes, and cloud-native container orchestration tools (e.g., OCI Container Engine, EKS, GKE).
- Working knowledge of open-source data lakehouse frameworks and data versioning tools (e.g., Delta Lake, Apache Iceberg, DVC).
- Familiarity with model deployment strategies, including batch, real-time inference, and edge deployments.
- Experience with CI/CD pipelines (GitHub Actions, GitLab CI, Jenkins) and MLOps frameworks (Kubeflow, MLflow, Seldon Core).
- Competence in implementing monitoring and logging systems (e.g., Prometheus, ELK Stack, Datadog) for ML applications.
- Strong understanding of cloud platforms (OCI, AWS, GCP) and IaC tools (Terraform, CloudFormation).
Preferred Qualifications
- Experience integrating AI workflows with Oracle Data Lakehouse, Databricks, or Snowflake.
- Hands-on experience with orchestration tools like Apache Airflow, Prefect, or Dagster.
- Exposure to real-time ML systems using Kafka or Oracle Stream Analytics.
- Understanding of vector databases (e.g., Oracle 23ai Vector Search).
- Knowledge of AI governance, including model explainability, auditability, and reproducibility frameworks.
Soft Skills
- Strong problem-solving skills and an automation-first mindset.
- Excellent cross-functional communication, especially when collaborating with data scientists, DevOps, and platform engineering teams.
- A collaborative and knowledge-sharing attitude, with good documentation habits.
- Passion for continuous learning, especially in the areas of AI/ML tooling, open-source platforms, and data engineering innovation.
Qualifications
Career Level - IC4