Machine Learning Ops Engineer

Stratum Ai
12 months ago
Full-time
Remote
Worldwide
Remote Data
We are looking for a high-agency Machine Learning Ops Engineer to join our Infrastructure Team. You will help build and maintain the platform used to train, evaluate, and serve our AI models to clients in the mining industry. Your work will directly support our Technical Services and Platform teams in delivering solutions that create value for mining clients.

This position requires strong expertise in Python and machine learning workflows. You will work alongside a team of three engineers focused on creating robust infrastructure and tooling.

This is a remote-first position based in Canada.


KEY RESPONSIBILITIES

- Develop robust and well-tested code for core internal tools:

  - Create data preprocessing modules for mining data

  - Implement metrics calculations and evaluation pipelines

  - Build visualization tools for 3D models and ML performance metrics

  - Troubleshoot and fix issues in existing metrics code

- Build and maintain our custom end-to-end MLOps platform:

  - Implement experiment tracking systems

  - Create a model registry with versioning and storage

  - Develop automated testing frameworks

  - Build interfaces between different components of the ML pipeline

- Develop production-grade QA/QC systems for deployed AI models:

  - Implement input data validation

  - Create automated alerts for performance issues

  - Set up monitoring for data drift

  - Build dashboards for model performance metrics

- Create specialized tools for mining data:

  - Implement spatial data processing utilities

  - Build visualization tools for 3D geological data

  - Develop data converters between different mining data formats

  - Create utilities for coordinate transformations

- Refactor and productionize code created by the client services team:

  - Convert notebooks into modular Python packages

  - Implement proper error handling and logging

  - Add comprehensive testing to existing code

  - Improve performance of data processing pipelines

- Provide technical expertise to the client services team

- Manage infrastructure for data processing, model training, and serving

- Mentor junior engineers, perform code reviews, and write documentation

- Proactively identify technical challenges and drive improvement initiatives


TECHNICAL COMPETENCIES & REQUIREMENTS

- Bachelor's degree in Computer Science, Engineering, or a related field, or equivalent experience in software development and ML engineering

- 3+ years of industry experience

- Kubernetes, PyTorch

- Advanced Python programming skills:

  - Proficiency with data science libraries (NumPy, pandas)

  - Experience with visualization tools

  - Ability to write modular, robust, and tested Python code

  - Strong debugging skills for complex ML systems

- Deep learning experience:

- Implementation of neural network models and training workflows

- Understandi