Machine Learning Engineer – MLOps Lead

New Jersey, NJ · Information Technology
Job Title: Machine Learning Engineer – MLOps Lead
Duration: Contract role
Location: Remote, United States


Role Mission
You are being hired to productionize machine learning at scale: eliminating fragile pilot models, building hardened MLOps pipelines, and delivering compliant, monitored, and continuously improving ML systems that directly support business operations.
Your success is measured not by “knowing tools,” but by deploying, stabilizing, and scaling real ML systems in production.

First-Year Outcomes (What You Must Deliver)
Within First 30 Days
  • Fully assess current ML pipelines, data flows, and deployment architecture
  • Identify the top three reliability, security, and performance risks in the current ML lifecycle
  • Produce a documented MLOps modernization roadmap

Within 90 Days
You will:
  • Stand up standardized CI/CD pipelines for model training, validation, and deployment
  • Implement automated monitoring, alerting, and versioning across active production models
  • Deploy at least one business-critical ML model into hardened production pipelines
  • Establish security, audit, and compliance controls for model governance
  • Reduce model deployment cycle time by 30–50%

Within 180 Days
You will:
  • Operate a fully standardized enterprise MLOps framework (MLflow/Kubeflow/Airflow-based)
  • Enable continuous retraining and automated rollback capability
  • Achieve ≥ 99.5% model uptime
  • Establish a retraining cadence that improves model accuracy and reliability quarter-over-quarter
  • Mentor junior engineers and codify ML engineering standards

Ongoing Success Metrics
Metric: Target
  • Production model uptime: ≥ 99.5%
  • Model deployment cycle time: reduced by 30–50%
  • Automated pipeline coverage: 100%
  • Compliance audit readiness: Continuous
  • Model accuracy improvement: Measurable QoQ gains

What You Will Build
  • End-to-end MLOps pipelines (data → training → testing → deployment → monitoring → retraining)
  • Kubernetes-based model serving platforms
  • Cloud ML platforms (Vertex AI / SageMaker / Azure ML)
  • CI/CD automation for ML systems
  • Model observability and alerting using Prometheus / Grafana
  • Secure, version-controlled ML governance frameworks

Required Experience (Performance Evidence)
You must have:
  • Proven delivery of production ML pipelines (not just experiments)
  • Experience building CI/CD for ML models in Kubernetes environments
  • Experience implementing monitoring, retraining, and version governance
  • At least one delivered enterprise-scale ML deployment
  • Hands-on experience with MLflow / Kubeflow / Airflow
  • Experience deploying ML to production in the cloud (AWS, GCP, or Azure)
  • Strong Python engineering background
