Senior Systems Engineer

Chicago, IL

Position: Senior Systems Engineer – Kubernetes & Cloud Platform
Location: Chicago, IL (Hybrid)
Employment Type: Contract with potential to convert to Full-Time

Role Overview
Reporting to the Director of IT, this Senior Systems Engineer will play a leading role in architecting, deploying, and managing enterprise-grade Kubernetes environments across AWS and GCP. The ideal candidate will be an expert in Kubernetes administration and cloud-native system design, with deep hands-on experience in platform automation, CI/CD pipelines, and service mesh technologies such as Istio. This role is critical to ensuring the reliability, scalability, and performance of the cloud-based infrastructure that supports business-critical applications.

Employee Value Proposition
Purpose: This role enables the organization to modernize and future-proof its infrastructure by driving the design and automation of Kubernetes platforms in multi-cloud environments. Your work will directly impact platform resilience, scalability, and developer productivity.
Growth: As a technical subject matter expert (SME), you will gain exposure to cutting-edge cloud-native tools and multi-cloud strategies, positioning yourself as a leader in the evolving field of platform engineering. This contract position has a clear path to conversion to a full-time role based on performance.
Motivators: You will be empowered to lead platform innovation, evaluate and integrate emerging tools, and build solutions that automate and streamline deployment and monitoring. Your contributions will shape a mission-critical foundation for development teams and business growth.

Major Objectives

Design and Deploy Production-Grade Kubernetes Clusters in AWS and GCP
Within the first 60 days, take ownership of the architecture, deployment, and configuration of secure, scalable Kubernetes clusters across AWS and GCP. This includes provisioning, role-based access control, network policies, and monitoring setup. Success will be measured by a fully operational and documented environment ready for production workloads.
Establish a CI/CD Pipeline and GitOps Deployment Workflow
Within the first 90 days, implement a fully automated CI/CD pipeline using tools such as ArgoCD, Helm, and GitHub Actions (or equivalents). Ensure the deployment process supports multiple environments and includes rollback mechanisms. KPIs include deployment frequency, recovery time, and reduction in manual intervention.
Lead Platform Automation and Observability Enhancements
By the end of the second quarter, develop infrastructure-as-code (IaC) modules using Terraform or similar, and integrate observability tooling including Prometheus, Grafana, and distributed tracing. Key outcomes include reduced manual configuration and increased visibility into cluster performance.

Critical Subtasks

Evaluate the Current Cloud Footprint and K8s Infrastructure
In the first 30 days, conduct a comprehensive assessment of existing Kubernetes environments and associated AWS/GCP workloads. Provide a gap analysis and prioritize opportunities for improvement.
Harden Kubernetes Cluster Security and Networking
Within 45 days, implement container and cluster security best practices, using tools such as Calico for network policy enforcement and IAM integrations for secure access control. Document all policies and provide a compliance checklist.
Integrate and Manage Istio Service Mesh
By Day 60, deploy and configure Istio to support service discovery, traffic management, and security policies. Validate with test workloads and monitor performance.
Provide Technical Leadership and SME Support to Development Teams
On an ongoing basis, serve as the subject matter expert for platform operations, answering design and deployment questions, reviewing Helm charts, and collaborating on incident resolution.
Support Multi-Cloud Deployment Readiness and Testing
Within the first 120 days, ensure that all platform components are tested and validated for multi-cloud compatibility. Document any GCP-specific configurations alongside AWS baselines.
Drive Continuous Improvements in Deployment Velocity and System Uptime
Track system performance, deployment success rates, and incident trends. Recommend and implement changes that reduce failure rates and recovery times.
Continuously Evaluate and Integrate AI to Improve Performance
Within the first 90–180 days, take ownership of identifying how AI and automation tools can support or enhance the core responsibilities of this role. Evaluate tasks that could be streamlined or improved, lead pilots, and embed continuous AI adoption into daily work.
