Senior Site Reliability Engineer – Google Distributed Cloud Edge (Edge SRE)

Chicago, IL · Information Technology

Location: Hybrid – Chicago, IL (preferred)
Employment Type: W2, Contract to Hire, Direct Hire

Overview
Our client is seeking a highly skilled Edge Site Reliability Engineer (Edge SRE) to lead the design, automation, and operations of Google Distributed Cloud Edge (GDCE) environments. This role combines deep expertise in cloud-native platforms, networking, and automation with a strong focus on performance, reliability, and scalability at the edge. The ideal candidate thrives in complex, distributed systems and is experienced at bridging platform engineering with application, network, and security teams.

Key Responsibilities

Define and implement automation principles for edge compute provisioning and application deployments.
Design and optimize intelligent caching, traffic steering, and edge routing strategies to reduce latency and maximize performance.
Collaborate with security teams to implement DDoS protection, bot mitigation, and secure TLS termination policies.
Develop monitoring, alerting, and observability frameworks (real-time and historical) for latency, traffic, and system health.
Lead incident response efforts, including root cause analysis and blameless post-mortems.
Partner with application, network, security, and platform teams to ensure edge systems integrate seamlessly with core infrastructure.
Drive automation initiatives to reduce operational toil and improve system efficiency.
Build tools, dashboards, and data pipelines to monitor service performance and identify bottlenecks.
Establish and champion KPIs that measure reliability, scalability, and overall success of the edge practice.
Serve as a thought leader who advances edge operations in line with business goals.

Skills & Qualifications

Strong expertise in networking fundamentals (TCP/IP, BGP, DNS) and carrier-grade environments.
Hands-on experience with Kubernetes administration, CI/CD pipelines, and Infrastructure as Code (Terraform).
Proven background in GCP (preferred) and/or AWS cloud infrastructure.
Deep experience with monitoring and observability tools such as Prometheus, Grafana, ELK, and New Relic.
Demonstrated success driving automation and tooling initiatives to improve reliability and reduce toil.
Prior experience guiding or leading Operations/SRE teams in large-scale, multinational environments.
Exceptional communication, leadership, and cross-functional collaboration skills.
