Posted Jun 17, 2026

[Remote] Director of Site Reliability Engineering

Note: This is a remote position open to candidates in the USA. Talently is a cutting-edge organization in the Technology, Information and Media industry, and it is seeking a Director of Site Reliability Engineering. In this role, you will build and lead world-class Site Reliability Engineering practices, drive strategic reliability initiatives, and mentor engineering teams in a remote-first environment.

Responsibilities

Define and execute a comprehensive company-wide Site Reliability Engineering strategy, embedding reliability as a core discipline across engineering teams
Build, lead, and develop a high-performing SRE organization, including hiring, mentoring, and fostering a reliability-focused culture
Establish SLIs, SLOs, KPIs, and error budgets to measure and drive platform reliability and performance improvement
Guide architecture decisions and technical roadmaps for highly available, resilient, and scalable distributed systems
Drive adoption of observability, monitoring, logging, and incident response solutions across cloud-based microservices environments, primarily on Google Cloud Platform
Establish and oversee robust incident response frameworks, operational governance, and post-incident analysis processes
Promote and implement best practices for infrastructure automation, cloud-native operations, and cost optimization
Lead continuous improvement and innovation initiatives, including exploring AI-driven operations and new SRE methodologies

Skills

12+ years of experience in Site Reliability Engineering, Infrastructure Engineering, or DevOps in high-scale environments
5+ years of proven technical leadership, building and scaling SRE teams and practices
Strong expertise in distributed systems, cloud-native infrastructure, and microservices, with hands-on Google Cloud Platform experience (GKE, Compute Engine, Cloud Functions)
Deep proficiency with infrastructure as code, automation frameworks, and CI/CD deployment pipelines
Track record of designing large-scale observability and monitoring solutions using tools like Prometheus, Grafana, Datadog, or New Relic
Excellent communication, organizational development, and mentorship abilities
Strong programming ability in Python, Go, Java, or similar languages
Cloud or reliability certifications (e.g., Google Cloud Professional, SRE certifications)
Experience implementing AIOps, anomaly detection, predictive analytics, or automated remediation/self-healing infrastructure
Familiarity with AI/ML tools for operational intelligence and intelligent alerting
Strong knowledge of database performance tuning and distributed data systems
Comfortable operating in fast-paced, high-growth technology environments
Bachelor's degree in Computer Science, Engineering, or related field

Company Overview

Talently provides nationwide recruitment services, executive search, and career alignment programs. Founded in 2022, it is headquartered in Newport Beach, California, US, and has a workforce of 11-50 employees. Its website is https://www.talently.com/.
