Posted May 21, 2026

Senior DevOps Engineer, Infrastructure & Reliability

Description:
• Conduct interviews with engineering teams to identify and remove operational friction in CI/CD, deployments, observability, and cloud environments.
• Design and implement scalable infrastructure-as-code patterns using Terraform to standardize provisioning and reduce configuration drift.
• Own and evolve the Kubernetes platform, including EKS or self-managed environments, so workloads are secure, scalable, and resilient.
• Architect and optimize CI/CD pipelines to improve deployment frequency, reduce lead time, and increase release confidence.
• Lead reliability initiatives such as incident response improvements, root cause analysis, and postmortem practices.
• Design and enforce secure networking, IAM, and secrets management strategies across environments.
• Improve observability through metrics, logs, and tracing using DataDog or similar tooling.
• Optimize cloud costs through rightsizing, autoscaling, and architectural improvements.
• Own disaster recovery planning, backup strategies, and multi-region resilience initiatives.
• Refactor manual or brittle infrastructure into automated, testable, reproducible systems and drive adoption through documentation and hands-on support.

Requirements:
• 8+ years of experience in DevOps, SRE, or Infrastructure Engineering roles.
• Proven experience designing and operating production Kubernetes environments at scale.
• Deep hands-on expertise with AWS infrastructure and cloud networking.
• Strong experience building and maintaining Terraform modules across large cloud environments.
• Demonstrated ownership of CI/CD systems and measurable improvement of DORA metrics.
• Experience leading incident response processes and driving meaningful postmortem outcomes.
• Strong understanding of distributed systems, event-driven architectures with Kafka, and database performance with PostgreSQL.
• Proven ability to modernize legacy infrastructure and eliminate manual operational toil.
• Experience navigating high-ambiguity environments and translating operational friction into prioritized infrastructure roadmaps.
• Nice to have: experience operating high-throughput Kafka clusters, tuning PostgreSQL or Redis, implementing autoscaling, building internal developer platforms, applying security best practices, working with multi-region systems, using Python for automation, or introducing SLO, error budget, or chaos testing frameworks.
• All remote hires must be able to travel to Orlando, Florida at least twice per year, and additionally for orientation in Orlando.

Benefits:
• Health care plan including medical, dental, and vision coverage.
• Retirement plan with 401(k) and IRA options.
• Life insurance.
• Flexible vacation.
• Work-from-home option.
• Wellness resources.
• Free food and snacks in the office.
• Hybrid setup in Orlando, Florida.