Posted Jun 10, 2026

[Remote] Senior Site Reliability Engineer — Government & Sovereign Cloud

Note: This is a remote job open to candidates in the USA.

Veeam Software is the Data and AI Trust Company, specializing in data resilience and security. The role involves building a global Site Reliability Engineering function for the Veeam Data Cloud (VDC), focusing on government and sovereign cloud environments while ensuring high availability and fault tolerance.

Responsibilities
• Get up to speed on the full platform: all VDC workloads, dependencies, and risk areas. Much of this will happen through code, docs, and conversations rather than direct environment access
• Work with subject-matter experts (SMEs) across the org to fill knowledge gaps and build onboarding material for the team
• Write and maintain runbooks, architecture docs, and operational guides
• Design infrastructure for high availability and fault tolerance on Azure (including Azure Government)
• Define SLIs, SLOs, and error budgets where none exist today
• Run incident response and blameless postmortems, and turn incidents into improvements
• Identify reliability risks across modern and legacy workloads and build practical remediation plans that work within compliance constraints
• Close observability gaps: define instrumentation requirements and drive implementation
• Set alerting, telemetry, and monitoring standards with partner teams
• Build automation to reduce toil and support fleet management
• Participate in on-call rotations
• Work with IaC, CI/CD, deployment automation, and config management, including in air-gapped or compliance-restricted environments
• Build and maintain testing, canary deployment, and release validation pipelines
• Integrate chaos engineering and monitoring tools, adapting choices to meet regulatory requirements
• Work across product, platform, security, legal, compliance, and operations teams
• Own problems end-to-end: identify gaps, drive solutions, and don't wait for direction
• Mentor other engineers and help spread SRE practices across the org

Skills
• 7+ years in Software Engineering, with 3+ years in SRE, Platform Engineering, or a similar role, across multi-service platforms rather than only single-service environments
• Experience with Government or Sovereign Cloud (e.g., Azure Government, AWS GovCloud)
• Experience in regulated compliance environments: government (FedRAMP, CMMC, IL2/IL4/IL5), financial (PCI-DSS, SOX), or healthcare (HIPAA, HITRUST). You understand how compliance shapes architecture and operations
• Strong experience building and running production services on cloud infrastructure (Azure preferred, including Azure Government)
• Able to learn large, complex platforms quickly with limited guidance, and comfortable building understanding from code, docs, and architecture artifacts when direct environment access is restricted
• Able to investigate systems independently and produce clear docs, risk assessments, and improvement plans
• Comfortable working across teams: engineering, product, security, compliance, and operations
• Programming skills in one or more of: TypeScript/JS, Go, Java, C#, or similar
• Experience with monitoring and observability tools (e.g., Prometheus, Grafana, OpenTelemetry, ELK stack)
• Experience with IaC (Terraform, Terragrunt, Pulumi) and container orchestration (Kubernetes)
• Experience with CI/CD and GitOps tooling: GitHub Actions, Azure DevOps, GitLab CI, ArgoCD, FluxCD, or Dagger
• Solid grasp of distributed systems, networking, and cloud-native architecture
• Clear written and verbal communication skills
• Experience on B2B SaaS platforms in regulated or government markets
• Background in chaos engineering, resilience testing, or performance/load testing
• Experience building an SRE or reliability function from scratch
• Experience across mixed environments, from modern cloud-native to older legacy systems
• Familiarity with AI-first development workflows, using LLM-powered tools for infrastructure automation, code generation, and documentation

Benefits
• Unlimited paid time off, 12 paid holidays, plus 4 extra global VeeaMe Days for self-care and 24 paid volunteer hours annually through Veeam Cares
• Paid parental leave: 8 weeks for all parents, 16 weeks for birthing parents
• Medical, dental, and vision coverage starting on your first day
• Mental health support, therapy sessions, and digital wellness tools via our Employee Assistance Program
• 401(k) retirement plan with company matching contributions
• Fertility, adoption, and surrogacy support through Maven, plus paid volunteer time
• AirVet: 24/7 virtual veterinary care at no cost
• Legal services, identity protection, and supplemental health insurance options
• Tax-advantaged spending accounts for healthcare, dependent care, and commuting