Job Description:
• Collaborate with Feature Development teams to promote new component versions into production as efficiently as possible.
• Maintain the system to agreed service level and availability objectives using real-time monitoring tools and system-generated metrics.
• Instrument new system metrics and alerts to pre-empt issues and improve performance.
• Respond to monitoring alerts and customer incidents, taking preventative/remedial action to minimise customer impact.
• Liaise with key customer stakeholders to schedule capability changes and capture new service requirements as they arise.
• Apply automation techniques to reduce the burden of manual operations.
Requirements:
• Experience with infrastructure automation tools (CloudFormation, Terraform or Ansible)
• Experience working with Docker containers & container orchestration tools (such as Kubernetes, OpenShift or Docker Swarm)
• Experience using and maintaining CI/CD tools (such as Jenkins or GitHub Actions)
• Good understanding of relational databases and SQL
• Linux command line, administration and shell scripting
• Solid understanding of monitoring, auto-scaling, performance tuning, troubleshooting and disaster recovery best practices
• Working knowledge of network security protocols
• Working knowledge of AWS
• Experience with monitoring tools such as InfluxDB, Prometheus or Grafana
Benefits:
• Fully remote