
Site Reliability Engineers are responsible for ensuring that services are available, that the underlying infrastructure is functioning properly, and that other internal tools, processes, and systems are working as expected. Monitoring critical applications and related services to ensure availability during critical business hours is also an essential responsibility.

Main responsibilities, activities and duties

Thoroughly analyse assigned systems and supported business to understand design and functionality in relation to stakeholder needs.

Identify, establish, and uphold appropriate system Service Level Objectives (SLOs) together with the team and implement the use of error budgets along with policies/consequences for deviating from them.

Ensure observability and monitoring of relevant systems. Provide guidelines and educate the product teams on observability practices/standards around metrics, logs and traces.

Proactively seek out system weaknesses and find ways to fix them before they cause production issues, by using monitoring data, watching trends, and applying Chaos Engineering.

Build automated solutions and tools to help debug and resolve problems in production and prevent them from recurring.

Lead blameless post-mortems for incidents together with different product teams and vendors.

Be part of on-call rotations to ensure SLOs are met, with the goal of eliminating the need for support outside of office hours.

Education and certification
Academic degree in systems development, or equivalent knowledge and skills acquired through work experience and continuing professional education.

Knowledge and experience

- Senior-level expertise in:
  - Software development.
  - Database management systems and SQL.
  - Containerization with OpenShift/Kubernetes.
  - Integration through Kafka and message queues.
  - Java.
  - Cloud (Azure/AWS).

- Proven experience in developing production-grade, performant, scalable and durable applications.
- Experience at all levels of the technology stack, i.e. infrastructure, database, API, components and front-end.
- Hands-on experience of managing complex, high-volume applications/components in production-critical environments.
- Experience of performance tuning techniques, stability patterns and scalability approaches.
- Experience of fact-based, data-driven problem solving and communication.

Other qualifications

- Excellent analytical skills.
- Good leadership qualities.
- Excellent at planning and working in a highly structured way.

- Proven ability to independently capture and share information through formal written documentation in English and the local language.
- Fluent in English and Swedish, both spoken and written.
- Process-oriented mindset and a strong ability to follow methods to secure cross-functional efficiency and collaboration.
- Previous work experience in Site Reliability Engineering is an advantage.

Workplace: HQ Solna

This is a job ad titled "Site Reliability Engineer" at the company W.IT.G Consulting AB, published on webbjobb.io on 26 August 2022 at 17:27.
