Site Reliability Engineer – AI, Analytics & Data

H & M Hennes & Mauritz Gbc AB, Stockholm

SQL Python Linux Git Flash Azure

Note that the application deadline for this ad may have passed. Read the ad carefully before proceeding with your application.

Company Description

H&M Group is on a journey to meet and exceed our customers' expectations today and tomorrow. Through collaboration, innovation, and technology we challenge ourselves and the industry. To cater to the individual needs and desires of our millions of customers, our tech organisation delivers solutions for the entire value chain for all our brands.

We are accelerating digitalisation, and to stay relevant we need strong leaders in place who bring together our best capabilities, innovative ideas and talented technologists to support the transformation of H&M Group.

We take pride in our history of making fashion accessible to everyone and our ambition for tomorrow is to make fashion even more sustainable, inclusive, and welcoming.

Job Description

Site Reliability Engineers create a bridge between development and operations by applying a software engineering mindset to system administration topics. They split their time between operations and developing systems and software that help increase system reliability and performance. SREs automate redundancy and turn manual tasks into programmatic ones to keep the stack up and running. Site Reliability Engineers are able to oversee the software and performance of the full technology stack.

At H&M Group, SRE is an area within the AI, Analytics and Data (AIAD) domain. Within our area we have two teams: SRE Core and SRE Operations (Ops).

SRE Core works in close collaboration with the Platform and Data team, focusing mainly on best practices, frameworks, and automation on a multi-cloud tech stack to enable teams to run stable data products.
SRE Ops works closely with the data team, and its main focus is maintaining the stability and reliability of products in the cloud and on-premises.

Responsibilities:

Infrastructure Automation and Configuration Management:

Develop and maintain automation tools, scripts, and configuration management systems to streamline deployment, provisioning, and monitoring processes.
Implement Infrastructure as Code (IaC) practices using tools like Ansible, Terraform, or Kubernetes to manage infrastructure effectively.
Collaborate with development and operations teams to automate build, test, and deployment processes for efficient software releases.

Reliability Engineering and Resilience:

Design and implement systems and processes to enhance the reliability and resilience of the infrastructure.
Continuously improve system reliability by analyzing incident trends, identifying areas for improvement, and implementing preventative measures.

System Monitoring and Incident Response:

Develop and manage monitoring tools and systems to track the health, performance, security, and availability of software applications, infrastructure components, and services.
Set up alerts, dashboards, and metrics to proactively detect and respond to system outages, service disruptions, and performance incidents.
Investigate and diagnose the root cause of incidents and work towards their resolution in a timely manner.

Continuous Improvement and Collaboration:

Drive a culture of continuous improvement by identifying areas for automation, efficiency, and operational excellence.
Document procedures, incidents, and best practices to facilitate knowledge sharing and improve team efficiency.
Stay abreast of industry trends, emerging technologies, and best practices to propose innovative solutions that enhance system reliability and performance.
Collaborate closely with cross-functional teams, including developers, system administrators, and network engineers, to ensure smooth operation of systems.

Qualifications

Bachelor's degree in Computer Science, Engineering, or a related field (or equivalent experience) with 3+ years of IT experience
Proficiency in scripting/programming languages such as Python and Bash
Experience with cloud platforms (Google Cloud Platform & Azure preferred)
Experience with DevOps practices, CI/CD and monitoring tools
Experience with automation tools and configuration management frameworks such as Terraform, Puppet or Ansible
Strong troubleshooting and problem-solving skills with a keen attention to detail
Excellent communication and collaboration skills to work effectively in a cross-functional team environment
Strong experience in system administration, infrastructure management, or site reliability engineering
Ability to thrive in a fast-paced, agile environment and handle multiple priorities

Tech Stack (in a flash):

GCP, Azure, Python, Terraform, Git, SQL, Bash, Power BI, Grafana, Zabbix, Prometheus, Docker, Kubernetes, Linux, PowerShell, ServiceNow, dbt, Atlassian

Additional Information

Working with tech at H&M Group

Shaping the future of fashion with people, data, and tech. The fashion and retail industries are going through a transformation, driven by customers' technology and sustainability expectations. At H&M Group, we want to shape the future of fashion and lifestyle by harnessing the power of smart tech and data. With our 74-year history of innovation, we understand the need to collaborate and co-create with engineers and tech specialists around the world to achieve our vision.

What we offer!

You are joining a unique value-driven culture and a large tech network and community where you can be yourself. Besides the obvious perks such as a staff discount card, flexible work life, learning communities, wellness benefits and parental benefits, there are endless opportunities to experiment and grow in any direction that you want, and when you grow, we grow. Being a major player gives us countless opportunities to make a real impact and shape the future.

We are committed to creating an inclusive and diverse workplace with a culture that is dynamic and innovative.

Sounds interesting?

This is a full-time position based in Stockholm. Please apply as soon as possible, but no later than 20 September. As it is the summer period for everyone, we will screen and review applications from 1 September onwards only.

We do not accept applications by email due to GDPR.

This is a job ad titled "Site Reliability Engineer – AI, Analytics & Data" at the company H & M Hennes & Mauritz Gbc AB, published on webbjobb.io on 4 September 2023 at 07:58.

