• Note that the application deadline for this ad may have passed. Read the ad carefully before proceeding with your application.

Episerver Engineering Operations is a rapidly growing part of the organization. We are building our teams, tools and systems as part of our mission to build the leading digital experience platform.

We enable Episerver to go fast by providing real-time feedback on production systems. We work side by side with the product family and platform developers to maintain and improve services and performance. We live the company values (Dependable, Collaborative and Simple) with a strong customer focus and a healthy sense of urgency. We are a heavily data-driven team, utilising a variety of data collection, enrichment, analytics and visualisations to learn about our complex systems.

We also live the 'Play, as a team' value by having a strong focus on sharing learning experiences from the front line with the development teams. So the options for people in the team are vast. If you like mastering a domain and going deep, we need you. If you can juggle three tasks and coordinate multiple people in the heat of an incident, we need you. If you love the benefits of process and methodical improvement, you will love it here. If you want to keep your head down, headphones on and bash out code to support the team, we have a spot for you too.

As an SRE in one of our teams, you will work to enhance the availability, performance and stability of Episerver services and to automate away repetitive work. You'll also respond to pings, pages and alerts to investigate issues in our products that you can really sink your teeth into.

You'll be working on non-production and production environments, monitoring, data collection and configuration management, as well as disaster recovery planning, capacity engineering, reliability improvement initiatives and platform automation.

The Role

·      Serve as a level 3 support resource for the systems you are responsible for
·      Troubleshoot and resolve end-user issues independently and efficiently
·      Build a knowledge base around common production support issues
·      Troubleshoot and fix the system when it breaks
·      Reduce the impact of errors and automate repetitive tasks
·      Maintain services once they are live by measuring and monitoring availability, latency and overall system health
·      Author and maintain documentation for related processes, procedures and system events
·      Identify areas of improvement within our systems and perform enhancements
·      Share the responsibility of being on-call
·      Engage in the entire lifecycle of services, from inception through operation and continuous integration
·      Lead incident triage, analysis and resolution
·      Drive root cause analysis and the completion of corrective actions to help eliminate service disruptions and improve the day-to-day operations of the organization

Essential Requirements

·      Expert-level troubleshooting skills across different levels of the stack
·      Scripting and software development in one or more programming languages (PowerShell / Bash / Python)
·      Deep understanding of cloud architecture and Linux-based systems
·      At least 2 years of hands-on experience with cloud infrastructure such as Azure or AWS
·      Deep expertise in monitoring distributed systems and application architectures
·      Exposure to and maintenance of configuration management and orchestration tools at scale (Azure Automation, Salt, Puppet, Chef etc.)
·      Diagnosing and troubleshooting user-facing service outages
·      Exposure to system- and application-level telemetry for large distributed cloud architectures
·      Diagnosing and resolving problems in high-throughput web applications and network services

We would be very excited if you have experience with:

·      Elasticsearch
·      Understanding of ITIL terminology for incident and problem management
·      Experience in SaltStack

This is a job ad titled "Site Reliability Engineer" from the company EPiServer AB, published on webbjobb.io on 9 April 2019 at 00:00.

