• Note that the application deadline for this ad may have passed. Read the ad carefully before proceeding with your application.

We are EA

And we make games – how cool is that? In fact, we entertain millions of people across the globe 24/7 with the most amazing and immersive interactive software in the industry. But making games and delivering a flawless player experience is hard work. That’s why we employ the most creative, resourceful, and passionate people in the industry.

The Challenge Ahead

We are a group of Site Reliability Engineers who collaborate with multiple teams to provide online services that enhance the game experience. We support a multi-billion-dollar video game ecosystem and various non-development business units within EA – our portfolio is wide. Our environments are continuously challenged by marketing promotions, game launches, and security threats. We are passionate about automation and ensuring high standards.

Who You Are

  • A self-starter with a considerable breadth of technical knowledge and the ability to dig deep
  • Someone who communicates well with people across dozens of teams and practices
  • An engineer with a passion for excellence, a devotion to automation, and an eye for efficiency
  • An engineer with development experience who has improved operations with code
  • A curious problem solver who isn’t afraid to get dirty

Who We Are

We are a multi-discipline team of engineers supporting our live services and the developers who create them. As Site Reliability Engineers our role covers the entire life-cycle of a product, from helping the developers with architecture and delivery to on-call incident response and triage. We focus heavily on automation and continuous integration/delivery with an emphasis on solving operations issues using software, ensuring that everything we deliver is robust, efficient, and supportable. Our responsibilities include:

  • Using code to solve common operational problems in a results-focused way
  • Establishing monitoring, alerting, and dashboarding to continuously improve the observability of player experience, infrastructure and application performance, and business metrics
  • Hands-on design, analysis, development, and troubleshooting of highly distributed, large-scale production systems spanning on-prem and cloud-based hosting
  • Performing root cause analysis and post-mortems with an eye towards future prevention
  • Being the escalation path for on-call incident response and triage
  • Using automation technologies to ensure repeatability, eliminate toil, reduce mean time to detection and resolution (MTTD and MTTR), and repair services
  • Using scale testing to measure, tune and optimize system performance
  • Designing and implementing CI/CD and app deployment solutions for anything we or our dev teams build
  • Preemptively creating stability, security, and performance improvements
  • Making sure every service is tuned for high-availability and disaster recovery
  • Maintaining security standards across everything we support
  • Producing documentation, runbooks, and support tooling for online support teams

Your Skills

The systems we support are incredibly diverse, produced by dozens of teams from around the world. The ideal candidate will have a diverse skillset and always be eager to expand it. More importantly, they will be able to apply their conceptual understanding to new technologies and tools rapidly. Being a self-starter with a personal dedication to continuous learning is key. Below is a list of skills we are looking for, in addition to those the successful candidate brings:

  • Leading by example on engineering quality through testing, teaching, and a team-minded attitude
  • Cross-functional knowledge of systems, storage, networking, security, and databases
  • Experience monitoring infrastructure and application availability and reliability against service level indicators (SLIs) to meet service level objectives (SLOs)
  • A strong understanding of *nix is mandatory; familiarity with RHEL and Debian is preferred
  • Understanding of standard networking protocols and components such as HTTP, DNS, ECMP, TCP/IP, UDP, ICMP, the OSI Model, subnetting, and load balancing strategies
  • Automation and orchestration: Chef, Puppet, Terraform, Packer, Jenkins
  • Experience in languages such as Python, Ruby, Bash, Java, Go, Perl, C/C++, with strong skills in reading, understanding, and writing code in them
  • A strong understanding of distributed systems is a must
  • An understanding of the CAP theorem, Microservices, Twelve-Factor Apps, and techniques for high availability, service discovery, secret management, etc.
  • Virtualization, containerization, and cloud computing: AWS (preferred), GCP, Azure, VMware ecosystem, Kubernetes (preferred), Docker, Vagrant, etc.

What's in it for you? Glad you asked!

We love to brag about our great perks like comprehensive health and benefit packages, tuition reimbursement, 401k with company match, and, of course, free video games. And since we realize it takes world-class people to make world-class games, we offer competitive compensation packages and a culture that thrives on creativity and individuality. At EA, we live the “work hard/play hard” credo every day.

This is a job ad titled "Site reliability engineer" at the company Ea digital illusions ce ab, published on webbjobb.io on 16 September 2020 at 17:14.

How to apply for the job
