Senior Site Reliability Engineer
Company
Guidewire Software
Location
Greater Denver Area
Type
Full Time
Job Description
Guidewire is searching for a Sr. Site Reliability Engineer who is hungry for a rare chance to transform insurance with the industry’s leading Analytics platform. As a member of the SRE-Analytics Team, you’ll be responsible for building and evolving our SRE practice for Analytics. The Analytics team at Guidewire uses internet-scale data collection, adaptive machine learning, generative artificial intelligence (Gen AI), and insurance risk modeling capabilities to help insurers and other financial institutions model evolving risks, develop new products, and make better business decisions. This role is a great opportunity for individuals motivated by learning cutting-edge technologies and applying them to solve real-world business problems. Guidewire is the AWS for insurance companies that use our platforms and applications. The solutions developed by you and this team will be used by hundreds of insurance companies and impact billions of dollars in annual transactions.
Downtime and failures are inevitable; what matters is how SREs deal with them. SREs are a blend of pragmatic operators and software craftspeople who apply sound engineering principles, operational discipline, and mature automation to our operating environments. Part of an SRE’s responsibility is to collaborate with developers to troubleshoot and solve problems and reduce customer impact where possible. After an incident, SREs go one step further: they document and examine what went wrong and develop measures, such as automated runbooks, to handle the issue going forward.
When on-call, you will be responsible for:
- Responding to any critical incidents and ticket escalations.
- Following and documenting our post-incident response/postmortem processes.
- Executing planned patching or improving related automation.
- Engineering to reduce toil, tune alerts, and improve documentation.
When NOT on-call, you will be responsible for:
- Engineering to re-platform or migrate layers of our infrastructure to Kubernetes ecosystems.
- Analyzing our AWS infrastructure and related applications/services for design and architectural opportunities to improve overall reliability and cost intelligence.
- Creating observability patterns so that all alerts have consistent content and configuration, keeping triage short and continuously improving overall MTTR.
- Analyzing incident data to determine the next opportunity to improve reliability.
- Influencing engineers to improve application reliability and scalability to run efficiently.
- Documenting every action, if not captured as code, so your findings turn into repeatable actions and then into automation.
- Improving operational processes (such as deployments and upgrades) to make them as boring as possible.
Required Skills:
- Proven experience triaging and debugging distributed systems on cloud infrastructure.
- Proven experience in designing and engineering CI/CD pipelines within K8s and legacy ecosystems.
- Experience in building, deploying, and running scalable infrastructure within AWS and Kubernetes ecosystems using Terraform and other cloud native approaches.
- Experience in designing and engineering monitors, dashboards, and synthetic testing.
- Experience in managing infrastructure config at scale using multiple approaches and/or tools such as GitOps, Puppet, or Ansible.
- Good understanding of AWS cloud networking and security with hands-on experience remediating infrastructure vulnerabilities at scale.
- Comfortable with Linux system administration, with the ability to program/script using Python, Go, Java, shell, or equivalent.
- Good verbal and written communication skills.
Preferred Skills:
- SRE Certified in multiple categories.
- AWS Certified in multiple categories.
- Experience with Datadog Cloud Monitoring.
- Proficiency with SQL, database administration, data pipelines, performance tuning, and schema design.
- Proficiency with multiple pipelining tools such as TeamCity, Bitbucket Pipelines, Jenkins, and GitHub Actions.
- Familiarity with open-source distributed data processing frameworks such as Hadoop and Apache Spark, and with data warehouses such as Amazon Redshift.
Date Posted
05/11/2024