Team Leader - Site Reliability Engineering (Technical Duty Officers)
Company
Xero
Location
Peninsula
Type
Full Time
Job Description
Xero is a beautiful, easy-to-use platform that helps small businesses and their accounting and bookkeeping advisors grow and thrive.
At Xero, our purpose is to make life better for people in small business, their advisors, and communities around the world. This purpose sits at the centre of everything we do. We support our people to do the best work of their lives so that they can help small businesses succeed through better tools, information and connections. Because when they succeed they make a difference, and when millions of small businesses are making a difference, the world is a more beautiful place.
About the team
Xero’s Incident and Problem Management team is part of the Site Reliability Engineering (SRE) organization and is responsible for building, delivering and maintaining robust processes and tooling for incident management.
The team drives enduring reliability at Xero through robust, consistent and fast response to high severity incidents. They are responsible for building a world class process and ensuring that process matures as the demands of the business grow.
About the role
This role requires an experienced SRE professional with a strong technical background, deep experience in SRE, a passion for building and delivering robust processes and extensive experience of leading technical teams who respond to high severity cloud issues.
As a seasoned and relentless professional, you will drive best practice across the business and contribute to the ongoing transformation of the Xero SRE culture. As an expert communicator, you will lead technical discussions to identify and track actions associated with and identified during incidents.
Across our SRE function, we're looking for people who are keen to deep dive into the causes of incidents and proactively examine the potential causes of future incidents, working with engineering teams to remove the risk of those failure scenarios and ultimately building playbooks and automation that ensure quick and effective responses. You will also provide ongoing training across the business to ensure the process is well understood and adhered to.
This Team Leader role will focus on building and leading a team of highly technical SRE engineers who provide a Technical Duty Officer (TDO) function within the business. TDOs are incident commanders who use SRE skillsets to drive fast mitigation and enduring resolution of impactful events.
What you'll do:
- Own the incident management process, ensuring it drives enduring reliability across all products and services within Xero.
- Provide expert leadership during critical outages, coordinating multiple teams to ensure streamlined decision-making and quick resolution.
- Lead and advocate for the transformation to a world-leading SRE organization, promoting SRE principles within the Engineering Department.
- Promote a customer-focused approach by addressing and mitigating global customer environment issues, and fostering a culture of continuous learning and technical excellence within the SRE team.
- Develop and implement scalable process frameworks and observability strategies to ensure rapid problem diagnosis, response, and service reliability.
- Collaborate with product teams to thoroughly analyze failures and integrate insights to improve service reliability, scalability, and operational efficiency.
What you'll bring:
- Previous career experience as a Site Reliability Engineer, in an Operations or Engineering environment
- Proven people and team leadership capabilities, having held a Team Leader/Engineering Manager role at a comparable organisation, with a passion for leading others
- Hands-on experience troubleshooting AWS hosted services
- Networking knowledge and the ability to troubleshoot TCP/IP, SSL/TLS, DNSSEC, IPsec, and BGP issues
- Coding experience (preferably Python) building tools, scripting, or automation
- Strong communication (oral & written) skills including the ability to translate technical issues/concepts into agreed actions
Why Xero?
Diversity of people brings diversity of thought, and we like that. Our human-first culture of respect, fairness, and inclusion is what helps Xeros thrive at work and beyond. We offer very generous paid leave to use however you’d like (plus statutory holidays!), dedicated paid leave to care for your physical and mental wellbeing, an Employee Assistance Program to access mental health care for you and your family, employee resource groups, wellbeing programming and allowances, medical, dental, vision, and disability insurance, fertility and family forming financial support, 401k contribution matching, 26 weeks of paid parental leave for primary caregivers, an Employee Share Plan, beautiful offices with snacks and break areas, flexible working, career development, and many other benefits that reflect our human value, so you can do the best work of your life at Xero.
Date Posted
01/28/2025