Senior Site Reliability Engineer, Observability

Box • Remote

Company

Box

Location

Remote

Type

Full Time

Job Description

WHAT IS BOX? 

Box is the market leader for Cloud Content Management. Our mission is to power how the world works together. Box is partnering with enterprise organizations to accelerate their digital transformation by creating a single platform for secure content management, collaboration and workflow. We have an amazing opportunity to further establish ourselves as leaders in the space, and we need strong advocates to help us achieve that goal. 

By joining Box, you will have the unique opportunity to help capture a majority of this developing market and define what content management looks like for the digital enterprise. Today, Box powers over 98,000 businesses, including 67% of the Fortune 500 who trust Box to manage their content in the cloud. 

WHY BOX NEEDS YOU 

The Observability Platforms team provides an end-to-end experience that enables Box engineers to better understand the behavior of the features, services, and infrastructure they own and maintain, using frameworks, tools, APIs, and visualizations. The team also educates product, infrastructure, and systems teams on how to monitor the features and services they own, provides visualizations for monitoring distributed systems, gives guidance on reducing operational overhead, and supports the delivery of unmatched availability to our customers.

We need a Senior SRE who has designed, implemented, and operated observability frameworks at very large scale and is well versed in operating scaled architectures. You should have deep operational knowledge of distributed systems and know how to overcome their limitations through innovative design.

The main focus of the Observability team is to build frameworks and systems that can manage the performance of Box systems while scaling to billions of events per second. We are also responsible for standardizing observability across engineering teams, driving designs for high-performing services, and fostering strong observability practices. We build, scale, and operate low-latency, high-throughput data systems that keep Box systems highly resilient. You will help us execute on this vision and ensure that Box continues to ship scalable services that meet our customers' high performance expectations.

We are looking for big thinkers and innovators who have experience working with scalable distributed systems and have a passion for high performance and reliability. We are a small team with big ambitions that values impact and is not afraid of huge, gnarly problems. If this excites you, come join us!

WHAT YOU'LL DO 

  • You will work on a distributed, high-performance observability data pipeline that collects, transforms, and routes logs, metrics, and traces to various storage solutions.
  • Implement observability (o11y) solutions using products such as Vector, BigQuery, Prometheus, OpenTelemetry, Logstash, AppDynamics, and Dynatrace to support centralized logging, APM, monitoring, alerting, and distributed tracing platforms.
  • You will work with infrastructure provisioning (Terraform) and configuration management (Puppet, Ansible) technologies to deploy observability solutions efficiently to Kubernetes clusters in GCP, bare-metal hosts, and other deployment targets.
  • Manage, maintain, and scale the infrastructure behind the telemetry frameworks used throughout Box's infrastructure, cloud services, and products to capture, transport, store, and analyze telemetry data. Scale the observability infrastructure to support petabytes of logs and billions of metric data points daily.
  • You'll collaborate with other engineers on the team to foster solid engineering principles and represent our engineering values
  • As a senior member of the team, you'll use both technical and relational skills to lead large scale projects to completion
  • You'll collaborate, influence and drive for improvement across scrum teams
  • You'll provide additional support and build proofs of concept (POCs) for new observability projects and frameworks
  • Define observability best practices from an SRE perspective and educate platform consumers on them.
  • Participate in deep technical design discussions within your team and across partner teams to ensure that we’re building the right systems

WHO YOU ARE 

  • You take an SRE-centric approach to everything you build/manage, ensuring reliability, availability and security
  • You act like an owner and strive to do work you're proud of, both technically and in your team interactions
  • You are a self-starter and a strong supporter of self-service and automation within O11y (observability)
  • Deep knowledge of operating system fundamentals (Linux) and core internet technologies, including TCP/IP, DNS, NAT, and SDN
  • Proven production troubleshooting skills that span applications, systems, and networks within a primarily Linux environment
  • Solid understanding of infrastructure automation tools (Puppet, Ansible, or the like)
  • Experience using industry-standard DevOps CI/CD frameworks (Jenkins/Spinnaker, or the like)
  • Solid experience building automation and frameworks, preferably with Python and Go
  • Experience in running containerized services in Private/Public Cloud (GCP, AWS)
  • Experience building and managing metrics and data-driven observability platforms and peripherals
  • Experience in managing O11y (Observability) is a plus
  • You have a fair understanding of technologies such as Elasticsearch, Apache Storm or other DAG-based processing frameworks, and streaming platforms such as Kafka, Pub/Sub, or Kinesis.
  • You have built distributed, high-throughput and low-latency systems with a strong focus on availability, resilience, and durability.
  • Remote Friendly

BENEFITS

  • Visit this webpage to check out all of our exciting healthcare benefits: https://join.collectivehealth.com/box
  • For all other benefits, please check out: Box Benefits + Perks 

EQUAL OPPORTUNITY 

We are an equal opportunity employer and value diversity at our company. We do not discriminate on the basis of race, religion, color, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status.

For details on how we protect your information when you apply, please see our Personnel Privacy Notice.

HEALTH AND SAFETY
 
To promote the health and safety of all Boxers and our communities, in order to "Go to Work" at Box in the U.S., you must be Fully Vaccinated or have an approved accommodation. "Go(ing) to Work" at Box is defined as visiting a Box office, facility, or co-working site; visiting or meeting in person with fellow Boxers, Box clients and/or customers, vendors, or partners; engaging in business travel; and/or participating in any Box-sponsored and/or related activity where others are present. If you are fully remote and do not "Go to Work," the vaccination requirement does not apply. "Fully Vaccinated" means that an individual is at least two weeks past their final dose of an authorized COVID-19 vaccine regimen. If you are unable to get a vaccine due to a medical condition, a sincerely held religious belief, or another legally recognized reason, Box will consider requests for an accommodation.
 
 
Notice to applicants in San Francisco: Box, Inc. and its related branches will consider qualified applicants with criminal histories for employment in a manner consistent with the San Francisco Fair Chance Ordinance. The Fair Chance Ordinance is provided here.

Date Posted

09/13/2022

