Site Reliability Engineer

Graylog, Inc • North America

Company

Graylog, Inc

Location

North America

Type

Full Time

Job Description

Graylog: Empowering Threat Detection, Investigation & Response Solutions with Cutting-Edge Technology

Graylog specializes in delivering top-notch Threat Detection, Investigation & Response (TDIR) solutions, backed by our latest addition, the Graylog API Security platform. As a renowned centralized log management (CLM) and Security Information and Event Management (SIEM) provider, we offer unparalleled fast and efficient log analysis capabilities in critical areas such as security, compliance, operations, and DevOps.

Our enterprise solution enables organizations globally to capture, store, and analyze terabytes of machine data in near-real time, while our open-source product has been deployed in more than 50,000 installations worldwide, empowering individuals and small teams to perform basic log consolidation, analysis, and search functions at no cost.

We're a remote-friendly company with locations in Hamburg, Munich, London, and Boulder, and headquarters in Houston, TX. If you live near an office and want to be part of it, great. Nearish to an office and want the ability to hot desk? No problem. And if you're not near an office and wish to work remotely, all good!

Recent achievements include Graylog's inclusion in the 2021 Deloitte Technology Fast 500™. At RSA in 2023, we took home two of the most prestigious cybersecurity awards, in SIEM and DevSecOps, from Cyber Defence Magazine. In 2024, we took home gold as the Globee Winner for Security Information & Event Management and were named the 2024 Globee Winner for Threat Hunting, Detection, Intelligence and Response.

Graylog has recently been named a “Leader” and “Fast Mover” in GigaOM’s 2024 Radar Report for SIEM.

Who we’re looking for:

We’re currently recruiting for a Site Reliability Engineer to join our multinational cloud services team.

As a Site Reliability Engineer here at Graylog, you will provide architectural guidance and technical solutions for adapting our product into a 24x7-supported cloud offering, with a focus on delivering a product that is highly available, resilient, secure, scalable, and cost-efficient, and that consistently delivers valuable outcomes to consumers.

Our Site Reliability Engineers work with state-of-the-art technologies, and we ensure you have the right tools to make a significant impact in managing our systems, drive their continuous improvement, and shape the future of our cloud strategy.

We believe that the best ideas can come from anywhere, and we value your input and initiative. Here you will not just be a guardian of our infrastructure; you’ll be an innovator, a problem-solver, and a leader.

This role is a full-time, permanent position based in North America and will report to our Engineering Manager, Site Reliability.

Additional responsibilities will include, but are not limited to:

  • Cloud Infrastructure Management: Writing pull requests (PRs) that improve and optimize our AWS, Terraform, and Kubernetes setup, centring on its high availability, scalability, and resilience.

  • Security & Compliance: Implementing security measures, auditing the cloud environment, and ensuring adherence to compliance standards.

  • Tool Development: Expanding our internal tool base, focusing on Infrastructure as Code and configuration management improvements.

  • Issue Resolution: Collaborating with teams to identify and resolve infrastructure-related issues swiftly, minimizing any impact on product performance.

  • Cloud Strategy Advocacy: Championing cloud strategies that align with and advance our business objectives, especially during pitch cycles and other planning meetings.

  • Knowledge Sharing: Connecting with Cloud Engineers, Site Reliability Engineers, and application engineers; documenting key decisions where possible; and making sure critical knowledge isn't siloed in a single spot in the organization.

What you can expect your first 12 months to look like:

  • Infrastructure Knowledge: Within six months, acquire an expert understanding of, and submit an approved peer-reviewed pull request (APRPR) for, each of the following technologies: Terraform, Flux, Kustomize, and Argo.

  • Stability Improvements: In the first 6-9 months, deliver a proof of concept (POC) for a technology improvement centred on improving or maintaining uptime, reducing reliance on single points of failure, or reducing the Time to Recovery after an incident.

  • Signal and Metrics Improvement: Within six months, contribute to at least one cycle of signal and metrics improvement, and show that the overall number of alerts decreased in the following cycle and/or that a requested metric or set of metrics has been made available for use.

  • Security and Compliance: In the first 12 months, contribute to at least one of the following: AWS Product and Architecture Review; SOC 2 compliance review; Disaster Recovery (DR) plan review and drill; or Security Penetration Test (Pen Test) review and remediation.

A little bit about you:

  • Cloud Infrastructure Management: Proficiency in managing cloud infrastructure, especially AWS, along with associated tools like Terraform and Kubernetes, ensuring high availability, scalability, and resilience.

  • Experience with Infrastructure as Code (IaC): Hands-on experience with IaC tools and techniques, including configuration management and cloud provisioning.

  • Software Development: Basic programming skills in at least one language, such as Python, for tool development and automation tasks.

  • Security Best Practices: Knowledge of security protocols and compliance requirements specific to cloud environments, with experience implementing security measures.

  • Troubleshooting & Issue Resolution: Experience diagnosing and resolving infrastructure-related issues, working closely with development and support teams.

  • Monitoring and Metrics: Familiarity with cloud monitoring tools and performance metrics to continuously evaluate and improve the infrastructure.

  • CI/CD Practices: Understanding of continuous integration and continuous deployment practices for efficient and reliable product releases.

  • Documentation & Communication: Ability to document technical processes clearly and to communicate architectural decisions and changes effectively to various stakeholders.

Just some of the reasons to join Graylog:

  • Management team with deep programming, technical, and product experience.

  • Opportunity to work with a globally distributed and diverse team.

  • Grow and develop professionally and personally in a fast-growing environment.

  • Choice of the latest equipment to help you succeed.

  • Monthly allowance to support your commute costs and help outfit your work-from-home environment.

Here at Graylog, you'll find a diverse group of experienced professionals who love to have fun while meeting the needs of our customers with the best solution and customer service available.

Our values:

Openness - As a global company, we encourage our people to bring their backgrounds, ideas, and perspectives to our collective work. We lead with integrity and are committed to doing what is best for the Graylog community.

Collaboration - Through mutual respect, trust, and candid communication across all teams, we deliver the best ideas and results.

Useful Innovation - We take calculated risks to find new ways to innovate. By continuously improving ourselves, our processes, and our technologies, we deliver the best solution for our customers.

Ownership - As owners, we take the initiative to solve internal and external problems while supporting peer success and holding ourselves accountable for delivering the best work. We do this from a place of high trust.

Do the Right Thing! - Comfort and safety come from knowing that everyone will do the right thing even when nobody's looking.

For further information, please submit an application, and a member of the Graylog People Team will be in touch.

Date Posted

12/18/2024