Site Reliability Engineer (SRE)

Company: Versana

Location: New York City, NY

Type: Full Time

Job Description

About Us:

Versana is an industry-backed fintech on a mission to make the syndicated loan market better. By digitally capturing agent banks’ data in real time, Versana provides unprecedented transparency into loan-level details and portfolio positions, bringing efficiency and velocity to the entire market. Through our platform, participants can rest assured they are accessing the loan market’s most credible source of deal information.


About You:

Versana is seeking a motivated Site Reliability Engineer (SRE) with strong observability experience to join our growing Platform Engineering squad. The squad’s goal is to manage public cloud infrastructure, improve DevOps practices, and monitor Versana’s real-time syndicated loan data platform. The ideal candidate will have a deep understanding of cloud-native applications, distributed computing, CI/CD implementation, and observability tools and practices.



Key Responsibilities

  • Design, implement, and maintain observability and event management tools (including self-service tools for application engineers)
  • Monitor system performance, create incident response plans, and implement observability practices to gain insights into system behavior
  • Implement and monitor service-level objectives (SLOs) and service-level indicators (SLIs)
  • Improve system reliability and resiliency
  • Conduct post-incident reviews and implement necessary changes to prevent system failures
  • Assist teams in implementing observability tools and leveraging available telemetry data to troubleshoot and resolve incidents and problems
  • Leverage observability and event management to improve key incident management metrics, such as mean time to detect (MTTD) and mean time to restore (MTTR) services
  • Continually optimize systems and workflows by improving architecture, infrastructure, automation, CI/CD, and observability
  • Collaborate with developers to ensure applications are designed with DevOps best practices in mind
  • Participate in weekend support for cloud infrastructure upgrades and/or releases

Must Haves

  • 5+ years of experience as a Site Reliability Engineer or similar role
  • 3+ years of experience in at least one coding language such as Java, JavaScript, Python, GoLang, or .NET
  • 3+ years of work experience with public cloud (Azure, AWS or GCP)
  • 3+ years of direct experience with observability tools such as Datadog, Elasticsearch, or Grafana
  • 3+ years of experience with containerization and orchestration technologies like Docker and Kubernetes
  • 2+ years of experience in development and management of CI/CD pipelines (e.g., Azure DevOps, GitLab CI/CD, GitHub Actions, Jenkins)
  • 2+ years of experience with infrastructure-as-code tools such as Terraform, Azure Bicep, or AWS CloudFormation
  • 1+ year of experience with site reliability tools such as Gremlin, Chaos Mesh, or similar
  • Proven track record leveraging core observability concepts, end-user monitoring, and infrastructure monitoring with SaaS solutions
  • Experience with messaging services like Kafka or Azure Event Hubs
  • Good understanding of the Linux operating system
  • Ability to partner with multi-functional teams and pivot quickly
  • Strong communication, analytical, and problem-solving skills
  • Curiosity and motivation to learn

Nice to Haves

  • Experience in at least one coding language such as Java, JavaScript, Python, GoLang, or .NET.
  • Certifications in cloud technologies
  • Experience with Azure cloud or Azure DevOps
  • Experience with Datadog or similar modern observability tools

Equal Opportunity Employer

We are committed to providing equal employment opportunities to all employees and applicants for employment, and we prohibit discrimination and harassment of any type without regard to race, color, religion, age, sex, national origin, disability status, genetics, protected veteran status, sexual orientation, gender identity or expression, or any other characteristic protected by federal, state, or local laws.


This policy applies to all terms and conditions of employment, including recruiting, hiring, placement, promotion, termination, layoff, recall, transfer, leaves of absence, compensation, and training.

 

Date Posted: 12/17/2024
