Senior Engineer - SRE Engineer

OKX • Other US Location

Company

OKX

Location

Other US Location

Type

Full Time

Job Description

Who We Are

At OKX, we believe our future is reshaped with technology. Founded in 2017, OKX is one of the world’s leading cryptocurrency spot and derivatives exchanges. OKX innovatively adopted blockchain technology to reshape the financial ecosystem by offering some of the most diverse and sophisticated products, solutions, and trading tools on the market. Trusted by more than 20 million users in over 180 regions globally, OKX strives to provide an engaging platform that empowers every individual to explore the world of crypto. In addition to its world-class DeFi exchange, OKX serves its users with OKX Insights, a research arm at the cutting edge of the latest trends in the cryptocurrency industry. With its extensive range of crypto products and services and its unwavering commitment to innovation, OKX’s vision is a world of financial access backed by blockchain and the power of decentralized finance.

The Service Reliability Engineering team aims to make service stability one of the company's core competitive advantages. By building end-to-end, chain-level risk management capabilities, we aim to identify and analyze stability risks in a sustainable, automated way, moving from "reactive governance" to "proactive governance". This approach lets us address more stability issues before they affect users, improving the user experience.


What You’ll Be Doing: 

  • Stabilize and optimize big data platforms (Alibaba Cloud DataWorks, AWS EMR, AWS Databricks, Spark, Flink) and data warehouses (MaxCompute, Hologres, Hive, ClickHouse, StarRocks, etc.).
  • Deeply understand the architecture and principles of middleware (Kafka, Spring Cloud, Nacos, Apollo, Kong Gateway, etc.), ensuring high performance and availability.
  • Effectively optimize existing runtime environments (KVM, Docker, K8S, JVM, etc.) to ensure efficient resource utilization and stable service operation.
  • Understand network architecture and security, and provide guidance on infrastructure stability at the network and security layers to ensure secure, stable, and efficient network communications.
  • Lead chaos engineering exercises, coordinating with business units to validate system robustness and recovery capabilities through simulated failure scenarios.
  • Participate in rapid response to and troubleshooting of system failures, and continuously optimize monitoring strategies to reduce downtime and ensure service continuity and stability.
  • Drive infrastructure automation and intelligence to improve SRE work efficiency and quality.
  • Collaborate closely with development teams, providing technical support and advice on infrastructure to jointly promote continuous product improvement and innovation.


What We Look For In You: 

  • Bachelor's degree or above in Computer Science or a related field, with 8+ years of experience in development, SRE, or operations for large-scale internet or cloud computing platforms.
  • In-depth understanding of the principles and architectures of big data platforms, data warehouses, middleware, runtime environments, and network technologies, with extensive hands-on experience and strong troubleshooting skills.
  • Proficient in Linux system management and optimization, familiar with scripting languages such as Shell/Python, able to write automation tools and scripts.
  • Familiar with container and cloud-native technologies like KVM, Docker, and K8S, including their architectures and principles, with extensive experience in handling common issues and failures.
  • Familiar with network protocols such as TCP/UDP/QUIC, proficient with network commands like tcpdump, traceroute, and netstat and tools like Wireshark, with extensive hands-on experience troubleshooting common network issues.
  • Extensive experience with Alibaba Cloud and AWS products, from architecture to day-to-day usage, including handling common issues and failures.
  • Experience in building service governance systems, architecture optimization, stability assurance, capacity management, event support, and chaos engineering is preferred.
  • Strong sense of responsibility and team spirit, with excellent problem-solving and analytical skills.
  • Must have Chinese communication skills; proficiency in both Chinese and English communication is preferred.

 

Perks & Benefits

  • Competitive total compensation package

  • L&D programs and Education subsidy for employees' growth and development

  • Various team building programs and company events

  • Wellness and meal allowances 

  • Comprehensive healthcare schemes for employees and dependants 

  • More that we'd love to tell you about along the way!


Date Posted

09/23/2024
