Platform Infrastructure Engineer

Arcee AI • Remote

Company

Arcee AI

Location

Remote

Type

Full Time

Job Description

About Us:
Arcee.ai is a cutting-edge AI company that empowers enterprises to own their GenAI strategy. We're a team of passionate and innovative engineers, researchers, and industry experts dedicated to pushing the boundaries of AI technology. We're looking for an exceptional Platform Infrastructure Engineer to join our team and help design, develop, and deploy AI-powered solutions that meet the highest standards of quality, reliability, and performance.


About the role:

We’re looking for a Platform Infrastructure Engineer with a deep focus on Kubernetes and AWS EKS to build and scale the multi-tenant, multi-cluster infrastructure that hosts our SaaS products, enterprise products, and AI models. In this role, you’ll collaborate closely with a small, agile team to automate infrastructure provisioning, streamline deployment pipelines, and ensure the reliability and scalability of our platform. You’ll use tools such as ArgoCD, Atlantis, Terraform, Terragrunt, and the Grafana observability stack to drive a GitOps-first approach and cultivate operational excellence, and you’ll help deploy and orchestrate GPUs.

What you’ll do:

  • Architect, deploy, and maintain Kubernetes clusters on AWS EKS in a multi-tenant, multi-cluster environment that is portable to other cloud providers and VPCs
  • Own our Infrastructure as Code practices using Terraform and Terragrunt, ensuring consistency and repeatability
  • Implement and manage GitOps workflows with ArgoCD to enhance delivery pipelines
  • Set up, configure, and maintain Atlantis for automated Terraform workflow management
  • Collaborate with developers, DevOps, and product teams to improve deployment speeds and system reliability
  • Write and review technical documentation, providing best practices and guidance for the broader engineering team
  • Troubleshoot and resolve issues across infrastructure and networking
  • Help deploy, orchestrate, and monitor our GPUs


What we’re seeking:

  • Experience deploying and orchestrating a Grafana observability stack (Alloy, Mimir, Loki, Tempo, Grafana) or a similar monitoring solution
  • Experience deploying and orchestrating GPUs
  • Proven experience running Kubernetes in production, and readiness to work across multiple clouds
  • Hands-on expertise with Terraform and Terragrunt for Infrastructure as Code
  • Familiarity with GitOps methodologies and ArgoCD for continuous deployment
  • Experience managing multi-tenant, multi-cluster environments at scale
  • Strong scripting and automation skills (e.g., Python, Bash, Go)
  • Solid understanding of networking concepts and cloud infrastructure (AWS preferred, other cloud providers acceptable)
  • Clear communication, problem-solving mindset, and the ability to work effectively in a small, fast-moving team 

Equal Opportunity

We are an Equal Opportunity Employer, offering equal opportunity to all regardless of race, religion, gender identity, sexual orientation, age, citizenship, marital status, disability, and more. We would like to remind candidates that the listed qualifications for each role are not hard requirements, and we encourage them to apply if they feel they would be a good fit.

Compensation

We offer competitive salaries, equity, and benefits. We base salaries on location, role, and level, as well as the candidate’s experience and overall qualifications.

Date Posted

01/24/2025
