DevOps Engineer

2024-08-22
USA
Arize AI
About the job
The Opportunity
AI is rapidly changing the world. From processing job applications and credit decisions to making content recommendations and helping researchers analyze genetic markers at scale, many aspects of our daily lives are touched by machine learning systems in some way.
Arize is the leading machine learning observability platform to help ML teams discover issues, diagnose problems, and improve the results of machine learning models. In short: we are here to build world-class software that helps make AI work better.
The Team
Our On-Prem engineering team is responsible for the deployment of Arize in customer environments. In addition to working with customers to define infrastructure requirements, the team designs and develops software and tooling that enable the management of these systems at large scale. The On-Prem team has developed expertise in Kubernetes and cloud deployment on GCP, Azure, and AWS, as well as in the networking and security aspects of on-premise deployments. The team is dynamic and relies on a few talented individuals with a high degree of autonomy and initiative.
What You’ll Do

Work hands-on with the infrastructure that supports our distributed & highly scalable services in both SaaS and on-prem offerings
Gather requirements from customers and adapt manifests and software to support new environments
Use and augment monitoring tools to observe platform health and ensure performance and reliability
Interact with the product team to test new features and package new on-prem releases
Automate and optimize the release pipeline to make it as frictionless as possible
Exhibit continuous curiosity for emerging technology that could solve our challenges

What We’re Looking For

1-2+ years of experience in site reliability engineering, DevOps, and system administration
CS (preferred) or other technical degree, or equivalent practical experience
Experience working with DevOps tools such as Kubernetes, Terraform, Ansible, Puppet, and Chef
Proficiency with scripting languages such as Python and Bash
Experience managing cloud infrastructure in AWS, GCP, and/or Azure
Expertise in Linux administration, configuration, and networking protocols

Bonus Points, But Not Required

Experience with on-prem deployment architectures
Experience running a 24x7 SaaS platform with defined SLIs, SLOs, and SLAs
Familiarity with operating machine learning & AI applications

Technologies You’ll Work With:

Kubernetes
Postgres
Messaging systems
Go, Java, Python
Bazel
AWS, GCP

The estimated annual salary for this role is between $100,000 and $185,000, plus a competitive equity package. Actual compensation is determined based upon a variety of job-related factors that may include transferable work experience, skill sets, and qualifications. Total compensation also includes a comprehensive benefits package, including medical, dental, vision, 401(k) plan, unlimited paid time off, generous parental leave plan, and other mental health and wellness support.
More About Arize
Arize’s mission is to make the world’s AI work and work for the people. Our founders came together through a common frustration: investments in AI are growing rapidly across businesses and organizations of all types, yet it is incredibly difficult to understand why a machine learning model behaves the way it does after it is deployed into the real world.
Learn more about Arize in an interview with our founders: https://www.forbes.com/sites/frederickdaso/2020/09/01/arize-ai-helps-us-understand-how-ai-works/#322488d7753c

Diversity & Inclusion @ Arize
Our company's mission is to make AI work and make AI work for the people. We hope to make an impact on bias industry-wide, and that is a big motivator for people who work here. We actively hope that individuals contribute to a good culture.

We regularly have chats with industry experts, researchers, and ethicists across the ecosystem to advance the use of responsible AI
We hold culturally conscious events such as LGBTQ trivia during Pride Month
We have an active Lady Arizers subgroup