Lead Site Reliability Engineer

2024-09-10
USA
Curology
Mission of the Role:
Architect and lead the delivery of high-quality, reliable solutions to our business problems on a frequent and regular cadence, through creative problem-solving and technical expertise. Enable engineers on your team to improve the quality and impact of their work and delivery. Evangelize reliability-as-a-feature through monitoring, service-level objectives, automation, everything-as-code, and testing.

Essential Functions and Impact Areas:
- Provide technical leadership and guidance to the SRE team, driving best practices in reliability engineering, automation, and service management.
- Set the direction for SRE projects, aligning them with organizational goals and ensuring successful execution from concept to delivery.
- Help define and instrument Service-Level Objectives to ensure an excellent customer experience.
- Lead initiatives to improve system resilience and scalability.
- Host postmortems to share learnings, discover gaps, embrace transparency, and improve reliability across our services.
- Lead projects from inception to completion.
- Participate in an on-call rotation to help resolve incidents.

Minimum Skills & Requirements:
- 7+ years of experience building infrastructure solutions in AWS using Infrastructure-as-Code technologies such as Terraform or CloudFormation.
- 7+ years of experience working with Docker containers and related orchestration technologies (such as Kubernetes or ECS).
- 7+ years of experience building and deploying CI/CD pipelines.
- Experience with AWS, Docker, Kubernetes, Terraform, Python, PHP, and Laravel.
- Experience with architectural patterns of large, high-scale applications, such as well-designed APIs and database schemas.
- Experience leading projects and initiatives that are wide in scale and complex in nature.
- Experience working collaboratively in cross-functional teams with engineers in product and data groups.
- Deep technical expertise: writes, debugs, and refactors code while being mindful of tradeoffs, scalability, architecture, and code cleanliness. Demonstrates mastery of their craft to solve problems in automation, infrastructure, and/or developer tooling.
- Reliability & quality: experience leveraging observability tooling and practices such as SLOs to help engineering teams own the reliability and quality of the software they build.
- Leadership: define and deliver large, complex projects that may include coordination with non-technical stakeholders. Help define the SRE function and be a champion for it throughout the organization.

Why You'll Love Working at Curology:
- Competitive salary and equity packages
- Company Performance Incentive Plan
- Comprehensive benefits: medical, dental, and vision insurance for employees; flexible spending account; 401k; mental health & wellness programs
- $75 WFH stipend (remote employees)
- Home office setup stipend (remote employees)
- Minimum Time Off policy (unlimited PTO, with at least 3 weeks off) for exempt employees
- 11 company-observed holidays
- Additional holidays: Curology days off (1 per quarter), 1 annual floating holiday (employee's choice), and Gratitude Week (employees take the full week of Thanksgiving off; business-critical teams observe different days)
- Paid parental leave
- Employee donation matching program
- Company-sponsored events
- Free subscription to Curology or Agency