Director of Site Reliability and Cloud Infrastructure
2024-11-19
USA
Prompt
Job Overview:

We are seeking a highly skilled and strategic Director of Site Reliability and Cloud Infrastructure to join our team. In this role, you will initially take on the responsibilities of an individual contributor, working hands-on to develop, maintain, and enhance our infrastructure while ensuring security, reliability, and scalability. As you establish a strong foundation, you will also be responsible for collaborating with our existing vendors and scaling the internal team by hiring additional resources focused on security, site reliability, and cloud infrastructure.

This position is perfect for a seasoned leader who thrives in both a hands-on technical role and strategic leadership. You will play a critical part in shaping the future of our infrastructure and ensuring that our systems are both secure and highly available.

Key Responsibilities:

Hands-On Infrastructure Management:
- Develop and maintain scalable and automated infrastructure solutions, particularly on AWS.
- Implement and manage monitoring, alerting, and logging systems to detect and address reliability and security risks.
- Manage incident response and resolution processes to minimize downtime, prevent recurrence, and ensure robust disaster recovery practices.
- Conduct system performance tuning, capacity planning, and optimization to effectively manage resource utilization and loads.

Vendor Collaboration and Oversight:
- Build and maintain strong relationships with cloud, security, and infrastructure vendors, ensuring their services meet performance, compliance, and security needs.
- Lead contract negotiations and performance reviews for external vendors, ensuring alignment with internal standards and SLAs.

Team Building and Leadership:
- Hire, mentor, and lead a high-performing team of site reliability engineers (SREs), security experts, and infrastructure engineers.
- Develop career growth plans and technical progression frameworks for team members, ensuring skills development in cloud technologies and SRE best practices.
- Create a cohesive vision for cloud infrastructure, reliability, and security, aligning with the broader organizational goals.

Security and Compliance Leadership:
- Implement and maintain security best practices, including compliance with SOC2, HIPAA, and other relevant standards.
- Ensure the infrastructure is protected against threats and vulnerabilities.
- Drive innovation in cloud infrastructure and security, continuously improving our processes and systems.

Automation and Tooling:
- Build and maintain automation tools and scripts to streamline system updates, deployments, and monitoring.
- Design and oversee CI/CD pipelines, ensuring seamless integration with development and operations teams.

Collaboration and Stakeholder Management:
- Work closely with the development, operations, and product teams to ensure alignment on priorities and collaboration on large-scale projects.
- Provide technical guidance and mentorship across teams, championing a culture of reliability, automation, and security.
- Communicate progress, risks, and issues clearly to both technical and non-technical stakeholders.

Qualifications:
- Bachelor’s degree in Computer Science, Engineering, or a related field.
- Proven experience in a senior leadership role managing cloud infrastructure and site reliability, preferably within an AWS environment (EC2, S3, RDS, ELB, etc.).
- Hands-on experience with infrastructure as code (e.g., Terraform, CloudFormation) and automation tools (e.g., Ansible, Jenkins).
- Strong scripting skills (Python, Bash) and the ability to automate complex tasks.
- Demonstrated success in scaling infrastructure and teams, particularly within high-availability and high-growth environments.
- Solid understanding of networking, cloud security, and compliance standards (e.g., SOC2, HIPAA).
- Strong incident management skills and the ability to lead post-incident reviews to drive improvements.
- Excellent communication skills and the ability to collaborate effectively with cross-functional teams.
- Experience in hiring, developing, and managing technical teams with a focus on career development and innovation.

Preferred Qualifications:
- Experience in a high-growth SaaS company, especially within the healthcare or regulated industries.
- Familiarity with cloud cost optimization, scalability best practices, and disaster recovery strategies.
- Demonstrated ability to lead through influence, setting technical direction and ensuring execution across teams.
- Relevant certifications: AWS Solutions Architect, DevOps Engineer, Security; CCSP; CISSP

Perks - What you can expect:
- Competitive salaries
- Remote/hybrid environment
- Potential equity compensation for outstanding performance
- Flexible PTO
- Company-wide sponsored lunches
- Company-paid disability and life insurance benefits
- Company-paid family and medical leave
- Medical, dental, and vision insurance benefits
- Discounted pet insurance
- FSA/DCA and commuter benefits
- 401k

Prompt Therapy Solutions, Inc is an equal opportunity employer and does not discriminate on the basis of race, color, religion, ethnicity, ancestry, national origin, sex, gender, gender identity, sexual orientation, age, marital status, veteran status, disability, medical condition, or any other protected characteristic. We celebrate diversity and are committed to creating an inclusive environment for all employees.

Prompt Therapy Solutions, Inc is an E-Verify Employer.