Site Reliability Engineer
2024-10-26
Canada
Perlego
What we do
At Perlego, there are over 100 of us working hard to make education accessible to all. In this digital age, we believe that anyone should be able to learn anything at any time. Knowledge should be more accessible, not locked behind sky-high price tags.
Over the past 5 years, our goal has been to help students across the UK and Europe access quality books. The next stage of Perlego is twofold: 1) expand our support to students globally, and 2) build a product that goes beyond the book: a platform that helps students study smarter and more effectively.
What we're looking for:
We are looking for an experienced Site Reliability Engineer (SRE) with a strong background in AWS services and monitoring tools. In this role, you will ensure the availability and reliability of our services, especially outside office hours, as most of the team is based in Europe and India. You will play a key role in addressing issues swiftly and resolving incidents independently in a fast-paced environment.
How we collaborate:
Our organization operates across multiple time zones, with teams based in India and across Europe. As an SRE, you will provide critical support during off-hours, working autonomously to resolve issues while collaborating closely with our teams to ensure continuous service availability. You will be part of a global team, supporting cloud infrastructure and platform initiatives.
What you’ll do:
As a Site Reliability Engineer, your main focus will be to ensure our services remain highly available and performant. Key responsibilities include:
Monitoring & Incident Management:
Monitor and manage platform activity using tools like Datadog, Prometheus, Grafana, or AWS CloudWatch.
Respond quickly to alerts and incidents, independently resolving issues and ensuring service uptime during off-peak hours.
Conduct post-incident reviews and help improve system resiliency through automation and monitoring enhancements.
Cloud Infrastructure Management:
Manage and support AWS infrastructure, focusing on scalability, security, and reliability.
Handle deployments and manage CI/CD pipelines for both containerized (Docker/Kubernetes) and serverless (AWS Lambda) applications.
Ensure effective backup, recovery, and disaster recovery strategies to minimize downtime.
Collaboration & Communication:
Collaborate with cross-functional teams to implement platform improvements.
Work independently and make swift decisions when managing service incidents outside core business hours.
Assist in platform security, ensuring adherence to best practices for cloud security and compliance.
Continuous Improvement:
Automate manual processes to reduce human error and improve efficiency.
Continuously enhance monitoring systems, ensuring robust early detection and resolution capabilities.
Identify potential performance bottlenecks and contribute to overall platform optimization.
Requirements
This role is ideal for you if you possess:
Experience in Site Reliability Engineering, DevOps, or a similar field.
Strong experience with AWS services.
Expertise in using monitoring tools (e.g. Prometheus, Grafana, CloudWatch) for real-time platform performance insights.
Hands-on experience with CI/CD pipeline management for deploying containerized (Docker) and serverless applications.
Proficiency in Linux-based operating systems and shell scripting.
Familiarity with Infrastructure as Code tools (Terraform, CloudFormation).
Experience with incident management, troubleshooting, and platform recovery in high-pressure environments.
Strong communication skills with a proven ability to work both independently and collaboratively across time zones.
⭐️ It’s a plus if you have:
Experience working in a global, distributed team providing off-hours support.
Knowledge of container orchestration tools.
Previous experience with SecOps and cloud security best practices.
Familiarity with scaling highly available systems in a fast-paced, growth-oriented environment.
Benefits
Benefits include:
✨Compensation
The salary for this role is CA$105,000 plus share options.
Why should you work at Perlego?
Apart from our mission, we foster a unique company culture championing self-empowerment, personal development, direct communication and mutual support. We’re proud of our Glassdoor reviews and the fact that 97% of our team would recommend Perlego as a place to work.
Want to learn more about how we’re making learning accessible? Check out our latest impact report.
L&D Budget
We value continuous learning and you will have a personal L&D budget for online courses, subscriptions or books not on Perlego.
Unlimited Coaching Opportunities
Unlimited access to MoreHappi, an on-demand coaching platform that gives all employees access to unbiased, professional coaching.
Learning Time
All employees have dedicated Learning Time to focus on new skills, projects or interests that lie outside their day-to-day job.
Work-Life Balance
Everyone needs a break, so enjoy 30 days off (incl. bank holidays), plus 1 additional day of annual leave for every year of service, up to a maximum of 35 days off (incl. bank holidays).
Flexi Bank Holidays
We understand that not everyone aligns with the same calendar; we offer the flexibility to take your local country's bank holiday allowance for other religious or cultural days.
For example, swap the UK Easter bank holidays for Eid celebrations.
❄️ Office Reset
All employees also get the days between Boxing Day and New Year off to reset and refresh for the new year. This is in addition to your annual leave.
Sabbatical
After three years of service, you can take a 1-month unpaid sabbatical; after five years, you can take a 1-month paid sabbatical.
Personal Days
Life happens, and we want you to be able to use your annual leave for resting, relaxing or taking time out to do something you love!
We offer 1 additional day a year for life events (your wedding, relocation, moving house, or a child starting school).
Health & Wellbeing
We want everyone to feel healthy and happy, so you get private medical insurance.
Family time
We believe family is really important, so we offer new parents competitive, matched parental leave, as well as a phased return to work after extended leave.
Belonging at Perlego:
We are an equal opportunity employer and value diversity of thought and background.
❤️ We are actively building a diverse team, so we strongly encourage applications from people of colour, the LGBTQ+ community, people with disabilities, neurodivergent people, parents, carers, and people from lower socio-economic backgrounds.
To enable an equitable experience for all and give you the best chance of success, if you have any specific requirements for any stage of the interview process, please let us know by emailing [email protected]
About the company
Making learning accessible to all