What you’ll do
As a product gets adopted by customers, fundamentals such as security, reliability, and availability become increasingly vital to our business. We're looking for an experienced senior distributed systems engineer to join us and spearhead initiatives to improve the reliability and performance of all our products. In this role you’ll work across our engineering organization to design, develop, and deploy practices, processes, and innovative infrastructure software that the whole organization will use to ensure that our cloud services are Reliable by Design.
- Own the reliability, security, and availability of PeerPower systems
- Enable engineering teams to adopt a DevOps culture and share best practices
- Understand and implement automation processes for engineering teams
- Foster a culture of observability across teams and drive monitoring and alerting work within each team
- As a leader within Engineering, assist with team growth and development while maintaining a high bar for excellence and technical curiosity
- Lead and define the scope, design, implementation and deployment of robust distributed services, making appropriate tradeoffs between reliability, throughput, latency, resiliency, engineering velocity and cost
- Innovate, design and implement new products and prototypes to improve service resiliency, engineering velocity and management at scale
- Mentor and grow the next generation of technical leaders
- Contribute to engineering strategy, tooling, processes, and culture
- Uphold our high engineering standards and improve our codebase and processes
What you’ll bring
- Have at least 3 years of distributed and cloud-native software development experience at scale
- Have a demonstrated ability to balance execution and velocity with building scalable, resilient, distributed systems, cloud-native architectures, and mission-critical systems
- Have a high level of autonomy and responsibility, and think of yourself as entrepreneurial, proactive and self-driven
- Have a strong track record as a senior engineer and mentor
- Experience implementing cloud solutions on AWS
- Experience implementing CI/CD solutions with Argo CD, CircleCI, etc.
- Experience implementing containerized workloads with Kubernetes and Docker
- Experience implementing monitoring and logging solutions: Grafana, Prometheus, etc.
- Experience in Infrastructure as Code: Terraform, Pulumi, etc.
For more information, please contact firstname.lastname@example.org