Manager, Platform & Site Reliability at CIRA — NeverHard
Company
CIRA
Location
Ottawa, Ottawa region
Type
Full-time
Required skills:
Cybersecurity
Operational Excellence
Platform Engineering
Technology Leadership
Job Description
Salary: $135,000 - $150,000
Join the team that is building a trusted internet for Canadians! CIRA may be best known for managing the .CA domain, but our impact reaches far beyond that. We're at the forefront of advancing cybersecurity technologies and leading projects that improve the digital experience for users across Canada and the world. Our broad scope of activities is driven by one central goal: to strengthen and secure Canada's digital landscape.
By working with the CIRA registry team, you'll play a part in advancing the CIRA Registry Platform, which supports a wide range of domains globally. Help us drive innovation and maintain the high standards of stability and security that our platform is known for. Join us in advancing digital identity and technology in Canada and beyond.
Who You Are:
You are a people-first technology leader who thrives at the intersection of reliability, platform engineering, and operational excellence. You enjoy building high-performing teams, creating clarity in complex environments, and empowering engineers to do their best work. You balance strategic thinking with technical depth, helping teams deliver resilient, scalable services while continuously improving processes, tooling, and ways of working. Most importantly, you're motivated by solving meaningful challenges and contributing to infrastructure that Canadians and organizations around the world rely on every day.
What You'll Do:
Lead, coach, and develop a high-performing team of SRE and Platform Specialists responsible for the reliability, scalability, security, and operational excellence of CIRA's registry platforms and supporting technology services.
Define and execute the platform and site reliability strategy, aligning priorities and investments with organizational objectives and customer needs.
Define and mature SRE practices, including Service Level Objectives (SLOs), Service Level Indicators (SLIs), error budgets, production readiness standards, and operational acceptance criteria for mission-critical registry services.
Drive the design, operation, and continuous improvement of scalable, resilient, cloud-native platforms using public cloud technologies such as AWS.
Champion automation, infrastructure as code, GitOps, CI/CD, and self-service platform capabilities to reduce manual effort, operational toil, and engineering bottlenecks.
Establish and continuously improve observability, monitoring, alerting, and dashboarding practices to provide clear visibility into platform health, service reliability, and customer-impacting issues.
Lead incident management for high-severity events, providing incident command, stakeholder communication, root cause analysis, and driving follow-up actions that strengthen long-term platform resilience.
Collaborate with engineering, security, support, compliance, and business stakeholders to establish priorities, balance risk, and deliver platform improvements that support registry operations and organizational goals.
Drive performance engineering, capacity planning, disaster recovery testing, and resilience validation to ensure the ongoing reliability and availability of critical registry platforms and related services.
Foster a culture of ownership, accountability, continuous learning, operational excellence, and psychological safety that empowers the team to innovate and perform at their best.
What You Bring:
7+ years of progressive experience in Site Reliability Engineering (SRE), platform engineering, DevOps, infrastructure, or cloud operations, including hands-on experience with public cloud platforms such as AWS.
3+ years of experience leading, coaching, and developing technical teams in SRE, platform engineering, DevOps, infrastructure, or cloud operations.
Demonstrated success building and developing high-performing engineering teams through mentoring, coaching, performance management, and fostering a culture of continuous learning and accountability.
Experience defining technical strategy, influencing cross-functional stakeholders, and balancing reliability, security, operational excellence, and business priorities.
Strong hands-on background with public cloud platforms, preferably AWS, including cloud-native architecture, networking, security, resilience, scalability, and cost-aware operations.
Experience leading teams that implement and operate infrastructure as code (IaC), GitOps, and automation practices to manage cloud infrastructure, platform services, and deployment workflows.
Strong understanding of CI/CD principles, release automation, and modern software delivery practices.
Experience with containerization and orchestration technologies such as Docker and Kubernetes.
Experience with observability platforms, monitoring frameworks, incident management practices, and operational analytics tools.
Demonstrated experience defining and implementing SLOs, SLIs, error budgets, production readiness standards, and incident response processes.
Strong understanding of disaster recovery, business continuity, backup and recovery strategies, and resilience testing.
Experience supporting highly available, mission-critical, or regulated technology platforms where reliability, security, and operational discipline are essential.
Exceptional communication, collaboration, and stakeholder management skills, with the ability to translate complex technical concepts into clear business outcomes for both technical and non-technical audiences.
Who We Are:
At CIRA, we're driven by a passion to make a positive impact on Canada's digital future. We're not just asking, "What more can we do?"; we're actively exploring new frontiers to enhance and secure the internet for all Canadians. Our recognition as one of the National Capital Region's Top Employers for ten years is a testament to our vibrant culture. We believe in fostering an environment where collaboration and candour are second nature and where diverse perspectives are integral to our success, because we know that great ideas come from everywhere. If you're passionate about innovation and ready to make a difference in a dynamic field, join us and help shape the future of the internet!
CIRA embraces a blend of remote and IRL in-office work to keep our team connected and engaged. Our Ottawa headquarters is a hub for regular events and social activities that bring our team together, encouraging a strong sense of community within our organization. No matter where you work from, you'll always feel part of our vibrant team and our shared mission.
At CIRA, people remain at the centre of our recruitment process. While CIRA uses recruitment platforms that include artificial intelligence-enabled features, which may be used to support administrative processes or skills-based assessments, these features are intended to assist our recruitment activities and do not replace human judgment. All applicant screenings, interviews, evaluations and selection decisions are conducted by our staff. Artificial intelligence is not used to make autonomous or final hiring decisions.
This posting is for an existing vacancy. CIRA is committed to fair, inclusive and accessible recruitment practices. Accommodations are available upon request throughout the recruitment, assessment and selection process. If you require any assistance or accommodation, please contact us at peopleandculture@cira.ca.