Senior Site Reliability Engineer

Venor is partnering with ABM Integrated Solutions in the search for a Senior Site Reliability Engineer. Headquartered in Dartmouth, ABM is a nationally recognized technology company delivering end-to-end managed IT services and dependable field support to organizations across Canada and the United States. Combining the agility of a full-scale managed services provider with the reliability of a coast-to-coast field services operation, ABM ensures customers receive fast, consistent, and high-quality technical support wherever they operate.

The Senior Site Reliability Engineer (SRE) is responsible for the reliability, availability, and performance of our production systems and services, particularly a mission-critical Java application with tight recovery time and recovery point objectives (RTO/RPO). The role calls for a proactive, detail-oriented person with excellent communication skills and a solid technical foundation.

What you’ll be doing:

  • Manage and maintain robust infrastructure to ensure high service uptime.
  • Optimize AWS services, with a strong focus on Amazon EKS clusters and Amazon RDS for Oracle.
  • Administer monitoring and alerting in Datadog so that issues are detected and resolved promptly.
  • Lead Helm-based deployments, ensuring security compliance at every stage.
  • Collaborate with development teams to support the secure deployment and operation of mission-critical Java applications.
  • Perform regular system performance tuning, capacity planning, and security assessments.
  • Help document processes and procedures to ensure knowledge sharing and continuity.
  • Participate in an on-call rotation to provide 24/7 support for critical systems and ensure rapid resolution of incidents.
  • Participate in Disaster Recovery (DR) simulations to ensure preparedness for potential outages and data loss scenarios, with a focus on maintaining data integrity and security.

What we are looking for:

  • Bachelor’s degree in Computer Science, Engineering, or equivalent practical experience.
  • Minimum of 5 years of experience in site reliability engineering or a related discipline.
  • Strong database management skills, with an emphasis on secure data handling.
  • Experience with AWS services and infrastructure, including security best practices.
  • Proven experience with Kubernetes, specifically EKS.
  • Hands-on experience with Datadog for monitoring and alerting.
  • Experience with Helm for managing deployments.
  • Exceptional communication skills, especially under pressure.
  • Strong problem-solving abilities and attention to detail.
  • Ability to work both independently and collaboratively within a team.
  • Experience with CI/CD pipelines and DevOps methodologies in a production environment.

Preferred qualifications:

  • Relevant certifications in AWS, Kubernetes, or related technologies.
  • Familiarity with additional monitoring and alerting tools.

ABM offers exposure to cutting-edge technologies, meaningful opportunities for professional growth, and a comprehensive total rewards package. You’ll join a team that values collaboration, continuous improvement, and delivering measurable impact for customers across North America.

Venor and ABM welcome applicants from all backgrounds and identities. To learn more about this opportunity, contact Steph MacIntosh at steph@venor.ca or Craig Coady at craig@venor.ca.