Production Management Engineer

NTT DATA Services strives to hire exceptional, innovative and passionate individuals who want to grow with us. If you want to be part of an inclusive, adaptable, and forward-thinking organization, apply now.
We are currently seeking a Production Management Engineer to join our team in Halifax, Nova Scotia (CA-NS), Canada (CA).

Job Responsibilities Include:

  • Monitor and resolve system errors and disruptions, and document each resolution. Manage incidents per the ITIL lifecycle. Liaise with upstream data owners to reach a resolution. Respond to user inquiries and carry out user-requested operations. Document and review handling steps for support scenarios.
  • Prepare and present stability reports. Analyze alert and stability trends and make recommendations. Investigate the root cause of issues and explain it to developers so they can mitigate it.
  • Automate (1) resolution of common problems, (2) routine investigations, and (3) routine user requests, using scripts or an available programming platform. Lead reliability or business-driven projects. Provide reliability engineering.
  • You will work closely with engineering/development teams to design, build and maintain systems, and help them with product selection, schema design and query tuning.
  • You will troubleshoot issues across the entire stack: hardware, software, application and network.
  • You will mentor other SREs on standard methodologies for monitoring and for troubleshooting complex code and database issues.
  • Represent the SRE organization in design reviews and operational readiness exercises for new and existing services.
  • Participate in the on-call rotation and in conference calls with other specialists across different time zones.

Basic Qualifications:

  • Hands-on Unix experience
  • Hands-on experience with SQL-based DB
  • Three-tier support experience with databases such as IBM DB2, Sybase, MongoDB, Greenplum and KDB
  • Excellent analytical and communication skills
  • Ability to prioritize and willingness to take ownership
  • Problem-solving mindset and solution enabler
  • Great problem-solving and debugging ability
  • Familiarity with financial products such as equities, fixed income and securities, with the different types of risk in an investment bank, and with trade flow
  • Strong database knowledge and the ability to apply it to system design and architecture

Preferred Skills:

  • Knowledge of automation-related activities using scripting languages like Python, Perl, Ruby and Bash
  • Hands-on experience with enterprise tools like AppDynamics, Grafana, Splunk, Dynatrace
  • Awareness of and ability to reason about modern software and system architectures, including load balancing, queuing, caching, distributed systems failure modes, microservices, Cloud, etc.
  • Deep understanding of operating system concepts such as processes, memory allocation and the network stack; an understanding of how applications are affected by them and the ability to debug the resulting issues.
  • Practical experience running large scale online systems is always an advantage