Responsibilities:

  • Install, configure, build and troubleshoot our production servers and services.
  • Maintain 100% uptime of the production services.
  • Optimize / tune our servers and services for performance, scalability, and maintainability.
  • Ensure that our monitoring tools catch and generate alerts on all production issues.
  • Resolve issues reported by our monitoring tools, including following through on long-term issues.
  • Follow escalation process through issue completion, including providing documentation after resolution.
  • Supervise the junior system administrators to ensure that they are following procedures and completing tasks successfully.
  • Perform root cause analysis of production issues and provide a report with recommendations, through process or technology improvements, for identifying future issues more quickly and for preventing future failures entirely.
  • Send periodic NOC reports to managers with the system and service status.
  • Manage backups and disaster recovery, including backup monitoring and verification, and leading restoration tests and disaster recovery drills.
  • Serve as a technical escalation point during your shift.

Required skills and experience:

  • 6+ years of experience supporting a real-time 24×7 production online web environment.
  • Strong written and verbal communication skills; ability to organize and prioritize tasks.
  • Experience training and mentoring more junior members of the team, and working with other departments to solve cross-departmental problems.
  • Knowledge of Unix scripting: shell, Perl, Python, Ruby or equivalent languages (from an automation and monitoring standpoint).
  • Ability to identify and configure add-on modules or plugins of open-source tools to effectively automate tasks and monitor production services.
  • Strong knowledge of DNS, system build automation, and system configuration management tools.
  • Familiarity with server virtualization technologies (Xen or equivalents).
  • Experience configuring and tuning monitoring systems (Nagios, Graphite or equivalents).
  • Ability to work in fast-paced environments with weekly release schedules.

Desired skills and experience:

  • Knowledge of Redis or other NoSQL databases.
  • RHCE certification or equivalent experience.
  • Experience with package management (preferably on Debian systems).
  • Working knowledge of the following technologies (or equivalents): LDAP, NTP, DNS, FAI, DHCP, Subversion, Git, Chef, MySQL.
  • Experience executing multi-week, multi-person projects according to a project plan.

 

To apply, send your updated profile to resumes@colligate.in. We will get back to you!

NOC Lead / Sr. System Admin
