Director - SRE

Gurgaon Permanent
  • Leadership role
  • Global MNC

About Our Client

Our client is a global MNC based in Gurgaon.

Job Description

* You will continuously monitor and evaluate team workload and organizational efficiency, with the support of IT systems, data and analysis, and team feedback, and make appropriate changes to meet business needs.

* You will provide team members/direct reports with clear direction and targets that are aligned with business needs and GIT objectives

* You maintain and enhance the monitoring framework (data collection, alert aggregation, dashboarding) and implement and enhance the alerting logic (framework)

* You enable proactive incident alerting and resolution by leveraging knowledge scripts

* You identify and detect repetitive incidents (stability, reliability) and develop solutions to fix problems.

* You work on technical resolution for incidents and identify technical root cause

* You uphold tool standards and exploit tool capabilities to fine-tune product reliability

* You integrate incident, release, monitoring, alerting tools into overall ecosystem

* You measure and report SLIs and MTTx metrics in periodic reviews, analyse deviations and drive actions to closure

* You update runbooks with changes to process / tools

* You drive postmortems to arrive at remedial actions.

* You participate in on-call incident technical support

* You ensure production release guidelines (entry/exit) and implementation are adhered to for changes to Production.

* You support CI/CD pipeline implementation and its integration with quality and security.

* You scale systems sustainably through mechanisms like automation; evolve systems by pushing for changes that improve reliability and velocity.

The Successful Applicant

* Computer Science, Math, or Technical background highly desired; Advanced Degree a plus.

* At least 10 years of experience in IT

* 7 years of experience in relevant area (DevOps / SRE).

* Strong awareness and experience of working with Site Reliability Engineering principles.

* Good understanding of public cloud offerings, such as AWS components (EC2, IAM, RDS, CloudWatch, etc.).

* Knowledge of messaging and streaming frameworks such as RabbitMQ / Kafka

* Knowledge of server-side technologies such as Kubernetes, Node.js, Docker, Java

* Hands-on experience with enterprise toolsets such as Grafana, Instana, Prometheus, ELK Stack, etc.

* Has exposure to networking concepts (SSH, FTP, TCP/IP, DNS, Load balancing, CDN etc.).

* Has experience in any scripting language (Bash / Python / Perl).

* Good experience with CI/CD pipelines, including Bitbucket and Jenkins

* Experience operating high-availability, fault-tolerant, scalable, distributed software in production: building monitoring into your code, tweaking dashboards, defining alerts.

* Knowledge of Agile software development principles including using JIRA.

* Excellent organizational, verbal and written communication skills.

* Aptitude to be a good team player and the desire to learn and implement new technologies.

* Experience with building REST APIs, API integration, and web services is preferred

* Exposure to TypeScript and Node.js would be ideal

* ITIL v4 Foundation certification is a plus

What's on Offer

You will be working in a flexible and family-friendly environment and culture.
