India Openings

NOC Engineer

Job ID: NOC-ETP-Pun-1292

Location: Pune

Job Description – 24×7 NOC Engineer


We are looking for a 24×7 NOC Engineer to ensure the availability, performance, and reliability of production systems. This hands-on role spans monitoring/observability, incident response, troubleshooting, and automation. You will work closely with engineering and infrastructure teams to reduce downtime and improve operational excellence.


Shift / Support Model: 24×7 rotational shifts (including nights/weekends) with on-call participation as required.


Key Responsibilities

  • Monitor applications and infrastructure using New Relic, Datadog, Grafana and related observability tooling; maintain dashboards and actionable alerting.
  • Create and tune alerts, and reduce alert noise.
  • Provide L1/L2 incident response in a 24×7 environment; triage alerts, restore service quickly, and manage escalations.
  • Perform deep troubleshooting across Linux systems, Kubernetes workloads, infrastructure components, and network paths.
  • Conduct log analysis using New Relic/ELK (and/or similar platforms) to identify patterns, correlate events, and support root cause analysis.
  • Build and enhance automation for routine operational tasks, alert remediation, and reporting using Python and Bash.
  • Manage infrastructure changes using Terraform and follow Infrastructure-as-Code practices (review, version control, rollback readiness).
  • Support Kubernetes platform operations by assisting with deployments, performing cluster/service health checks, executing scaling and recycling activities, monitoring capacity and performance, and troubleshooting issues.
  • Maintain clear runbooks, SOPs, and shift handover notes; ensure knowledge is captured and reusable.
  • Partner with engineering and cloud/infrastructure teams to improve reliability through post-incident reviews, problem management, and continuous improvements to observability.

Must-have Skills

  • Monitoring & Observability: New Relic, Datadog, Grafana; strong alert triage and dashboarding skills.
  • Linux: administration fundamentals, process/service troubleshooting, permissions, performance basics.
  • Automation & Scripting: Bash and Python for operational tooling and automation.
  • Infrastructure as Code: Terraform (hands-on).
  • Containers: Kubernetes (workload troubleshooting, cluster concepts).
  • Networking: TCP/IP basics, DNS, HTTP/HTTPS, load balancing concepts, connectivity troubleshooting.
  • Log Analysis: ELK (or equivalent), querying/correlation for RCA support.

Secondary Skills

  • Cloud infrastructure fundamentals (AWS/Azure/GCP).
  • Good communication skills: clear incident updates, shift handovers, and stakeholder coordination.

Qualifications & Experience

  • Bachelor’s degree (B.Tech/B.E., MCA) or equivalent practical experience.
  • 4–6 years of experience in SRE / NOC / Production Support / DevOps / Infrastructure Operations.
  • Experience working in a shift-based operations environment with strong ownership and urgency.
  • Ability to document clearly (runbooks, post-incident notes) and collaborate effectively with cross-functional teams.