Observability Engineer
Company
McCain Foods
Location
Other US Location
Type
Full Time
Job Description
Position Title: Observability Engineer
Position Type: Regular - Full-Time
Position Location: Gurgaon
Grade: Grade 05
Requisition ID: 33010
The IT Infrastructure, Engineering and Operations team is looking for an Observability Engineer ideally with expertise in Enterprise Network, Systems, and Application monitoring and logging development.
JOB RESPONSIBILITIES:
• Develop and improve instrumentation for monitoring and logging the health and availability of services.
• Proactively monitor systems, networks, and applications to provide input in improving the stability, security, efficiency, and scalability of systems.
• Develop and maintain Monitoring and Logging Frameworks for all of ITX.
• Take personal responsibility for the quality, reliability and availability of global IT corporate infrastructure.
• Own operations documentation of monitoring and logging for global IT production infrastructure.
• Participate in rotating on-call incident response on weekdays and weekends.
• Improve operational efficiencies via scripting, bots and integrations.
• Participate cross functionally with vendors and other IT engineering teams to ensure smooth service delivery.
• Network and systems troubleshooting, fault analysis, and resolution.
• Collaborate with Incident and Problem Management to reduce MTTR and Incident volume.
• Design, implement, and maintain AIOps solutions to monitor and analyze IT systems, applications, and networks.
• Deploy machine learning algorithms for anomaly detection, root cause analysis, and incident prediction.
• Configure and manage observability tools and platforms to gain real-time visibility into system health and performance.
• Develop monitoring dashboards, alerts, and reports to provide comprehensive insights into the IT environment.
• Conduct root cause analysis for incidents using data from AIOps and observability tools to identify underlying issues.
• Work closely with software engineers to instrument applications with appropriate logging, metrics, and tracing capabilities.
• Continuously analyze monitoring data to identify trends, anomalies, and opportunities for optimization.
• Stay updated with industry trends and advancements in AIOps and observability practices, and recommend new tools or methodologies for adoption.
• Designing, developing, and implementing AI models and algorithms utilizing state-of-the-art techniques such as GPT, VAE, and GANs.
• Collaborating with cross-functional teams to define AI project requirements and objectives, ensuring alignment with overall business goals.
• Conducting research to stay up-to-date with the latest advancements in generative AI, machine learning, and deep learning techniques and identify opportunities to integrate them into our products and services.
• Optimizing existing generative AI models for improved performance, scalability, and efficiency.
• Developing and maintaining AI pipelines, including data preprocessing, feature extraction, model training, and evaluation.
• Developing clear and concise documentation, including technical specifications, user guides, and presentations, to communicate complex AI concepts to both technical and non-technical stakeholders.
• Contributing to the establishment of best practices and standards for generative AI development within the organization.
• Providing technical mentorship and guidance to junior team members.
• Apply trusted AI practices to ensure fairness, transparency, and accountability in AI models and systems.
• Drive DevOps and MLOps practices, covering continuous integration, deployment, and monitoring of AI.
• Utilize tools such as Docker, Kubernetes, and Git to build and manage AI pipelines.
• Implement monitoring and logging tools to ensure AI model performance and reliability.
• Collaborate seamlessly with software engineering and operations teams for efficient AI model integration and deployment.
• Familiarity with DevOps and MLOps practices, including continuous integration, deployment, and monitoring of AI models.
KEY QUALIFICATION & EXPERIENCES:
• Minimum 10 years of experience in Observability/Monitoring tools
• Bachelor's or Master's degree in Computer Science, Computer Engineering, or a related field.
• 5+ years of industry experience in software development.
• In-depth experience designing at scale monitoring and logging for corporate infrastructure services.
• Expert-level experience in monitoring and logging technologies, both open source and closed source (e.g. AppDynamics, New Relic, Datadog, Prometheus, Grafana, LogicMonitor, Sumo Logic, ELK).
• Experience in implementing Metrics, Logs and Tracing for E2E observability
• Experience in RBAC and user based security services such as ISE, Radius, LDAP, and AD.
• Must have strong automation/scripting skills - proficiency in Python or Golang is a plus.
• Proficient in developing and maintaining technical documentation, runbooks, and procedures.
• A working knowledge of networking is needed. Fundamental knowledge of the TCP/IP stack, application protocols (DHCP/DNS/HTTPS) and networking concepts (HSRP/NAT/VPN/VLANs/802.1x/Wireless/Clustering/High Availability/Load Balancing).
• Understanding of enterprise networks using Cisco IOS/NXOS with a working knowledge of IP Protocols (TCP/UDP/ICMP) and Routing Protocols (BGP/OSPF/IS-IS).
• Technology understanding of Cisco, Cloud Native Firewalls, including Firewall Policy Rules, URL-Filtering, App-ID, User-ID, etc.
• Experience interacting with Telco and Global ISPs (WAN/DIA) and the monitoring of those services.
• A working knowledge of systems is needed. Fundamental knowledge of Configuration Management and Automation tools, with experience in:
* Terraform, Ansible, Chef, Puppet, Jenkins
* Designing and implementing CI/CD pipelines
* Infrastructure provisioning and management
• Strong at troubleshooting incidents in production environments.
• A strong ownership attitude and a track record of taking responsibility for problems and pushing through to resolution.
• Bachelor's degree in Computer Science or EE, or relevant industry experience is required.
• Ability to communicate and coordinate with cross-functional engineering teams across multiple geographic regions.
• Experience with AIOps and machine learning is highly desirable.
• Knowledge of OpenTelemetry is an added advantage.
• Experience with other monitoring tools like Prometheus, Grafana, etc.
• Experience with Observability solutions like Dynatrace, Datadog, Instana, etc. is highly desirable.
• Experience working with mainframe systems is a plus (willingness to learn is also acceptable).
• Excellent problem-solving and analytical skills.
• Strong communication and collaboration skills.
• Ability to work independently and manage multiple projects simultaneously.
• Passion for learning new technologies and continuous improvement.
• In-depth knowledge of machine learning, deep learning, and generative AI techniques
• Knowledge and experience in Generative AI
• Proficiency in programming languages such as Python, R, and frameworks like TensorFlow or PyTorch
• Strong understanding of NLP techniques and frameworks such as BERT, GPT, or Transformer models
• Familiarity with computer vision techniques for image recognition, object detection, or image generation
• Experience with cloud platforms such as Azure or AWS
• Knowledge of IT operations concepts and processes, such as monitoring, incident management, root cause analysis, remediation.
Nice To Have:
• Ability to take lead in an operations environment.
• Contributed to Open Source - your public Git repos/contributions show good examples of giving back to the community.
• Architected a monitoring and logging infrastructure that was technology agnostic for a production infrastructure environment.
• Knowledge of revision control software such as Git.
• Familiarity with REST API scripting, e.g. with the PAN-OS API / Infoblox WAPI.
McCain Foods is an equal opportunity employer. We see value in ensuring we have a diverse, antiracist, inclusive, merit-based, and equitable workplace. As a global family-owned company we are proud to reflect the diverse communities around the world in which we live and work. We recognize that diversity drives our creativity, resilience, and success and makes our business stronger.
McCain is an accessible employer. If you require an accommodation throughout the recruitment process (including alternate formats of materials or accessible meeting rooms), please let us know and we will work with you to meet your needs.
Your privacy is important to us. By submitting personal data or information to us, you agree this will be handled in accordance with the Global Employee Privacy Policy.
Job Family: Information Technology
Division: Global Digital Technology
Department: I and O Project Delivery
Location(s): IN - India : Haryana : Gurgaon
Company: McCain Foods(India) P Ltd
Date Posted
12/07/2024