Manager, Site Reliability Engineering

1 week ago


San Francisco, Heredia, Costa Rica ServiceNow Full-time

Job Description

Position Overview:

We are seeking a highly skilled Technical SRE Manager to lead our Site Reliability Engineering (SRE) team.
This role is pivotal in ensuring the scalability, availability, and reliability of our critical systems while driving automation, observability, and operational excellence.
You will build and lead a team of NextGenOps and SREs, collaborate with engineering and operations teams, and implement AI/ML-driven strategies to enhance predictive analytics, proactive issue resolution, and self-healing systems.

This is a flexible work schedule position based in Costa Rica.

The NextGenOps team is a forward-thinking, AI-powered Site Reliability Engineering (SRE) group at the forefront of revolutionizing how we approach operations and infrastructure.
Our team is dedicated to building resilient, scalable, and self-healing systems using cutting-edge AI/ML-driven technologies.
We are not just an operations team; we are engineers on a mission to push the boundaries of automation, observability, and operational excellence.
By combining AI with our deep expertise in cloud-native platforms, DevOps, and SRE best practices, we are shaping the future of how technology scales, evolves, and self-heals.
If you're passionate about innovation and making a real impact through intelligent, data-driven solutions, you'll thrive in our dynamic, collaborative, and engineering-centric culture.

Qualifications

Key Responsibilities:

- Lead and mentor a team of AI/ML-powered SREs, fostering a culture of automation, observability, and proactive issue resolution.
- Define and execute AI/ML-driven SRE strategies for incident prediction, anomaly detection, and root cause analysis.
- Champion AI-powered observability practices and advocate for self-healing architectures with machine learning automation.
- Develop and enforce SLOs, SLIs, and SLAs using AI-driven insights.
- Oversee AI-powered incident management, real-time anomaly detection, and auto-remediation.
- Drive AI-driven automation for issue resolution, anomaly detection, and system fine-tuning.
- Implement predictive maintenance and auto-remediation through machine learning models.
- Ensure reliable deployments with AI-assisted rollouts, blue-green deployments, and canary releases.
- Optimize costs through AI-powered resource allocation and workload balancing.
- Ensure security and compliance with AI-driven event detection and threat mitigation.
- Implement chaos engineering with AI-driven failure analysis to strengthen system resilience.
- Collaborate with security teams to enforce AI-assisted threat detection and automated compliance monitoring.
- Lead capacity planning and performance optimization using AI/ML for dynamic scaling and resource forecasting.
- Implement intelligent monitoring, logging, and alerting with AI-powered tools like Prometheus and Grafana.
- Optimize CI/CD pipelines with AI-driven risk assessments and automated rollbacks.

To be successful in this role you have:

- Experience in leveraging or critically thinking about how to integrate AI into work processes, decision-making, or problem-solving. This may include using AI-powered tools, automating workflows, analyzing AI-driven insights, or exploring AI's potential impact on the function or industry.
- 10+ years in SRE, DevOps, or infrastructure engineering, including 3+ years in a leadership role.
- Proven experience integrating AI/ML for observability, automation, and incident response.
- In-depth understanding of monitoring tools (LogicMonitor, Catchpoint, Redgate, ScienceLogic).
- Demonstrated expertise in implementing and optimizing OpenTelemetry (OTel) for comprehensive observability across endpoints, cloud environments, infrastructure, and SaaS applications, enabling proactive monitoring, tracing, and performance insights.
- Proficiency in scripting languages (Python, Go, Bash) and infrastructure tools (Terraform, Ansible) with AI/ML integration.
- In-depth knowledge of observability and data pipeline tools (Datadog, Prometheus, Splunk, AI-driven platforms like Cisco FSO).
- Extensive experience in incident management and on-call rotations, with AI-enhanced predictive approaches.
- Experience with CI/CD pipelines, GitOps, and infrastructure-as-code (IaC).

Preferred Qualifications:

- Experience with data platforms or enterprise automation tools (e.g., ServiceNow, Salesforce, SAP).
- Knowledge of AI/ML-based data automation technologies.
- Familiarity with regulatory requirements for data privacy, such as GDPR and CCPA.
- A passion for leveraging emerging technologies to drive business transformation.
- A customer-first mentality with an ability to translate user feedback into actionable product features.
- Experience in leading cross-functional teams in a matrixed organization.
- Strong communication and leadership skills, with the ability to engage and influence stakeholders across technical and non-technical teams.
- Ability to thrive in a rapidly evolving industry and adapt to new challenges and opportunities.

FD21

Not sure if you meet every qualification? We still encourage you to apply. We value inclusivity, welcoming candidates from diverse backgrounds, including non-traditional paths. Unique experiences enrich our team, and the willingness to dream big makes you an exceptional candidate.

Additional Information

Work Personas

We approach our distributed world of work with flexibility and trust. Work personas (flexible, remote, or required in office) are categories that are assigned to ServiceNow employees depending on the nature of their work.

Equal Opportunity Employer

ServiceNow is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to race, color, creed, religion, sex, sexual orientation, national origin or nationality, ancestry, age, disability, gender identity or expression, marital status, veteran status, or any other category protected by law. In addition, all qualified applicants with arrest or conviction records will be considered for employment in accordance with legal requirements.

Accommodations

We strive to create an accessible and inclusive experience for all candidates. If you require a reasonable accommodation to complete any part of the application process, or are unable to use this online application and need an alternative method to apply, please contact ****** for assistance.

Export Control Regulations

For positions requiring access to controlled technology subject to export control regulations, including the U.S. Export Administration Regulations (EAR), ServiceNow may be required to obtain export control approval from government authorities for certain individuals. All employment is contingent upon ServiceNow obtaining any export license or other approval that may be required by relevant export control authorities.

From Fortune. 2024 Fortune Media IP Limited. All rights reserved. Used under license.



  • San Francisco, Heredia, Costa Rica Sysco Costa Rica Full-time

    **Requirements**:
    - Develop and refine strategy and process for all support issue tracking from intake through resolution in conjunction with senior members of the team.
    - Contribute to, and occasionally lead, strategic discussions to continue the evolution of flexibility and sustainability of the entire product suite.
    - Partner with Level 1 support teams,...


  • San Francisco, Heredia, Costa Rica VMware Full-time

    **The Elevator Pitch**: Why will you enjoy this new opportunity? Working with the VMware SEBU/SASE team, you will engage with cutting-edge technology to build and implement SASE products and platforms. You will also be involved in automating, problem solving, and troubleshooting the SASE full stack, encompassing infrastructure, platform, enterprise management...


  • San Francisco, Heredia, Costa Rica IBM Full-time

    Introduction: At IBM, work is more than a job - it's a calling: To build. To design. To code. To consult. To think along with clients and sell. To make markets. To invent. To collaborate. Not just to do something better, but to attempt things you've never thought possible. Are you ready to lead in this new era of technology and solve some of the world's most...

  • SRE Team Manager

    22 hours ago


    San Francisco, Heredia, Costa Rica Sysco Costa Rica Full-time

    **About the Role:** The SRE Team Manager is responsible for leading and managing a team of Site Reliability Engineers. The ideal candidate will have a strong background in technical operations, excellent leadership skills, and experience working in agile development environments. **Key Responsibilities:** Lead and manage a team of Site Reliability...


  • San Francisco, Heredia, Costa Rica Experian Full-time

    Are you a skilled System Reliability Engineering Director looking for a new challenge? Look no further than Experian, the world's leading global information services company! We are seeking a highly experienced System Reliability Engineering Director to join our Enterprise Platforms & Applications DevOps CoE. As a key member of our team, you will play a...


  • San Francisco, Heredia, Costa Rica TE Connectivity Full-time

    About Us: We are TE Connectivity, a leading global provider of connectivity and sensor solutions. Our team is passionate about delivering high-quality products that meet the needs of our customers. We are seeking a highly motivated and experienced Quality Engineering Manager to join our team. Key Responsibilities: Develop and implement quality control systems...


  • San Francisco, Heredia, Costa Rica IBM Full-time

    **Introduction** At IBM, work is more than a job - it's a calling: To build. To design. To code. To consult. To think along with clients and sell. To make markets. To invent. To collaborate. Not just to do something better, but to attempt things you've never thought possible. Are you ready to lead in this new era of technology and solve some of the world's most...


  • San Francisco, Heredia, Costa Rica ServiceNow Full-time

    About Our Technology: We are dedicated to building resilient, scalable, and self-healing systems using cutting-edge AI/ML-driven technologies. Our team has extensive experience in integrating AI/ML for observability, automation, and incident response. Key Responsibilities: Lead and mentor a team of AI/ML-powered SREs, fostering a culture of automation,...


  • San Francisco, Heredia, Costa Rica HUBER+SUHNER Full-time

    We're looking for a Product Reliability Engineer to join our team at HUBER+SUHNER! In this role, you'll be responsible for ensuring product reliability and developing new approaches to improve quality. Responsibilities: Conduct root cause investigations and implement corrective actions. Evaluate current-state approaches and results. Develop and initiate standards...


  • San Francisco, Heredia, Costa Rica Palo Alto Networks Full-time

    Company Description: **Our Mission** At Palo Alto Networks, everything starts and ends with our mission: Being the cybersecurity partner of choice, protecting our digital way of life. We have the vision of a world where each day is safer and more secure than the one before. These aren't easy goals to accomplish - but we're not here for easy. We're here for...

  • Commissioning Manager

    2 weeks ago


    San Francisco, Heredia, Costa Rica Amentum Full-time

    **Amentum** is seeking a **Commissioning Manager** to join our Operations and Maintenance Team. The **Commissioning Manager** is responsible for the organization, direction, supervision and coordination of commissioning activities on a project site as delineated in the applicable procedures and Site-Specific Safety Plan. **Essential Responsibilities**:-...


  • San Francisco, Heredia, Costa Rica Sysco Costa Rica Full-time

    **Job Overview:** Sysco Costa Rica seeks a highly skilled Technical Operations Lead to join our team. As a key member of our SRE organization, you will be responsible for developing and refining strategies for support issue tracking, collaborating with cross-functional teams, and contributing to the evolution of our product suite. **Responsibilities:** Develop...


  • San Francisco, Heredia, Costa Rica VMware Full-time

    About the Role: This position is an excellent opportunity for a motivated and experienced Site Reliability Engineer to join our VMware team in Costa Rica. We are looking for someone who is passionate about operations, service reliability, networking, or cloud systems and security. Key Responsibilities: Automate day-one operations and deploy scalable solutions...

  • Engineering Manager

    6 days ago


    San Francisco, Heredia, Costa Rica Tebra Full-time

    **Job Overview**: Tebra is seeking an Engineering Manager to lead our software development team. As a key member of our engineering organization, you will be responsible for overseeing the development of cutting-edge software solutions that enable healthcare providers to effectively manage their businesses. The ideal candidate will have a strong technical...


  • San Francisco, Heredia, Costa Rica TE Connectivity Full-time

    **About TE Connectivity:** We are a leading global manufacturer of connectivity solutions, offering a comprehensive portfolio of products and services for the industrial, automotive, aerospace and defense, healthcare, energy and consumer markets. Our Quality and Reliability Engineering Teams play a critical role in ensuring the quality and reliability of our...


  • San Francisco, Heredia, Costa Rica TE Connectivity Full-time

    **Job Overview** We are seeking an experienced Engineering Operations Manager to join our team at TE Connectivity. As a key member of our manufacturing operations, you will play a critical role in ensuring the efficient and effective management of our production processes. **Responsibilities:** Lead a team of engineers and technicians in expanding the...


  • San Francisco, Heredia, Costa Rica TE Connectivity Full-time

    TE Connectivity's Quality and Reliability Engineering Teams analyze the ability of product and production systems to comply with customer and contractual requirements through established reliability factors. They design, recommend revisions and install quality control systems, develop and document analytical methods for establishing reliability of products...


  • San Francisco, Heredia, Costa Rica Hewlett Packard Enterprise Full-time

    We are looking for an experienced Engineering Portfolio Manager to join our team at Hewlett Packard Enterprise. As a seasoned professional, you will oversee complex engineering programs that drive innovation and growth. Your expertise will be crucial in shaping the Storage as a Service platform development model for HPE GreenLake Storage development in...


  • San Francisco, Heredia, Costa Rica Marriott International, Inc Full-time

    **Job Overview** Marriott International, Inc. is seeking a highly skilled Engineering Operations Manager to lead its engineering and maintenance operations in Costa Rica. The ideal candidate will have extensive experience in managing engineering and maintenance teams, with a focus on safety, security, and asset protection. This role is responsible for...

  • Technical SRE Leader

    1 week ago


    San Francisco, Heredia, Costa Rica ServiceNow Full-time

    Job Overview: We are seeking a highly skilled Technical SRE Manager to lead our Site Reliability Engineering (SRE) team. This role is pivotal in ensuring the scalability, availability, and reliability of our critical systems while driving automation, observability, and operational excellence. About Us: Our NextGenOps team is a forward-thinking, AI-powered Site...