Site Reliability Operations Engineer III

2 weeks ago


San José, San José, Costa Rica Zuora Full-time

Company Overview
At Zuora, we do Modern Business.

We're helping people subscribe to new ways of doing business that are better for people, companies and ultimately the planet.

It's an approach resulting from the shift to the Subscription Economy that puts customers first by building recurring relationships instead of one-time product sales and focuses on sustainable growth.

Through our leading expertise and multi-product suite, we are transforming all industries and working with the world's most innovative companies to monetize new business models, nurture subscriber relationships and optimize their digital experiences.


THE TEAM

Responsible For:

  • Service operations and restoration of service-impacting issues
  • Driving Command Center Incident Bridges for customer issues to resolution
  • Responding to Observability Alerts/Alarms
  • Responding to escalated issues from Customer Support
  • Writing and automating runbooks, and reducing alerts, incidents, and service requests through automation
  • Being a liaison for a service and partnering with the service owner to make the service rock solid and efficient

WHAT YOU'LL ACHIEVE


As an SRO, you will be a member of a team that understands the configuration, technical dependencies, and overall behavioral characteristics of production services.

In partnership with developers, you have the responsibility to ensure services are designed and delivered with a focus on security, resiliency, scale, and performance.

SROs are the ultimate authority and are accountable for end-to-end performance and operability of the services they own.

Champion service reliability operations and incident prevention

  • You will be part of the team whose mission is the shared ownership of a collection of services and technology areas, in partnership with developer teams.
  • You are a key escalation point for issues that have been documented as Standard Operating Procedures (SOPs) or issues that need in-depth troubleshooting and analysis. You will help maintain up-to-date documentation on deployments, processes and SOP runbooks.
  • You are a key escalation point in leading incidents and working with Subject Matter Experts (SMEs) to perform real-time incident handling tasks to support operations. You will help develop and implement the incident management process.
  • You will have the deep understanding of service topology and dependencies required to troubleshoot issues and define mitigations. Once you have expertly mitigated an incident, you will immediately work with the SME on how to resolve the issue more quickly next time, with the goal of preventing the problem from recurring. You will help develop and implement the problem management process.
  • You will manage the full lifecycle of infrastructure and change management, including planned maintenance and standard, normal, and emergency changes. You will help develop and implement change management processes to ensure developers and SROs can easily manage system configurations, deploy new code quickly and fix incidents faster.
Service design and implementation

  • You will partner with development SCRUM teams in defining and implementing improvements to service architecture, both current and future. You will be an expert at articulating technical characteristics of services and their dependencies, and guide development teams to engineer highly reliable and performant services.
  • You will frequently partner with developer SCRUM teams and actively participate in the execution of tasks required to meet milestones and deliverables set by the team throughout a release cycle.
Operations Engineering

  • You will take part in a shared on-call rotation that won't cripple your life or kill your soul.

Job Involves:

  • Resolution of complex and critical issues, participation in major incidents as an SME
  • Acting as the service expert, ensuring that expertise is reflected in SOP documentation and shared
  • Instrumentation and metrics that clearly describe the service behaviors
  • Scaling requirements and patterns
  • Resiliency and recoverability, ensuring that backup / restore and disaster recovery capabilities are implemented, tested and maintained
  • Driving and escalating gaps in automation, solutions and documentation

WHAT YOU'LL NEED TO BE SUCCESSFUL


SROs are a rare mix of sysadmins and development engineers, and as such you have the ability to understand and explain the effect of product architecture decisions on the ability to run services as distributed systems.

You are driven by professional curiosity and a desire to develop a deep understanding of the services and the technologies they depend upon.


You demonstrate competence in shell scripting and high-level programming languages and tools such as Bash, Ansible, Python, and Terraform, as well as low-code / no-code languages and solutions such as Google Apps Script, Jenkins Pipeline Groovy scripts, Jira Automation, and Rundeck.

You are proactive, self-motivated, customer-focused, organized, and a good communicator.

You have over 4 years experience r
  • Site Reliability Engineer

    2 weeks ago


    San José, San José, Costa Rica CRG Solutions Full-time

    Reporting to the Director of Solutions Engineering, the Site Reliability Engineer provides technical and process guidance specific to a business unit. Key areas of impact this role provides are in-depth knowledge of the engineering environments within the specific business unit and providing automated, stable, and Automation Solutions Engineering, CI/CD...

  • Site Reliability Engineer

    2 weeks ago


    San José, San José, Costa Rica Oracle Full-time

    Site Reliability Engineer-230001K1. Applicants are required to read, write, and speak the following languages: English. Preferred Qualifications: Solve complex problems related to infrastructure cloud services and build automation to prevent problem recurrence. Design, write, and deploy software to improve the availability, scalability, and efficiency of Oracle...

  • Site Reliability Engineer

    2 weeks ago


    San José, San José, Costa Rica Hitachi Solutions Full-time

    Company Description: Hitachi Solutions is a global Microsoft solutions integrator passionate about developing and delivering industry-focused solutions that support our clients to deliver on their business transformation goals. Our industry focus, expertise, and intellectual property are what truly set us apart. We have earned, and continue to maintain, a...

  • Site Reliability Engineer

    2 weeks ago


    San José, San José, Costa Rica Hitachi Solutions Ltd Full-time

    Company Description: Hitachi Solutions is a global Microsoft solutions integrator passionate about developing and delivering industry-focused solutions that support our clients to deliver on their business transformation goals. Our industry focus, expertise, and intellectual property are what truly set us apart. We have earned, and continue to maintain, a...

  • Site Reliability Engineer

    2 weeks ago


    San José, San José, Costa Rica Equifax Full-time

    Site Reliability Engineering (SRE) at Equifax is a discipline that combines software and systems engineering for building and running large-scale, distributed, fault-tolerant systems. SRE ensures that internal and external services meet or exceed reliability and performance expectations while adhering to Equifax engineering principles. SREs in our team...


  • San José, San José, Costa Rica OfficeSpace Full-time

    OfficeSpace Software is the workplace management platform enabling the future of work, with software that helps teams plan, connect, and perform in the hybrid workplace. 1,000 of the world's top organizations use OfficeSpace to get the most out of their space and connect the people in it, with intuitive space planning, desk and room booking, employee...

  • Site Reliability Engineer

    2 weeks ago


    San José, San José, Costa Rica Cohesity Full-time

    Cohesity is on a mission to radically simplify how organizations secure and manage their data, while unlocking limitless value. As a leader in data security and management, we make it easy to secure, protect, manage and derive value from data—across the data center, edge, and cloud. At Cohesity, we're a group of builders and go-getters who are committed to...


  • San José, San José, Costa Rica Equifax Full-time

    Equifax is where you can power your possible. If you want to achieve your true potential, chart new paths, develop new skills, collaborate with bright minds, and make a meaningful impact, we want to hear from you. What you'll do: You will influence and design the infrastructure, architecture, standards, and methods for large-scale systems. Will support...


  • San José, San José, Costa Rica Datasite Full-time

    Datasite is where deals are made. We provide the data rooms and SaaS technology used in M&A and other high-value transactions, to deliver projects in more than 170 countries. Carrying that success into the future is all about you. Your useful skills, your unusual experience, your unique ideas. Everyone here brings something unexpected. What's yours? Invest...


  • San José, San José, Costa Rica INTEL Full-time

    Microelectronic Quality Reliability Engineers provide project management, product, process design/development and sustaining support for integrated circuit or semiconductor assemblies, various other electronic components, sub systems and/or completed units. Responsible for physical understanding, model prediction and enhancement of quality and reliability...

  • Mechanical Engineer SME

    2 weeks ago


    San José, San José, Costa Rica INTEL Full-time

    The Mechanical Engineer SME provides facilities engineering leadership to the site organizations in support of safety, reliability, quality, environmental and cost objectives. As a Mechanical Engineer SME your responsibilities will include but are not limited to: Owning safety within area of influence; ensuring systems operate reliably to avoid...


  • San José, San José, Costa Rica Screenovate Full-time

    Responsible for technical functions in support of engineering activities such as design, test, checkout, modifications, fabrications, of design circuitry, electromechanical systems, or specialized test equipment. Conducts engineering tests and detailed experimental testing to collect data or assist in research work. May perform operational test and fault...

  • Support Engineer Tier II

    2 weeks ago


    San José, San José, Costa Rica Micro Focus Full-time

    Micro Focus is one of the world's largest enterprise software providers. We deliver mission-critical technology and supporting services that help thousands of customers worldwide manage core IT elements of their business so they can run and transform, at the same time. Micro Focus is seeking a Change Guardian - Support Engineer Tier II to join its CyberRes...


  • San José, San José, Costa Rica GSB Full-time

    An important company is looking for a GCP SRE. Qualifications: Experience in working with GCP IAM, GCE, GKE, BigQuery, GCS, Monitoring, etc. Ability to write and debug Terraform scripts and provision resources. Experience in managing Kubernetes workloads using Helm charts. Proactive approach to identifying problems, performance bottlenecks, and areas for...

  • Data Engineer

    2 weeks ago


    San José, San José, Costa Rica Datasite Full-time

    Job Description: Who We Are: We are looking for a Data Engineer who loves to work with data, develop data pipelines, and make quality data available at scale, and who has experience sourcing and preparing data. Summary: You will be joining a team responsible for the sourcing and preparation of data for data science and AI/ML processes. As a Data Engineer, you will be...

  • Site Manager

    2 weeks ago


    San José, San José, Costa Rica Cornerstone Building Brands Full-time

    Job Description: The Site Manager will oversee the operations and budget of all services performed at Cornerstone's Costa Rica location in San Jose. Current services at our Costa Rica location are heavily focused on Engineering and Drafting, but our future includes services for other business functions. Provide the mentoring, training and personnel needs of...


  • San José, San José, Costa Rica Intel Full-time

    Job Description: Join Intel as a Lab Compute Asset Management DevOps engineer. The Lab Compute Asset Management team ensures the efficient planning, provisioning, installation/configuration, maintenance, and/or operations of the Lab Asset management software solutions. The primary responsibilities for this role will include, but are not limited to: ...


  • San José, San José, Costa Rica ATSG Corporation Full-time

    INL Construction Engineer Specialist - Administrative Support IV (CARSI - Costa Rica). San Jose, Costa Rica. ATSG Corporation prides itself on our dedication to providing expert assistance to our government partners, without any surprises. We work hard to honor our commitment to our clients while ensuring our employees feel secure and empowered in their work....


  • San José, San José, Costa Rica Amazon Full-time

    Are you passionate about managing the development, testing and support processes in a rapid software development life cycle? Automating DevOps & Testing Tasks? Solving operational and reliability problems? Key job responsibilities: Have the ability to learn technical concepts quickly with a strong sense of urgency. Have enthusiasm for working in a fast paced,...

