OPS Engineer

Full Time
Ghent, Belgium

Waylay provides enterprise IT-OT digital unification software delivering low-code-based orchestration, automation, and analytics software solutions. Waylay is rapidly emerging as one of the most successful PaaS Platform Automation leaders globally. The biggest names are adopting Waylay in the US in growing numbers. We have a passion for supporting enterprises and allowing new ways to put their valuable business data to work. We enable domain experts and data scientists to experiment with new business flows, avoiding long software development cycles. In short, we connect IoT solutions to IT systems, empowering customers to build new applications faster and better than ever before. Waylay builds the tools to truly accelerate digital transformation projects.

Job description

To ensure the uptime of our customers' mission-critical business processes on our platform, we are looking for a hands-on OPS Engineer to join our team. The OPS Engineer bears primary responsibility for the stability of application and infrastructure stacks, identifies application and system-related problems, and takes the lead in resolving incidents. Your tasks consist of monitoring, managing, studying, testing and running the application or service. You proactively monitor technical behaviour and performance across the entire stack in order to prevent problems. Whenever possible, you contribute to the continuous improvement of our setups.

Responsibilities

  • You are responsible for the reliability of the Waylay Platform, which is spread over multiple regions across the world
  • You understand and monitor the entire technology stack on which the application runs, and how it fits into the overall chain
  • You configure and implement monitoring tooling with the appropriate event alerts
  • You specify, design and conduct Acceptance tests (including High Availability and Disaster Recovery Tests) with support from the dev team
  • You identify key indicators to track reliability requirements and set clear and measurable Service Level Objectives (SLOs)
  • You work within a global team that provides 24x7 level 3 support for deploying and running the application stack in production
  • You contribute to Incident, Problem and Change Management, including writing post-mortems for resolved problems
  • You build, enhance and maintain tooling and scripts to automate repetitive or error-prone tasks
  • You gather, update and share knowledge about developments and challenges in your field, and embed lessons learned and best practices
  • You report to the Services and Support Lead

Skills & Experience

  • Bachelor’s degree in engineering and/or technology, or an equivalent combination of education and experience.
  • Hands-on experience with containerized environments
  • Hands-on experience using an infrastructure-as-a-service provider such as GCP, AWS or Azure
  • Hands-on experience with networking principles such as routing, load balancing and the TCP/IP stack
  • Hands-on experience operating a container orchestration system such as Kubernetes
  • Experience with performance troubleshooting, load testing and observability is a plus
  • Keywords that describe you: driven, ambitious and energetic
  • You’re an excellent problem solver under any circumstances
  • You’re customer oriented and an excellent communicator in English
  • Pressure doesn’t get to you; you remain calm, even during the most testing times
  • You value teamwork and accountability