
Application Reliability Engineer
- Maia, Porto
- Permanent
- Full-time
- Ensure Application Stability – Monitor and maintain containerized applications, quickly responding to incidents and ensuring seamless recovery in production environments
- Investigate and Resolve Issues – Analyze performance and troubleshoot across technologies to diagnose problems and implement effective solutions
- Lead Incident Management – Prioritize and manage incidents from start to resolution, coordinating across teams to minimize downtime and maintain system reliability
- Champion Continuous Improvement – Proactively seek opportunities to enhance processes, implement best practices, and drive operational excellence in a dynamic tech environment
- Experience in administration and service recovery of containerized applications using Kubernetes, OpenShift / MicroShift
- Experience with SQL, .NET, ClickHouse, and Kafka: although programming is not the primary responsibility, the ability to interpret, troubleshoot, and mitigate issues in an existing code base is required
- Proficient in managing incidents and setting priorities effectively
- Excellent English skills – spoken and written
- Time management, organization and prioritization
- Attention to detail – thorough in work carried out
- Great interpersonal and communication skills
- Initiative – seeking continuous improvement and implementing best practices in a technology environment
- Manufacturing industry business domain knowledge (e.g. Semiconductor, Electronics, Medical Devices or Industrial Equipment manufacturing)
- Experience with SQL Server
- Knowledge of and/or experience with the ITIL framework
- Experience working with Manufacturing Execution Systems (MES) or production automation platforms
- Exposure to data platforms for operational reporting and analytics
- Understanding of IoT device integration and automation workflows within factory environments