Site Reliability Engineer (m/f)

More than a job, a mission! 

The Site Reliability Engineer (SRE) mission is to combine advanced Software Engineering practices with mature Operations skills to deliver and operate highly resilient systems at scale. The SRE follows blameless post-mortem practices so that all incidents are well understood and problems are fixed at their root. Over time, the SRE makes our systems more robust, fault-tolerant, and able to self-heal during the worst outages and the most unexpected circumstances. The SRE must be able to troubleshoot complex problems and dig deep into why systems break in production, relying on observability practices such as centralized logging, distributed tracing, and anomaly detection. The SRE will shorten the mean time to detect (MTTD) and mean time to recover (MTTR) by improving the accuracy of alarms and the speed of troubleshooting.


What are the responsibilities?

Ensure the availability, performance, and scalability of the cloud infrastructure.

Perform blameless root cause analysis on outages and ensure action items are completed.

Fix resiliency problems wherever they occur in the product, or collaborate with product teams to fix them.

Monitor infrastructure, measuring availability and system health.

Collaborate with support teams in recovering from outages.

Troubleshoot complex incidents in highly distributed systems.

Shorten time to detection by improving the accuracy of alarms.

Be a key stakeholder in the design of cloud services so that they are resilient from day 0.

Perform change management, monitoring, emergency response, and capacity planning.


What are we looking for?

Degree in Computer Engineering or similar education.

Excellent organizational skills and attention to detail.

Excellent communication skills.


Ability to coordinate multiple tasks, handle assigned action items, manage priorities, and work effectively within deadlines and under time pressure.

Should possess a problem-solving mentality; i.e., establish facts, leverage different sources of knowledge, not be afraid to fail, and seek new ways to improve.

Results-driven mentality.

Strong sense of organization and responsibility.

Passion for technology.

2-3 years of experience with cloud providers (AWS, Azure, and GCP).

Experience with OutSystems Platform is a plus.

Experience with monitoring and troubleshooting complex distributed systems.

Experience in designing resilient and fault-tolerant systems.

Experience in debugging complex, distributed systems.

Knowledge of collaborative platforms (Jira, Confluence and Git) is a plus.

1 year of experience with ELK, Prometheus, and Grafana (mandatory).

Very good spoken and written Portuguese and English.

Excellent written and verbal communication skills and organizational skills.

Proficient in Microsoft Office Suite (Outlook, Word, Excel, PowerPoint, Visio).

Experience with SQL and NoSQL databases is a plus.

Experience with automation and IaC is a plus (Terraform, Ansible, etc.).

Experience with Docker and Kubernetes is a plus.


What are you going to find?

A great team and an excellent work environment! 

Sharing in the company's success and results.

Promotion of well-being, a culture of openness and sharing, and benefits and conditions designed especially for our employees.

Investment in training and skills development throughout your career, encouraging growth within the company. With dedication and determination, you can go further here!

A willingness to innovate and a drive for excellence, to satisfy our customers with products and services that make a difference.

Happiness and team spirit. We are a happy company according to the Happiness Works ranking published by Exame magazine.


If you think that you can make a difference in our company, send your application by e-mail with the reference #SiteReliabilityEngineer.


With Oney, your personal data and privacy are always protected.

Oney Bank – Sucursal em Portugal is the controller of all personal data provided in the forms for unsolicited applications and/or open opportunities. Personal data is used for the purposes of the application process and may also be considered for sending newsletters with new opportunities in the area you applied to or in other areas.

If you do not want your personal data to be considered for new opportunities, send us an email to
See here all the information about how Oney processes your personal data.