We are looking for a Site Reliability Engineer experienced with Linux systems to join our expanding IT Operations team. This role involves a high level of collaboration with other technical staff.
Responsibilities
- Work with the Architecture Team and the Development Teams to translate company needs into infrastructure solutions that meet requirements for performance, resource usage, scalability, resilience and observability. Proposed solutions may include on-premises (virtualised or bare-metal), cloud or hybrid architectures, and must follow Continuous Integration and Continuous Delivery (CI/CD), Infrastructure as Code and GitOps practices.
- Invest time in developing and maintaining pipelines, scripts and playbooks that continuously reduce the manual, repetitive work (toil) required to operate production services.
- Collaborate with the Architecture Team and the Development Teams on projects to migrate production services to cloud environments.
- Design and implement public internet services according to the needs of the organization.
- Troubleshoot issues related to internet services.
- Provide comprehensive handovers, top-tier technical assistance and documentation to the operations and monitoring teams.
- Provision, operate (performance tuning, scaling, organization, routine patching, security…) and decommission Linux servers.
- Manage infrastructure services such as email (SMTP), DNS, proxies, web servers and others.
- Participate in a shared on-call rotation.
Requirements
- Strong experience using Infrastructure as Code (Terraform, Pulumi, CloudFormation, Google Deployment Manager, etc.).
- Extensive experience with Unix/Linux systems (Canonical Ubuntu and Red Hat/CentOS) in large-scale, distributed production environments.
- Experience with third-party public and hybrid cloud environments (Google Cloud Platform, OVH, Vultr, AWS, etc.).
- Experience with centralized management systems (Ansible, Puppet, Canonical Landscape).
- Advanced knowledge of internet services and networking (DNS, email with Postfix, HAProxy).
- Advanced knowledge of web servers (Apache, NGINX).
- Experience with Kubernetes administration. Experience deploying Kubernetes clusters and CKA certification are a plus.
- Knowledge of observability services and deployments such as OpenTelemetry and Prometheus.
- Demonstrated ability to troubleshoot system and network problems (for instance, site-to-site VPNs, IPsec, WireGuard).
- Experience writing scripts to automate infrastructure tasks (Python, shell scripting…).
- Familiarity with server-side web application frameworks and platforms such as PHP/WordPress, Python/Django or Node.js.
- Extremely organized with a strong attention to detail.
- Ability to work well under pressure.
- Demonstrated ability to manage multiple tasks and competing priorities.
- Great communication, interpersonal and teamwork skills.
- Fluent in English.
Job Category: Infrastructure
Job Type: Full Time
Job Location: Malaga, Madrid