Site Reliability Engineer – Linux
We are seeking a highly skilled Linux Site Reliability Engineer to join our Corporate Infrastructure Team. As our resident Linux SRE, you will play a key role in designing, implementing, and maintaining the Linux-based infrastructure that supports our organization’s operations. You will collaborate with a wider team of engineers and system administrators to ensure the availability, reliability, and performance of our systems. The ideal candidate will have a strong background in Linux administration, a good understanding of infrastructure architecture, and a passion for solving complex technical challenges.
Linux Infrastructure Management:
- Work with the Architecture Team to translate company needs into infrastructure solutions that meet requirements for performance, resource usage, scalability, resilience and observability. Proposed solutions may be on-premises (virtualised or bare-metal), cloud or hybrid architectures, and must use Continuous Integration and Continuous Delivery, Infrastructure as Code and GitOps approaches.
- Monitor system performance, troubleshoot issues, and optimize Linux infrastructure for maximum efficiency and uptime.
- Implement and maintain system security measures, including user access controls, firewalls, and intrusion detection systems.
- Manage storage solutions and ensure effective data backup and disaster recovery processes.
- Automate repetitive tasks and streamline system administration workflows using scripting and configuration management tools (preferably Ansible playbooks).
- Participate in an on-call rota as the wider Infrastructure Team develops.
Infrastructure Design and Implementation:
- Collaborate with cross-functional teams to design and implement scalable and reliable Linux-based infrastructure solutions.
- Evaluate new technologies, tools, and frameworks to enhance system performance and efficiency.
- Plan and execute infrastructure projects, ensuring timely delivery and adherence to budget and quality standards.
- Participate in capacity planning and resource forecasting to support future growth and scalability.
Collaboration and Documentation:
- Work closely with other team members to resolve complex technical issues and provide support.
- Communicate effectively with stakeholders to gather requirements and provide regular updates on project progress.
- Document system configurations, procedures, and troubleshooting steps for reference and knowledge sharing.
- Contribute to the development of best practices and standards for Linux system administration.
Requirements:
- Extensive experience with Unix/Linux systems such as Canonical Ubuntu and Red Hat/CentOS in large-scale, highly available, distributed Linux environments.
- Knowledge of scripting languages such as Python, Bash, Perl, Ruby or similar.
- Knowledge of and experience with monitoring and logging systems such as Splunk, Elastic Stack (ELK), Fluentd, Nagios or Zabbix.
- Knowledge of and experience with Terraform, including writing configurations that apply Infrastructure as Code to on-premises and cloud infrastructure.
- Advanced knowledge of internet services and networking (such as DNS, email with Postfix, and HAProxy).
- Experience using Docker, writing custom Dockerfiles, and managing services and pods in Kubernetes.
- Experience implementing and maintaining security best practices for containerized applications and Kubernetes clusters.
- Knowledge of Configuration as Code, with tools like Ansible, Puppet or Chef.
- Enterprise-level knowledge of virtualisation (VMware, KVM).
- Experience working with Agile frameworks and DevOps principles.
- Experience writing automation pipelines using tools like Argo Workflows or GitHub Actions.
- Extremely organized with a strong attention to detail.
- Ability to work well under pressure.
- Demonstrated ability to manage multiple tasks and competing priorities.
- Great communication, interpersonal and teamwork skills
Experience in the following desired, but not essential:
- Understanding of Continuous Integration and Continuous Deployment tools like Jenkins, Bamboo or ArgoCD
- Experience with and knowledge of Windows systems administration and investigation, especially of event logs and services.
- Experience working with SAFe or LeSS.
Lunik employees enjoy:
- Hybrid working (3 days at home and 2 in the office)
- Corporate pension plan
- Free health insurance for the whole family
- Free English and Spanish language classes
- Free psychotherapy sessions
- Gym membership subsidy
- Free life insurance
- Flexible remuneration
- Fun socials – from weekly happy hour drinks to big seasonal events
Job Category: Corporate IT
Job Type: Full Time
Job Location: Madrid