Site Reliability Engineering (SRE) is a discipline that incorporates aspects of software engineering and applies them to infrastructure and operations problems.
What would you work on?
With us, you would be working on:
- distributed systems handling various web apps, services, and IoT
- an in-house service we have been developing from scratch for the last 5 years: a distributed, high-load system spanning 4 continents
- optimizing and rebuilding high-traffic infrastructure for our clients
- developing backend APIs, real-time tracking services, etc.
The work would include:
- management, deployments, monitoring, backups, performance tuning, troubleshooting, etc.
- Kubernetes, containers (Docker, rkt), automation tools (Ansible, SaltStack), cloud providers
- Go, C/C++, Python
What should you know?
- knowledge of GNU/Linux
- basic networking knowledge
- basic operating systems and computer architecture knowledge
- basic programming skills (preferably Python or Go)
- version control tools (Git)
- basic SQL database experience (Postgres, MySQL)