Site Reliability Engineer

Tasks:
  • identify potential bottlenecks in the AWS infrastructure and application-level business logic, and recommend solutions
  • support the 24/7 Support Engineer team by documenting recovery procedures for different components
  • provide post-incident reports detailing root cause analysis and proposed preventive measures
  • maintain our partner's disaster recovery plan
  • contribute to the development of internal load and stress testing tools
  • improve system alerts and monitoring to achieve proactive incident management
  • provide technical guidance and educate team members on development and operations best practices
Requirements:
  • 3+ years of experience as an SRE, DevOps Engineer, or in a similar software engineering role
  • strong ability to troubleshoot complex issues related to system resources or different application components
  • experience with high-performance, scalable, multi-region AWS infrastructure
  • experience managing Linux-based environments
  • deep understanding of the full web technology stack and security best practices
  • coding ability in scripting and application programming languages
  • ability and willingness to learn new technologies
  • fluent English
  • experience in the following is a huge advantage:
      • developing and managing Java or Python applications
      • working with different monitoring solutions (Grafana, Zabbix, Kibana, Elasticsearch, InfluxDB)
      • managing large datasets in PostgreSQL databases
      • serverless infrastructure