About the Job

AgileEngine is an award-winning software development company that creates innovative solutions for Fortune 500 brands and trailblazing startups across various industries. We prioritize a people-first culture, which has earned us multiple Best Place to Work awards.

We're seeking a skilled Site Reliability Engineer to join our team. As a key member of our DevOps team, you will be responsible for ensuring the reliability and scalability of our cloud infrastructure.

Key Responsibilities:
- Manage alerts daily, check system health, and escalate issues as needed.
- Be part of a team that provides 24×7 on-call support for critical SaaS events.
- Document issues and remediation steps.
- Proactively create appropriate monitors in the EKS/K8s ecosystem.
- Deploy to EKS/K8s clusters using Terraform and Helm.
- Learn and maintain existing infrastructure running on Docker Swarm.
- Improve the health of existing infrastructure by implementing checks and scripts that correct known issues.
- Maintain and develop deployment code.
- Automate manual tasks.
- Implement and integrate new technologies into our cloud infrastructure.
- Collaborate with other teams and departments to provide the highest level of support and assistance.
- Keep the customer at the forefront when planning deployments and updates, considering the impact on them before making any changes.

Requirements:
- 2+ years of professional experience.
- Experience working with Datadog.
- Hands-on experience as an AWS Cloud Engineer.
- Working knowledge of EKS, Terraform, and Helm.
- Working experience with Docker and Docker Swarm.
- Good understanding of AWS IAM roles and policies.
- Experience logging and monitoring AWS resources using CloudWatch Logs.
- Experience working in a Linux environment.
- Proficiency in Bash and/or Python scripting.
- A strong understanding of web technologies such as REST APIs.
- Working experience with monitoring solutions such as Grafana and Prometheus.
- Excellent written and verbal communication skills.
- Customer-facing communication skills to effectively explain issues and root cause analyses (RCAs) to customers.
- Experience in product/application support for SaaS-based products.
- Understanding of APIs, databases, systems architecture, and design.
- Experience designing, implementing, and operating in a DevSecOps environment.
- Ability to work independently as well as within a collaborative environment.
- Technical aptitude and a desire to learn new and evolving technologies.
- Upper-intermediate English level.

Benefits:
- Professional growth: Accelerate your professional journey with mentorship, TechTalks, and personalized growth roadmaps.
- Competitive compensation: We match your ever-growing skills, talent, and contributions with competitive USD-based compensation and budgets for education, fitness, and team activities.
- A selection of exciting projects: Join projects involving modern solution development and top-tier clients, including Fortune 500 enterprises and leading product brands.
- Flextime: Tailor your schedule for an optimal work-life balance, with the option to work from home or from the office, whichever makes you happiest and most productive.