Sutherland is seeking an Application and System Monitoring Engineer to take our existing CloudOps monitoring to the next level. In this position, you will work with a wide range of modern tools and technologies to build the next generation of monitoring systems and to troubleshoot and resolve issues in our development, test, and production environments. The ideal candidate can work in a dynamic and complex software build environment and is an energetic self-starter with a passion to build, innovate, and achieve excellence.

Subject Matter Expertise:
Experience implementing predictive and detailed monitoring.
Expert in the Linux command line.
Design, architect, and implement secure and highly available monitoring infrastructure.
Enhanced monitoring capabilities, including:
  Automatic detection of brute-force attacks in logs.
  Detection of password attacks in logs.
Implement a next-generation predictive monitoring solution to:
  Detect and alert on capacity utilization of compute resources.
  Detect and alert on network-related issues and choke points.
Ability to design, implement, and improve Grafana, Prometheus, Loki, Promtail, and Node Exporter deployments.
Log parsing and management.
Configuration of alerting, including push notifications to VictorOps (now Splunk On-Call) and email notifications.
Architect, design, and implement Icinga 2 monitoring and alerting.
Ability to monitor system metrics and parse logs.
Ability to automate tasks using Bash and/or Python scripting.
Predictive monitoring of systems and applications.
Familiarity with JVM internals and with using JMX and REST for monitoring.
Familiarity with AWS infrastructure.
Deep understanding of Java applications, TLS, and Apache.
Automated performance checks of system metrics in Grafana.
Automated performance checks of web applications.
Problem-solving and troubleshooting, including performing root cause analysis to design preventative activities.
Crafting and maintaining dashboards and reports, pulling together monitoring data across multiple platforms within the same tool as well as across multiple tools.
Assisting with writing scripts and queries that provide environment self-healing capabilities.
Written, verbal, interpersonal, and presentation skills.
Communicating with both technical and non-technical employees.
A customer-driven approach and good customer management skills.
Staying abreast of the latest monitoring technology and trends.
Adhering to configuration, release, and change management protocols.

Skill Sets and Qualifications:
Bachelor's degree in Computer Science or equivalent experience.
Experience using monitoring tools in a production environment.
5+ years of production cloud operations experience.
5+ years of expertise in the Linux command line.
5+ years of using Terraform in AWS for automation.
Hands-on with automation and seeking out opportunities to automate manual processes.
5+ years of strong, hands-on experience building production services in AWS. (Must Have)
4+ years of experience with scripting using Python and Bash.
Ability to participate in an on-call rotation.
Considerable knowledge of IT equipment and diagnostic tools.
Considerable knowledge of principles and techniques of systems analysis, design, development, and programming.
Considerable knowledge of principles of information systems.
Considerable knowledge of capabilities of computer technology.
Knowledge of methods and procedures used to conduct detailed analysis and design of computer systems.
Knowledge of system security and disaster recovery practices and issues.
Knowledge of computer operating systems.
Considerable problem-solving skills.
Considerable logic and analytical skills.
Considerable oral and written communication skills; interpersonal skills; considerable ability to analyze, troubleshoot, and resolve data communications problems.
Considerable ability to prepare manuals, reports, documentation, and other written materials; considerable ability to identify, analyze, and resolve complex business and technical problems.

Bonus Skills:
Familiarity with Catchpoint.