We are seeking a highly skilled Data Software Engineer to join our team and contribute to the development of a secure and innovative document flow solution hosted on AWS. As part of our mission, you will collaborate with a team of experienced professionals to evolve an end-to-end information lifecycle solution, leveraging modern technologies like AWS Glue, Athena, and Apache Spark. Your role will focus on ensuring the scalability, efficiency, and reliability of a cutting-edge system that simplifies digital document management for our global clientele.

Responsibilities
- Design, develop, and implement data pipelines and workflows using AWS Glue and related technologies
- Build and optimize scalable, efficient data models using Athena and S3 to support reporting and analytics
- Develop and maintain ETL processes with tools like Apache Spark to process high-volume data workloads
- Collaborate with BI analysts and architects to enhance processes for Business Intelligence and analytics
- Optimize the cost and performance of cloud solutions by adopting fully managed AWS services
- Maintain and improve CI/CD pipelines to ensure seamless integration and deployment
- Monitor the solution for performance, reliability, and cost efficiency using modern observability tools
- Support the development of reporting dashboards by providing accurate and timely data models
- Deliver high-quality code while following best practices for testing and documentation
- Troubleshoot and resolve issues with data workflows, ensuring system uptime and reliability

Requirements
- 2+ years of experience in data engineering or software development, with a strong focus on AWS services
- Proficiency in AWS Glue, Amazon Athena, and core AWS services such as S3 and Lambda
- Expertise in Apache Spark, with a strong background in developing large-scale data processing systems
- Competency in BI process analysis and the ability to work with analytics teams to optimize reporting workflows
- Familiarity with SQL, including building complex queries for data extraction and transformation
- Understanding of data lake and ETL architecture concepts for scalable data storage and processing
- Knowledge of CI/CD pipelines and competency in integrating data workflows into deployment frameworks
- Flexibility to use additional tools such as Amazon Kinesis, Apache Hive, or Elastic Kubernetes Service
- Excellent communication skills in English, with a minimum proficiency level of B2

Nice to have
- Experience with Amazon Elastic Kubernetes Service (EKS) for containerized application orchestration
- Familiarity with Amazon Kinesis for real-time data streaming and event processing
- Understanding of Apache Hive and its applications in data warehousing
- Background in BI toolset operations and improving Business Intelligence platform efficiency
- Proficiency in Java or Node.js for extending data processing capabilities

We offer
- International projects with top brands
- Work with global teams of highly skilled, diverse peers
- Healthcare benefits
- Employee financial programs
- Paid time off and sick leave
- Upskilling, reskilling and certification courses
- Unlimited access to the LinkedIn Learning library and 22,000+ courses
- Global career opportunities
- Volunteer and community involvement opportunities
- EPAM Employee Groups
- Award-winning culture recognized by Glassdoor, Newsweek and LinkedIn