Job Title: Sr. Data Engineer-ETL Developer
Job Location: Charlotte, NC / Phoenix/Chandler, AZ / Minneapolis, MN
Duration: 6+ Months Contract
The Artificial Intelligence Technology Data Engineering Team is looking for a highly motivated and experienced Senior Data Engineer/ETL Developer. The right candidate will have expert-level experience supporting Big Data platforms and products, as well as ingesting and provisioning data from various sources into and out of the Enterprise Data Lake. As a Senior Data Engineer/ETL Developer, you will work with the Client's business and data science teams to gather business data requirements and perform data engineering and provisioning activities that support building, exploring, training, and running business models. The Senior Data Engineer will use ETL tools such as Informatica and Ab Initio, along with data warehouse tools, to deliver critical Artificial Intelligence Model Operationalization services to the Enterprise.
In this role you will be responsible for:
• Data modeling, coding, analytical modeling, root cause analysis, investigation, debugging, and testing, as well as collaboration with business partners, product managers, architects, and other engineering teams
• Adopting and enforcing best practices for ingesting data into and extracting data from the Big Data platform
• Extracting business data from multiple data sources and storing it in MapR DB/HDFS locations
• Working with data scientists and building scripts to meet their data needs
• Working with the Enterprise Data Lake team to maintain data and information security for all use cases
• Building automation scripts with AutoSys to automate data loads
• Designing and developing scripts and configurations to load data using data ingestion frameworks or Ab Initio
• Coordinating user access requests for data loaded into the Data Lake
• Providing post-production support for the AIES Open Source Data Science (OSDS) Platform
• Supporting end-to-end platform application delivery, including infrastructure provisioning and automation and integration with Continuous Integration/Continuous Delivery (CI/CD) platforms, using existing and emerging technologies
• Providing design and development support for production enhancements, problem tickets, and other issue resolution
• Following SDLC documentation requirements for code fixes
• Developing new documentation, departmental technical procedures, and user guides
• Monitoring production execution and responding to processing failures
• Reviewing code execution and recommending optimizations for production processes
• Being willing to work non-standard hours to support production execution or issue resolution
• Being willing to provide on-call/pager support for production escalations
Required qualifications:
• BS/BA degree
• 1+ year of experience with the Ab Initio suite of tools (GDE, Express>IT)
• 3+ years of experience with Big Data platforms (Hadoop, MapR, Hive, Parquet)
• 5+ years of ETL (Extract, Transform, Load) programming experience with tools including Informatica
• 2+ years of experience with Unix or Linux systems, including scripting in Shell, Perl, or Python
• Experience with advanced SQL, preferably Teradata
• Strong Hadoop scripting skills for processing petabytes of data
• Experience working with large data sets and with distributed computing (MapReduce, Hadoop, Hive, HBase, Pig, Apache Spark, etc.)
• Excellent analytical and problem-solving skills with high attention to detail and accuracy
• Demonstrated ability to translate business requirements into code, metadata specifications, analytical reports, and tools
• Good verbal, written, and interpersonal communication skills
• Experience with the SDLC (System Development Life Cycle), including an understanding of the project management methodologies used in Waterfall or Agile development projects
Preferred qualifications:
• MS/MA degree
• Experience with Java and Scala development
• Experience with analytic databases, including Hive, Presto, and Impala
• Experience with multiple data formats and modeling concepts, including XML and JSON
• Experience loading and managing data using technologies such as Spark, Scala, NoSQL (MongoDB, Cassandra), and columnar MPP SQL stores (Redshift, Vertica)
• Experience with change and release management processes
• Experience with streaming frameworks, including Kafka, Spark Streaming, Storm, or RabbitMQ
• Experience working with cloud architectures, including Amazon Web Services (AWS) services: EC2, EMR, ECS, S3, SNS, SQS, CloudFormation, and CloudWatch