Data Engineer
Posted: 7 months ago
Design and build distributed, scalable, and reliable data pipelines that ingest and process data at scale and in real time.
Perform data analysis and data exploration
Explore new data sources and data from new domains
Productionize real-time/batch Client models on Python/Spark
Evaluate big data technologies and prototype solutions to improve our data processing architecture.
8+ years of hands-on programming experience, with 4+ years on the Hadoop platform
Proficiency in data analysis and strong SQL skills.
Knowledge of various components of the Hadoop ecosystem (Hive/Impala/Spark Scala/MapReduce) and experience applying them to practical problems.
Proficiency with shell scripting & Python
Experience in data warehousing, ETL tools, and MPP database systems
Experience working with Hive and Impala and creating custom UDFs, custom input/output formats, and SerDes
Ability to acquire, compute, store, and process various types of datasets on the Hadoop platform
Understanding of various visualization platforms (e.g., Tableau)
Experience with Scala.
Excellent written and verbal communication skills
Top skill sets / technologies:
Hive / ETL tool (Talend, Pentaho, Informatica, or similar) / Impala / MapReduce / Spark
Scala / Python / Java
ETL / Data warehousing