Hadoop DevOps-Engineer

  • Anywhere
  • Posted 1 year ago

The Hadoop DevOps-Engineer defines and maintains installation, administration and development practices, incorporating architectural best practices for the current and future state of the Data Lake and Enterprise Data Hub to enable business success. The role establishes and formalizes the standards and guiding principles needed to define and build business and Big Data models that integrate with ETL and application interfaces, along with data quality management solutions.
The candidate must be able to construct and analyze both functional and business requirements and design solutions based on existing and evolving Hadoop ecosystem tool sets.
Media & Entertainment experience is a plus. Hadoop experience is a must. Experience in Data Management, Teradata and Oracle technologies is an added skill set. Familiarity with Spark and Scala is required.

Responsibilities include:
The Hadoop DevOps-Engineer is responsible for installing, configuring and coding Hadoop applications. The role includes, but is not limited to, administration and development; it also involves interacting with other technology groups and the business to install, configure and develop applications.
Job responsibilities of a Hadoop DevOps-Engineer:

  • Download and deploy Hadoop ecosystem software; configure and administer it.
  • Hadoop development and implementation.
  • Loading from disparate data sets, streaming, scraping.
  • Pre-processing using Hive, Presto, Pig and other frameworks.
  • Designing, building, installing, configuring and supporting Hadoop.
  • Translate complex functional and technical requirements into detailed design.
  • Perform analysis of vast data stores and uncover insights.
  • Maintain security and data privacy.
  • Create scalable and high-performance web services for data tracking.
  • Tune performance and design for high-speed querying.
  • Managing and deploying clusters.
  • Being a part of a POC effort to help build new Hadoop clusters.
  • Test prototypes and oversee handover to operational teams.
  • Propose best practices/standards.


Skills Required

  • Deep knowledge in Hadoop and its eco-system.
  • Good knowledge of back-end programming, Scala, Java, JavaScript, Node.js and OOAD.
  • Writing high-performance, reliable and maintainable code.
  • Ability to write Hive, Presto and Spark jobs.
  • Good knowledge of database structures, theories, principles, and practices.
  • Ability to write Pig Latin scripts.
  • Hands-on experience in HiveQL.
  • Familiarity with data loading tools like Flume and Sqoop.
  • Knowledge of workflow/schedulers like Oozie.
  • Analytical and problem-solving skills.
  • Proven understanding of Hadoop, HBase, Hive and Pig.
  • Good aptitude in multi-threading and concurrency concepts.

Key outputs include:

• Hadoop cluster setup, administration and maintenance

• Data integration solutions from Oracle, Teradata and SQL Server into Hadoop

• Integration of BI analytical tools like Tableau and Business Objects with Hadoop

• Recommend best practices to manage Data Quality and Data integrity

• Contribution of documentation to IT knowledge database

Core Responsibilities:

25%  1. Work with Business Users and IT team to gather the business requirements in BI space and convert these into functional and technical requirements

25%  2. Data profiling, building conceptual and logical models, delivering effective and efficient physical models

20%  3. BI strategy around data acquisition for integrating transactional systems or syndicated/industry data into the Enterprise Data Lake

20%  4. Maintain the Business Glossary, Data Lineage and a governance process for Data Quality Management

5%  5. Assist reporting teams and Business Users in designing and building reporting solutions

5%  6. Evaluation of the latest BI tool sets that fit or enhance the existing SPE BI tool stack


• Hands-on experience with installing, developing and administering Hadoop applications. Media & Entertainment experience is an added plus. Working knowledge of Spark is an added plus
• Teradata or BI-related certification is required

• Data Warehouse database platforms: Teradata is a plus, and any experience with EDW platforms like Exadata, Client blade, Netezza or Greenplum is an added advantage
• Experience with the data profiling tools Ataccama and Trifacta
• Experience with a variety of data ingestion tools: Apache NiFi, Sqoop and Flume
• Must be a certified Google Cloud Data Engineer
• Experience with Business Objects/Tableau and any BI reporting tools is preferred
• Knowledge of BI tools like Power BI is a plus, as is familiarity with tools that interact and interface with Big Data, such as Alteryx

• Must have hands-on experience with hybrid mobile application development using Angular, Ionic and Cordova

• Familiarity with a major ETL tool, such as DataStage, Informatica or Talend, is required
• Experience with any Data Quality Management full life cycle is an added advantage.

• In-depth, hands-on SQL programming skills

• Excellent analytical skills

• Excellent communication skills with both business and technical customers

• Demonstrate a high level of integrity and maturity

• Take a proactive approach to cross-functional communication

• Actively seek out feedback from management and peers, and improve own performance based on that feedback