Hadoop Developer

Big Data Hadoop Developer: A Hadoop developer is responsible for the actual coding and programming of Hadoop applications. The role is equivalent to that of a software developer or application developer, but in the Big Data domain. One component of Hadoop is MapReduce, whose programs are typically written in Java.

1 Understanding Big Data and Hadoop

  • Introduction to Big Data & Big Data Challenges
  • Limitations & Solutions of Big Data Architecture
  • Hadoop & its Features
  • Hadoop Ecosystem
  • Hadoop 2.x Core Components
  • Hadoop Storage: HDFS (Hadoop Distributed File System)
  • Hadoop Processing: MapReduce Framework
  • Different Hadoop Distributions

2 Hadoop Architecture and HDFS

  • Hadoop 2.x Cluster Architecture
  • Federation and High Availability Architecture
  • Typical Production Hadoop Cluster
  • Hadoop Cluster Modes
  • Common Hadoop Shell Commands
  • Hadoop 2.x Configuration Files
  • Single-Node and Multi-Node Cluster Setup
  • Basic Hadoop Administration

3 Hadoop MapReduce Framework

  • YARN Components
  • YARN Architecture
  • YARN MapReduce Application Execution Flow
  • YARN Workflow
  • Anatomy of a MapReduce Program
  • Input Splits and Their Relation to HDFS Blocks
  • MapReduce: Combiner & Partitioner
  • Demo of Health Care Dataset
  • Demo of Weather Dataset
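
To make the execution flow in this module concrete, here is a minimal sketch of a word-count job in plain Python. It only simulates the map, combine, partition, shuffle, and reduce steps in a single process; it does not use the Hadoop Java API. The function names (mapper, combiner, partitioner, reducer, run_job) and the CRC32-based partitioning are assumptions made for illustration, and each inner list stands in for one input split.

```python
from collections import defaultdict
import zlib


def mapper(line):
    """Map phase: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def combiner(pairs):
    """Combiner: pre-aggregate one mapper's output locally to reduce shuffle traffic."""
    local = defaultdict(int)
    for key, value in pairs:
        local[key] += value
    return list(local.items())


def partitioner(key, num_reducers):
    """Partitioner: pick a reducer for a key (illustrative stable hash, not Hadoop's)."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def reducer(key, values):
    """Reduce phase: sum all partial counts for one key."""
    return key, sum(values)


def run_job(splits, num_reducers=2):
    """Run the simulated job: one map task per split, one reduce task per partition."""
    # Shuffle buffers: for each reducer, a mapping of key -> list of partial counts.
    partitions = [defaultdict(list) for _ in range(num_reducers)]
    for split in splits:
        mapped = [pair for line in split for pair in mapper(line)]
        for key, value in combiner(mapped):
            partitions[partitioner(key, num_reducers)][key].append(value)

    result = {}
    for partition in partitions:
        # Keys reach each reducer in sorted order.
        for key in sorted(partition):
            word, total = reducer(key, partition[key])
            result[word] = total
    return result


if __name__ == "__main__":
    # Two hypothetical input splits, each holding one line of text.
    print(run_job([["big data big"], ["data hadoop"]]))
```

Because the combiner sums counts before the shuffle, the first split sends a single ("big", 2) pair instead of two ("big", 1) pairs, which is the kind of saving a combiner is meant to provide.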

4 Apache Pig

  • MapReduce vs Pig
  • Pig Components & Pig Execution
  • Pig Data Types & Data Models in Pig
  • Pig Latin Programs
  • Shell and Utility Commands
  • Pig UDF & Pig Streaming
  • Testing Pig Scripts with PigUnit
  • Aviation Use Case in Pig
  • Pig Demo of Healthcare Dataset

5 Apache Hive

  • Hive vs Pig
  • Hive Architecture and Components
  • Hive Metastore
  • Limitations of Hive
  • Comparison with Traditional Database
  • Hive Data Types and Data Models
  • Hive Partition
  • Hive Bucketing
  • Hive Tables (Managed Tables and External Tables)
  • Importing Data
  • Querying Data & Managing Outputs
  • Hive Script & Hive UDF
  • Retail Use Case in Hive
  • Hive Demo on Healthcare Dataset

6 Advanced Apache Hive and HBase

  • Custom MapReduce Scripts
  • Hive Indexes and Views
  • Hive Query Optimizers
  • Hive Thrift Server
  • Hive UDF
  • Apache HBase: Introduction to NoSQL Databases and HBase
  • HBase vs RDBMS
  • HBase Components
  • HBase Architecture
  • HBase Run Modes
  • HBase Configuration
  • HBase Cluster Deployment

7 Advanced Apache HBase

  • HBase Shell
  • HBase Client API
  • Hive Data Loading Techniques
  • Introduction to Apache ZooKeeper
  • ZooKeeper Data Model
  • ZooKeeper Service
  • HBase Bulk Loading
  • Getting and Inserting Data
  • HBase Filters

8 Processing Distributed Data with Apache Spark

  • Spark Ecosystem
  • Spark Components
  • What Is Scala?
  • Why Scala?
  • SparkContext
  • Spark RDD