Hadoop Training In Chennai

About Hadoop

Hadoop is an open-source software framework for storing and processing large amounts of data. It is designed to scale from a single server to a cluster of commodity hardware.

Hadoop's core modules are Hadoop Common, the Hadoop Distributed File System (HDFS), Yet Another Resource Negotiator (YARN), and MapReduce, which together form a foundation for Big Data processing. Applications commonly used with Hadoop include Hive, HBase, Presto, Spark, and Zeppelin.

We provide professional Hadoop Training in Chennai, with advanced training sessions and hands-on projects guided by experts.

Course Contents - Hadoop

Big Data, Hadoop, Introduction To Hadoop Architecture And HDFS
  • Rise of Big Data
  • Compare Hadoop vs traditional systems
  • Core components of Hadoop
  • Hadoop Master-Slave Architecture
  • Understanding HDFS Architecture
  • NameNode, DataNode, Secondary NameNode
  • Learn about JobTracker, TaskTracker

Installing And Setting Up A Hadoop Cluster
  • Hadoop deployment Modes - Standalone, Single node, Multinode
  • Configuration files in a Hadoop Cluster
  • Important Web URLs for Hadoop
  • Manual for installation of Hadoop
  • Manual for Demo VM installation
  • Manual for Multinode Hadoop Cluster installation on AWS

Understanding Hadoop MapReduce Framework
  • Overview of the MapReduce Framework
  • Use cases of MapReduce
  • MapReduce Architecture
  • Concept of Mappers, Reducers
  • Anatomy of MapReduce Program
  • Mapper/Reducer Class, Driver code
  • Understand Combiner and Partitioner
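
As a taste of the mapper and reducer concepts listed above, here is a minimal word-count sketch in plain Python. It does not use the Hadoop API: the `shuffle` function only stands in for the grouping and sorting that the framework performs between the map and reduce phases, and all function names are illustrative assumptions rather than course material.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def mapper(line: str) -> Iterator[Tuple[str, int]]:
    """Emit (word, 1) for every whitespace-separated word in a line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group values by key and sort the keys (stand-in for the framework's shuffle)."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(sorted(grouped.items()))


def reducer(key: str, values: Iterable[int]) -> Tuple[str, int]:
    """Sum all counts collected for one word."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run the map, shuffle, and reduce steps in a single process."""
    mapped = (pair for line in lines for pair in mapper(line))
    return dict(reducer(key, values) for key, values in shuffle(mapped).items())


if __name__ == "__main__":
    print(word_count(["big data big cluster", "data node"]))
```

On a real cluster the mapper and reducer would run as separate tasks on different nodes; the single-process version only shows how the data flows between the phases.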

Advanced MapReduce - Part 1
  • Write your own Partitioner
  • Writing Map and Reduce in Python
  • Map Side Join
  • Distributed Join
  • Distributed Cache
  • Reduce Side Join
  • Counters
  • Joining Multiple datasets in MapReduce
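
To illustrate the reduce-side join topic above, the following is a small sketch in plain Python rather than actual Hadoop code. Each mapper tags its records with the dataset they came from, the records are grouped by join key, and the reducer pairs up the two sides. The datasets (customers and orders), the tags `"C"` and `"O"`, and all function names are hypothetical and chosen only for this example.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

TaggedValue = Tuple[str, str]  # (source tag, payload)


def map_customers(rows: Iterable[Tuple[str, str]]) -> Iterator[Tuple[str, TaggedValue]]:
    """Tag each customer record with "C" so the reducer knows its source."""
    for customer_id, name in rows:
        yield customer_id, ("C", name)


def map_orders(rows: Iterable[Tuple[str, str]]) -> Iterator[Tuple[str, TaggedValue]]:
    """Tag each order record with "O" so the reducer knows its source."""
    for customer_id, item in rows:
        yield customer_id, ("O", item)


def reduce_join(key: str, values: List[TaggedValue]) -> List[Tuple[str, str, str]]:
    """Pair every customer name with every order that shares the join key."""
    names = [payload for tag, payload in values if tag == "C"]
    items = [payload for tag, payload in values if tag == "O"]
    return [(key, name, item) for name in names for item in items]


def reduce_side_join(customers, orders) -> List[Tuple[str, str, str]]:
    """Group tagged records by key, then join them per key (an inner join)."""
    grouped: Dict[str, List[TaggedValue]] = defaultdict(list)
    for key, value in list(map_customers(customers)) + list(map_orders(orders)):
        grouped[key].append(value)
    joined: List[Tuple[str, str, str]] = []
    for key in sorted(grouped):
        joined.extend(reduce_join(key, grouped[key]))
    return joined


if __name__ == "__main__":
    customers = [("1", "Asha"), ("2", "Ravi")]
    orders = [("1", "laptop"), ("1", "phone"), ("3", "desk")]
    print(reduce_side_join(customers, orders))
```

Keys that appear in only one dataset produce no output here, which is the behaviour of an inner join. A map-side join avoids the grouping step by loading the smaller dataset into memory on every mapper, for example through the distributed cache.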

Advanced MapReduce - Part 2
  • MapReduce internals
  • Understanding Input Format
  • Custom Input Format
  • MapReduce API
  • Hadoop Data Types
  • Using Writable and WritableComparable
  • Understanding Output Format
  • Sequence Files
  • JUnit and MRUnit Testing Frameworks

Apache Pig
  • PIG vs MapReduce
  • PIG components
  • PIG execution
  • PIG Data types
  • PIG Architecture
  • PIG Latin Relational Operators
  • PIG Latin Join and CoGroup
  • PIG Latin Group and Union
  • Describe, Explain, Illustrate
  • PIG Latin: File Loaders
  • PIG Latin: Creating UDF

Apache Hive And HiveQL
  • What is Hive
  • Hive DDL - Create/Show/Drop Database
  • Hive DDL - Create/Show/Drop Tables
  • Hive DML - Load Files into Tables
  • Hive DML - Inserting Data into Tables
  • Hive SQL - Select, Filter, Join, Group By
  • Hive Architecture & Components
  • Hive Data Model and Data Units
  • Difference between Hive and RDBMS

Advanced HiveQL
  • Multi-Table Inserts
  • Joins
  • Grouping Sets, Cubes, Rollups
  • Custom Map and Reduce scripts
  • Hive SerDe
  • Hive UDF
  • Hive UDAF

Apache Flume, Apache Sqoop, Apache Oozie
  • Sqoop - How Sqoop works
  • Import/Export Data
  • Sqoop Architecture
  • Flume - How it works
  • Flume Complex Flow - Calculation/Multiplexing
  • Oozie - Simple/Complex Flow
  • Oozie - Components
  • Oozie Service/Scheduler
  • Example Workflow
  • Use Cases - Time and Data triggers
  • Running/Debugging a Coordinator Job
  • Bundle

NoSQL Databases
  • Introduction to NoSQL
  • CAP theorem
  • RDBMS vs NoSQL
  • Analytical (OLAP)
  • Key Value stores: Memcached, Riak
  • Key Value stores: Redis, DynamoDB
  • Column Family: Cassandra, HBase
  • Graph Store: Neo4J
  • Document Store: MarkLogic, MongoDB
  • Document Store: Couchbase, CouchDB, eXist-db

Apache HBase
  • When/Why to use HBase
  • HBase Architecture/Storage
  • HBase Features
  • HBase Data Model
  • HBase Families
  • Terms and Daemons
  • HBase Master
  • HBase vs RDBMS
  • Column Families
  • Access HBase Data
  • HBase API
  • Runtime modes
  • Running HBase

Apache Zookeeper
  • What is Zookeeper
  • Who is using it
  • Zookeeper Data Model
  • ZNode versions
  • Zookeeper API
  • ZNode Types
  • Sequential ZNodes
  • Security
  • Standalone/Clustered mode
  • Installing and Configuring
  • Running Zookeeper
  • Zookeeper use cases

Hadoop 2.0, YARN, MRv2
  • Hadoop 1.0 Limitations
  • MapReduce Limitations
  • History of Hadoop 2.0
  • HDFS 2: Architecture
  • HDFS 2: Quorum-based storage
  • HDFS 2: High availability
  • HDFS 2: Federation
  • YARN Architecture
  • Classic vs YARN
  • YARN Apps
  • YARN multitenancy