We provide online Hadoop training in India, with course content designed by experts. The Big Data and Hadoop training course is designed to provide the knowledge and skills needed to become a successful Hadoop developer. The course covers, in depth, concepts such as the Hadoop Distributed File System (HDFS), single-node and multi-node Hadoop clusters, Hadoop 2.x, Flume, Sqoop, MapReduce, Pig, Hive, HBase, ZooKeeper, and Oozie.

The whole market is using it. What makes it special?

  • Big Data
  • What is Big Data?
  • What is Hadoop?
  • What is Hadoop's role in Big Data?
  • What made Hadoop such a hot technology in the market?
  • How is Hadoop used everywhere in the market: Facebook, Google, Adobe…?
  • I am not a Java developer. Can I become a Hadoop expert?

Let's take your first step:

Hadoop Architecture

  • Cluster
  • Setup
  • HDFS
  • Configuration
  • Data Structures
  • Operations
  • Fault Tolerance and Consistency
  • NameNode
  • Configuration
  • Importance
  • Secondary NameNode
  • Configuration
  • DataNode
  • Configuration
  • Task Tracker
  • Configuration

How do I set up my own cluster/Hadoop environment?

Hadoop Ecosystem (Versions 1.* and 2.*)

MapReduce
Hive
Pig
HCatalog

Explore the underlying programming structure

MapReduce

  • DataTypes
  • Key-Value
  • Writable
  • WritableComparable

How to read & write data?

  • RecordReader
  • RecordWriter

File Formats

  • TextInput & Output Format
  • SequenceFileInput & Output Format
  • DBInput & Output Format
  • KeyValueTextInput & Output Format

Phases

  • Mapper
  • Partitioner
  • Combiner
  • Shuffle and Sort
  • Reducer
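
To make the order of these phases concrete, here is a minimal in-memory sketch of a word count in plain Python. It does not use the Hadoop API: the function names (`mapper`, `combiner`, `partitioner`, `reducer`, `run_job`), the toy hash, and the sample lines are all assumptions made for illustration, and each input line stands in for one map task.

```python
from collections import defaultdict
from itertools import groupby


def mapper(line):
    """Map phase: emit a (word, 1) pair for every word in the line."""
    for word in line.lower().split():
        yield word, 1


def combiner(pairs):
    """Combine phase: pre-aggregate one map task's output locally."""
    local = defaultdict(int)
    for key, value in pairs:
        local[key] += value
    return list(local.items())


def partitioner(key, num_reducers):
    """Partition phase: choose a reducer for the key (toy, stable hash)."""
    return sum(key.encode("utf-8")) % num_reducers


def reducer(key, values):
    """Reduce phase: sum every count that arrived for one key."""
    return key, sum(values)


def run_job(lines, num_reducers=2):
    # Map + combine + partition: each line is treated as one map task.
    partitions = defaultdict(list)
    for line in lines:
        for key, value in combiner(mapper(line)):
            partitions[partitioner(key, num_reducers)].append((key, value))

    # Shuffle and sort: order each partition by key, then group equal keys.
    results = {}
    for part in partitions.values():
        part.sort(key=lambda kv: kv[0])
        for key, group in groupby(part, key=lambda kv: kv[0]):
            word, total = reducer(key, (v for _, v in group))
            results[word] = total
    return results


counts = run_job(["big data big cluster", "hadoop cluster", "big hadoop"])
print(counts)
```

In a real job the framework performs the shuffle and sort between the map and reduce tasks; the sketch only mirrors the order in which the phases run.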

Job Submission API

  • Configuration
  • Job
  • Driver
  • ToolRunner

Real Examples with Sample Data

  • Retail
  • Banking
  • Finance & Insurance
  • Telecom

MapReduce Design Patterns

  • Importance

Types of Design Patterns with examples

  • Aggregation Patterns
  • Inverted Index Summarizations
  • Filter Patterns
  • Top Ten Examples
  • Distinct Examples
  • Data Organization Patterns
  • Structured to Hierarchical Examples
  • Partitioning Examples
  • Total Order Sorting Examples
  • Join Patterns
  • Map-Side Join Examples
  • Reduce-Side Join Examples
  • Replicated Join Examples
  • FileType Patterns
  • External Source Output Example
  • External Source Input Example
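
As an illustration of one of the join patterns listed above, the following is a rough sketch of a reduce-side join in plain Python rather than the Hadoop API. The two datasets, the `"C"`/`"O"` source tags, and all function names are hypothetical; the idea shown is that each mapper tags its records with their source, and the reducer pairs up the records that share a join key.

```python
from collections import defaultdict

# Hypothetical sample data: (customer_id, name) and (customer_id, amount).
customers = [(1, "Asha"), (2, "Ravi"), (3, "Meena")]
orders = [(1, 250), (1, 100), (3, 75)]


def map_customers(record):
    customer_id, name = record
    return customer_id, ("C", name)  # tag marks the source dataset


def map_orders(record):
    customer_id, amount = record
    return customer_id, ("O", amount)


def reduce_join(customer_id, tagged_values):
    # Separate the values by source tag, then pair them (inner join).
    names = [v for tag, v in tagged_values if tag == "C"]
    amounts = [v for tag, v in tagged_values if tag == "O"]
    return [(customer_id, name, amount) for name in names for amount in amounts]


def reduce_side_join(customers, orders):
    # Stand-in for the shuffle: group all tagged values by join key.
    grouped = defaultdict(list)
    for key, value in map(map_customers, customers):
        grouped[key].append(value)
    for key, value in map(map_orders, orders):
        grouped[key].append(value)

    joined = []
    for key in sorted(grouped):
        joined.extend(reduce_join(key, grouped[key]))
    return joined


joined = reduce_side_join(customers, orders)
print(joined)
```

Customer 2 has no orders, so an inner join drops that row; an outer join would instead emit it with an empty amount.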

Advanced MapReduce Programming

  • Distributed Cache
  • MultipleInputs API
  • MultipleOutputs API
  • Joins in MapReduce
  • Chained MapReduce Jobs
  • Customization of Key-Value
  • Secondary Sort
  • Customization of FileFormats with XML Type Processing
  • Customization of Combiner
  • Customization of Partitioner
  • Counters & Custom Counters
  • HDFS Operations from Java
  • MRUnit
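
Secondary sort, listed above, is easier to follow with a small example. The sketch below is plain Python, not Hadoop code, and its data and names are assumptions: records are (station, temperature) pairs, the composite key is (station, temperature), partitioning and grouping use the station only, and sorting uses the full composite key so that each station's values reach the reducer already ordered.

```python
from itertools import groupby

# Hypothetical readings: (station, temperature).
readings = [("S1", 31), ("S2", 25), ("S1", 35), ("S1", 28), ("S2", 30)]


def secondary_sort(records, num_reducers=2):
    # Composite key = (natural key, secondary field); the value rides along.
    keyed = [((station, temp), temp) for station, temp in records]

    # Partition on the natural key only, so a station never spans reducers.
    partitions = {}
    for composite, value in keyed:
        index = sum(composite[0].encode("utf-8")) % num_reducers
        partitions.setdefault(index, []).append((composite, value))

    output = {}
    for part in partitions.values():
        # Sort comparator: station ascending, then temperature descending.
        part.sort(key=lambda kv: (kv[0][0], -kv[0][1]))
        # Grouping comparator: group on the station only.
        for station, group in groupby(part, key=lambda kv: kv[0][0]):
            output[station] = [value for _, value in group]
    return output


sorted_by_station = secondary_sort(readings)
print(sorted_by_station)
```

The reducer never sorts anything itself; the ordering comes entirely from the composite key, which is the point of the technique.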

Best Practices of MapReduce

  • Performance Tuning
  • Block Size
  • Compression
  • Speculative Execution
  • Sort & Shuffle
  • JVM Utilization

Hadoop Archiving and Compaction

Real time issues and solutions

  • Out of Memory
  • Long Running Jobs
  • Spill Issues
  • Reducer bytes size issues

Best ways to debug any real time issues

Hive

Hive Basics

  • Architecture
  • Driver
  • Compiler
  • Query Plan
  • QueryEngine
  • MetaStore
  • How a Hive Query Is Converted to MapReduce

What is HQL?

Hive Setup

  • Hive Interfaces
  • Metastore: Derby vs. MySQL
  • Hive CLI
  • Hive JDBC

Data Organization

  • DataTypes
  • Primitive
  • Complex
  • Tables – Managed vs. External
  • Partitions
    • Static Partitions
    • Dynamic Partitions
  • Bucketing
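
The difference between partitions and bucketing can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: it mimics the `column=value` directory naming that Hive uses for partitions and assigns buckets with a simple modulo on an integer column; the function name `hive_style_layout` and the sample rows are hypothetical, and no Hive API is involved.

```python
from collections import defaultdict


def hive_style_layout(rows, partition_col, bucket_col, num_buckets):
    """Group rows by (partition directory, bucket number)."""
    layout = defaultdict(list)
    for row in rows:
        # Partitioning: one directory per distinct value of the column.
        partition_dir = f"{partition_col}={row[partition_col]}"
        # Bucketing: a fixed number of buckets chosen by hashing the column.
        # Assumption: the bucket column is an integer, hashed as itself.
        bucket = row[bucket_col] % num_buckets
        layout[(partition_dir, bucket)].append(row)
    return dict(layout)


# Hypothetical sample rows.
rows = [
    {"user_id": 7, "country": "IN", "amount": 120},
    {"user_id": 8, "country": "IN", "amount": 40},
    {"user_id": 11, "country": "US", "amount": 75},
    {"user_id": 3, "country": "IN", "amount": 10},
]

layout = hive_style_layout(rows, "country", "user_id", num_buckets=4)
for (partition_dir, bucket), members in sorted(layout.items()):
    print(partition_dir, bucket, [m["user_id"] for m in members])
```

The number of partitions grows with the number of distinct column values, while the number of buckets is fixed when the table is defined.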

Query Operations

  • Data Definition Language
  • Data Manipulation Language

Functions

  • Built in Functions
  • UDF
  • Generic UDF
  • UDAF
  • UDTF

QueryTool Configuration

Hive Advanced Programming

  • Views
  • Indexes
  • Serializers & Deserializers
  • Storage Formats
  • RCFile Format
  • ORC File Format
  • Integration with HBase and Cassandra

Hive Administrative Operations

Best Practices of Hive

  • Do’s and Don’ts
  • Performance Tuning
  • How to choose storage format type?
  • How to choose a data organization?
  • What type of compression should be used?
  • Issues with GROUP BY queries
  • Spill parameters
  • Real time issues and solutions
  • Permission on tables
  • Effective data model techniques

Limitations of Hive

Best ways to debug any real time issues

Pig

  • Pig basic concepts
  • Pig Setup
  • Pig built-in library functions
  • Which one to choose: Pig or Hive?
  • Pig scripts
  • Pig run modes
  • Pig as Java program
  • UDFs
  • Macros
  • How to debug Pig?

HCatalog

  • Meta-data Processor
  • Meta-store conversion

Sqoop

  • Sqoop Architecture
  • Sqoop Configuration
  • Import from RDBMS
  • Export to RDBMS
  • Bulk Load
  • Incremental and upsert loads to and from RDBMS
  • Recovery mechanism for real time cases

HBase

HBase Basics

  • Architecture
  • Row Key
  • HMaster
  • Region Server
  • ZooKeeper Quorum
  • Write Ahead Log
  • Memstore
  • HFile Structure

Cluster Setup

Interfaces and Operations

  • CRUD Operations
  • Bulk loading
  • Locking
  • Filters
  • HBase Shell
  • Java Integration
  • REST Integration
  • Thrift & Avro Integration

HBase Design Aspects

  • How to choose row keys?
  • How to design column families?
  • How to compress the data?

HBase Administrative Operations

HBase Advanced Programming

  • Compression Techniques
  • Compaction
  • Major Compaction
  • Minor Compaction
  • Garbage Collection
  • Map Reduce Integration
  • Hive Integration
  • Secondary Indexes
  • Co-Processors
  • Effective Connection Handling and HTable Pool
  • Counters
  • Block Cache Techniques
  • On heap cache
  • Off heap cache
  • HBase Performance Tuning

MongoDB

  • What is MongoDB?
  • Is it a MySQL look-alike?
  • Architecture
  • Mongod
  • Mongo
  • Client
  • Query Router
  • Configuration Servers
  • Shard
  • Collections
  • Documents

MongoDB Sharded Cluster Setup

  • Schema Operations
  • CRUD Operations
  • Data organization
  • Compression

Processing Audio & Video Files

Map Reduce Integration

Hive Integration

Real Time use cases

Performance Tuning Aspects

Which is the best ecosystem for my enterprise application?