Hadoop Admin Online Training

Unit 1: Introduction to HDP and Hadoop 2.0

  • Enterprise Data Trends @ Scale
  • What is Big Data?
  • A Market for Big Data
  • Most Common New Types of Data
  • Moving from Causation to Correlation
  • What is Hadoop?
  • What is Hadoop 2.0?
  • Traditional Systems vs. Hadoop
  • Overview of a Hadoop Cluster
  • Who is Hortonworks?
  • The Hortonworks Data Platform
  • Use Case: EDW before Hadoop
  • Banking Use Case: EDW with HDP

Lab 1.1: Login to Your Cluster

Unit 2: HDFS Architecture

  • What is a File System?
  • OS Architecture
  • HDFS Architecture
  • Understanding Block Storage
  • Demonstration: Understanding Block Storage
  • The NameNode
  • The DataNodes
  • DataNode Failure
  • HDFS Clients

Unit 3: Installation Prerequisites and Planning

  • Minimum Hardware Requirements
  • Minimum Software Requirements
  • A Formidable Starter Cluster

Lab 3.1: Setting up the Environment

Lab 3.2: Install HDP 2.0 Cluster using Ambari

Unit 4: Configuring Hadoop

  • Configuration Considerations
  • Deployment Layout
  • Configuring HDFS
  • What is Ambari?
  • Configuration via Ambari
  • Management
  • Monitoring

Lab 4.2: Stopping and Starting HDP Services

Lab 4.3: Using HDFS Commands

Unit 5: Ensuring Data Integrity

  • Ensuring Data Integrity
  • Replication Placement
  • Data Integrity – Writing Data
  • Data Integrity – Reading Data
  • Data Integrity – Block Scanning
  • Running a File System Check
  • What Does the File System Check Look For?
  • hdfs fsck Syntax
  • Data Integrity – File System Check: Commands & Output
  • The dfs Command
  • NameNode Information
  • Changing the Replication Factor

Lab 5.1: Verify Data with Block Scanner and fsck

Unit 6: YARN Architecture and MapReduce

  • What is YARN?
  • Hadoop as Next-Gen Platform
  • Beyond MapReduce
  • YARN Use Case
  • YARN Bird’s Eye View
  • Lifecycle of a YARN Application
  • ResourceManager
  • NodeManager
  • MapReduce
  • Demonstration: Understanding MapReduce
  • Configuring YARN
  • Configuring MapReduce
  • Tools

Unit 7: Job Schedulers

  • Overview of Job Scheduling
  • The Built-in Schedulers
  • Overview of the Capacity Scheduler
  • Configuring the Capacity Scheduler
  • Defining Queues
  • Configuring Capacity Limits
  • Configuring User Limits
  • Configuring Permissions
  • Overview of the Fair Scheduler
  • Configuration of the Fair Scheduler

Lab 8.1: Configuring the Capacity Scheduler

Unit 8: Enterprise Data Movement

  • Enterprise Data Movement
  • Challenges with a Traditional ETL Platform
  • Hadoop-Based ETL Platform
  • Data Ingestion
  • Hadoop: Data Movement
  • Defining Data Layers
  • Distributed Copy (distcp) Command
  • Distcp Options
  • Using distcp
  • Using distcp for Backups

Lab 9.1: Use distcp to Copy Data from a Remote Cluster

Unit 9: Hive Administration

  • Introduction to Hive
  • Comparing Hive with RDBMS
  • Hive MetaStore
  • HiveServer2
  • Hive Command Line Interface
  • Processing Hive SQL Statements
  • Hive Data Hierarchical Structures
  • Hive Tables
  • Defining a Hive-Managed Table
  • Defining an External Table
  • Defining a Table LOCATION
  • Loading Data into Hive
  • Performing Queries
  • Guidelines for Architecting Hive Data
  • Hive Query Optimizations

Lab 10: Understanding Hive Tables

Unit 10: Transferring Data with Sqoop

  • Overview of Sqoop
  • The Sqoop Import Tool
  • Importing a Table
  • Importing Specific Columns
  • Importing from a Query
  • The Sqoop Export Tool
  • Exporting to a Table

Lab 11: Using Sqoop

Unit 11: Monitoring HDP2 Services

  • Ambari
  • Monitoring Architecture
  • Monitoring HDP2 Clusters
  • Ambari Web Interface
  • Ambari Web Interface (cont.)
  • Ganglia
  • Ganglia Monitoring a Hadoop Cluster
  • Nagios
  • Nagios UI

Unit 12: Commissioning and Decommissioning Nodes

  • Architectural Review
  • Decommissioning and Commissioning Nodes
  • Decommissioning Nodes
  • Steps for Decommissioning a DataNode
  • Decommissioning Node States
  • Steps for Commissioning a Node
  • Balancer
  • Balancer Threshold Setting
  • Configuring Balancer Bandwidth

Unit 13: Backup and Recovery

  • What Should You Back Up?
  • HDFS Snapshot
  • HDFS Data – Backup

Unit 14: Rack Awareness and Topology

  • Rack Awareness
  • YARN Rack Awareness
  • Replica Placement
  • Rack Topology
  • Rack Topology Script
  • Configuring the Rack Topology Script

Unit 16: NameNode HA

  • NameNode Architecture HD
  • NameNode High Availability
  • HDFS HA Component
  • Understanding NameNode
  • NameNodes in
  • Failover Modes
  • hdfs haadmin Command
  • Red Hat HA
  • VMware HA