Description

IBM Open Platform (IOP) with Apache Hadoop is the first premier collaborative platform that enables Big Data solutions to be developed on a common set of Apache Hadoop technologies. The Open Data Platform initiative (ODP) is a shared industry effort dedicated to promoting and advancing Apache Hadoop and related Big Data technologies for the enterprise.

The current ecosystem has been challenged and slowed down by duplicated and fragmented efforts between different groups. The ODP Core takes the guesswork out of the process, accelerating many use cases by running on a common platform. This lets enterprises focus on building business-driven applications.

This training program provides a detailed introduction to the main components of the ODP Core, namely Apache Hadoop (including YARN, HDFS, and MapReduce) and Apache Ambari. It also covers the main open-source components that are generally made available with the ODP Core in a production Hadoop cluster.

 

 

Radiant Techlearning offers the IBM Open Platform with Apache Hadoop training program in classroom and virtual instructor-led (online) modes.

 

 

Duration: 2 days

 

 

Learning Objectives

  • List and explain the major components of the open-source Apache Hadoop stack and the approach taken by the Open Data Foundation.
  • Administer and monitor Hadoop clusters using Apache Ambari and related components.
  • Explore the Hadoop Distributed File System (HDFS) by running Hadoop commands.
  • Understand the differences between Hadoop 1 (with MapReduce 1) and Hadoop 2 (with YARN and MapReduce 2).
  • Create and run basic MapReduce jobs from the command line.
  • Explain how Spark integrates into the Hadoop ecosystem.
  • Execute iterative algorithms using Spark’s RDD.
  • Explain the role of coordination, management, and governance in the Hadoop ecosystem using Apache ZooKeeper, Apache Slider, and Apache Knox.
  • Explore common methods for performing data movement:
      • Configure Flume for data loading of log files.
      • Move data into HDFS from relational databases using Sqoop.
  • Understand when to use various data storage formats (flat files, CSV/delimited, Avro/Sequence files, Parquet, etc.).
  • Review the differences between the open-source programming languages typically used with Hadoop (Pig, Hive) and for data science (Python, R).
  • Query data from Hive.
  • Perform random access on data stored in HBase.
  • Explore advanced concepts, including Oozie and Solr.

Pre-requisites

  • A basic understanding of Linux is helpful but not required.

 

Audience profile

This intermediate training program is designed for those who want a foundation in IBM BigInsights, including:

  • Big data engineers
  • Data scientists
  • Developers or programmers
  • Administrators
  • Professionals who are curious about learning IBM Open Platform with Apache Hadoop

Course content

Unit 1: IBM Open Platform with Apache Hadoop

  • Exercise 1: Exploring the HDFS

 

Unit 2: Apache Ambari

  • Exercise 2: Managing Hadoop clusters with Apache Ambari

 

Unit 3: Hadoop Distributed File System

  • Exercise 3: File access and basic commands with HDFS

 

Unit 4: MapReduce and YARN

  • Topic 1: Introduction to MapReduce based on MR1
  • Topic 2: Limitations of MR1
  • Topic 3: YARN and MR2
  • Exercise 4: Creating and coding a simple MapReduce job
  • Possibly a more complex second exercise

 

Unit 5: Apache Spark

  • Exercise 5: Working with Spark’s RDD to run a Spark job

 

Unit 6: Coordination, management, and governance

  • Exercise 6: Apache ZooKeeper, Apache Slider, Apache Knox

 

Unit 7: Data Movement

  • Exercise 7: Moving data into Hadoop with Flume and Sqoop

 

Unit 8: Storing and Accessing Data

  • Topic 1: Representing Data: CSV, XML, JSON, and YAML
  • Topic 2: Open Source Programming Languages: Pig, Hive, and Other [R, Python, etc]
  • Topic 3: NoSQL Concepts
  • Topic 4: Accessing Hadoop data using Hive
  • Exercise 8: Performing CRUD operations using the HBase shell
  • Topic 5: Querying Hadoop data using Hive
  • Exercise 9: Using Hive to Access Hadoop / HBase Data

 

Unit 9: Advanced Topics

  • Topic 1: Controlling job workflows with Oozie
  • Topic 2: Search using Apache Solr

No lab exercises

FAQs

Q: What is Apache Hadoop?

 

A: Apache Hadoop is a framework that allows distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines.

 

Q: What is Hadoop YARN?

 

A: YARN (Yet Another Resource Negotiator) is Hadoop's resource management layer. It allows data stored in HDFS (the Hadoop Distributed File System) to be processed by various data processing engines, such as batch, stream, interactive, and graph processing engines.

 

Q: What is a Spark RDD?

 

A: RDD stands for Resilient Distributed Dataset. It is the fundamental data structure of Spark: an immutable, distributed collection of objects. Each RDD is divided into logical partitions, which can be computed on different nodes of the cluster.
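
The partitioning idea can be illustrated without a cluster. The following is a minimal sketch in plain Python, not the Spark API: the helper names `partition`, `map_partitions`, and `reduce_partitions` are hypothetical and exist only for this illustration. It splits a data set into partitions, transforms each partition independently (as separate nodes could), and then combines the partial results.

```python
from functools import reduce

def partition(data, num_partitions):
    """Split a list into num_partitions contiguous, roughly equal slices."""
    size, extra = divmod(len(data), num_partitions)
    parts, start = [], 0
    for i in range(num_partitions):
        # The first `extra` partitions receive one additional element.
        end = start + size + (1 if i < extra else 0)
        parts.append(data[start:end])
        start = end
    return parts

def map_partitions(parts, fn):
    """Apply fn to every element, one partition at a time.

    On a real cluster each partition could be processed by a different node;
    here they are simply processed one after another.
    """
    return [[fn(x) for x in part] for part in parts]

def reduce_partitions(parts, fn):
    """Reduce each non-empty partition locally, then combine the partial results."""
    partials = [reduce(fn, part) for part in parts if part]
    return reduce(fn, partials)

numbers = list(range(1, 11))                             # 1 .. 10
parts = partition(numbers, 3)                            # three logical partitions
squared = map_partitions(parts, lambda x: x * x)         # per-partition transformation
total = reduce_partitions(squared, lambda a, b: a + b)   # combine partial sums
print(parts)
print(total)
```

In Spark itself, the framework handles partitioning, scheduling, and recovery of lost partitions; the sketch only mirrors the "split, compute per partition, combine" pattern described above.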

 

Q: What is Apache Ambari?

 

A: Apache Ambari is a software project of the Apache Software Foundation. It allows system administrators to manage and monitor a Hadoop cluster, and to integrate Hadoop with existing enterprise infrastructure.

 

Q: What is HDFS?

 

A: HDFS stands for Hadoop Distributed File System. It is a fault-tolerant distributed file system designed to run on low-cost commodity hardware.

 

Q: Where is Radiant Techlearning located?

 

A: Radiant Techlearning is headquartered in Noida, the electronics city and technology hub of Northern India, which is surrounded by several large multinational, medium, and small software companies.

We have offices across the country and partners across the globe.

 

Q: What are the benefits of training with Radiant Techlearning?

 

A: Radiant Techlearning is receptive to new ideas and believes in a creative approach that makes learning easy and effective. We have a strong team of highly qualified and certified technology consultants, trainers, and developers who believe in combining practical and creative training to build technical skills.

Our training programs are practice-oriented, with 70% – 80% hands-on work with the technology tools. Each program focuses on one-on-one interaction with each participant, an up-to-date curriculum, and real-time projects and case studies.

Our experts will also share best practices and guide you to score high and perform better in your certification exams.

To ensure your success, we provide support sessions even after the training program.

You will also be awarded an industry-recognized course completion certificate after completing the course and the assignment.

 

Q: What if I/we have doubts after attending your training program?

 

A: Radiant's team of experts is available at Support@radianttechlearning.com to answer your technical queries, even after the training program.

We also conduct a 3 – 4 hour online session two weeks after the training program to address your queries and the project assigned to you.

 

Q: If I face technical difficulties during the class, what should I do?

 

A: Technical issues are unpredictable and can affect anyone. Participants must ensure that their system meets the required configuration and has a good internet connection to access the online labs.

If the problem persists or you face any challenge during the class, report it to us or to your trainer. In that case, Radiant will provide you with the recorded session of that particular day. However, recorded sessions are meant only for personal consumption and NOT for distribution or any commercial use.

 

Q: Does this training program include any project?

 

A: Yes, Radiant will provide you with up-to-date, high-value, and relevant real-time projects and case studies in each training program.

We include projects in each training program, from fundamental to advanced level, so that you do not face difficulty in the future. You will work on exciting projects that will upgrade your skills, knowledge, and industry experience.

 
