Description

The IBM BigInsights Foundation training program is for professionals who want a grounding in IBM BigInsights. The course comprises two separate modules.

The first module, IBM BigInsights Overview, gives an overview of IBM's big data strategy and explains why it is important to understand and use big data. It covers IBM BigInsights as a platform for managing and gaining insights from your big data. You will see how the BigInsights offerings have been aligned to better suit your needs: the IBM Open Platform (IOP), together with the three specialized modules with value-add that sit on top of the IOP. You will also get an introduction to the BigInsights value-add, including Big SQL, BigSheets, and Big R.

The second module is IBM Open Platform with Apache Hadoop. IBM Open Platform (IOP) with Apache Hadoop is the first premier collaborative platform to enable Big Data solutions to be developed on a common set of Apache Hadoop technologies. The Open Data Platform initiative (ODP) is a shared industry effort focused on promoting and advancing the state of Apache Hadoop and Big Data technologies for the enterprise. The current ecosystem is challenged and slowed by fragmented and duplicated efforts between different groups. The ODP core will take the guesswork out of the process and accelerate many use cases by running on a common platform. It allows enterprises to focus on building business-driven applications.

This module provides an in-depth introduction to the main components of the ODP core, namely Apache Hadoop (inclusive of HDFS, YARN, and MapReduce) and Apache Ambari, as well as a treatment of the main open-source components that are generally made available with the ODP core in a production Hadoop cluster. IBM BigInsights v4 is itself built upon the ODP core and these other main open-source components. The relationship between the IBM Open Platform with Apache Hadoop and the BigInsights add-ons is covered briefly in Unit 1.
Radiant Techlearning offers the IBM BigInsights Foundation training program in classroom and virtual instructor-led online modes.

 

Duration: 1 Day

 

Learning Objectives

On completion of this course, professionals should be able to:

  • Understand the purpose of big data and know why it is important
  • List the sources of data (data-at-rest vs data-in-motion)
  • Describe the IBM BigInsights offering
  • Utilize the various IBM BigInsights tools including Big SQL, BigSheets, Big R, Jaql and AQL for your big data needs.
  • List and describe the major components of the open-source Apache Hadoop stack and the approach taken by the Open Data Platform (ODP) initiative.
  • Manage and monitor Hadoop clusters with Apache Ambari and related components
  • Explore the Hadoop Distributed File System (HDFS) by running Hadoop commands.
  • Understand the differences between Hadoop 1 (with MapReduce 1) and Hadoop 2 (with YARN and MapReduce 2).
  • Create and run basic MapReduce jobs using command line.
  • Explain how Spark integrates into the Hadoop ecosystem.
  • Execute iterative algorithms using Spark’s RDD.
  • Explain the role of coordination, management, and governance in the Hadoop ecosystem using Apache Zookeeper, Apache Slider, and Apache Knox.
  • Explore common methods for performing data movement
  • Configure Flume for data loading of log files
  • Move data into HDFS from relational databases using Sqoop
  • Understand when to use various data storage formats (flat files, CSV/delimited, Avro/Sequence files, Parquet, etc.).
  • Review the differences between the available open-source programming languages typically used with Hadoop (Pig, Hive) and for Data Science (Python, R)
  • Query data from Hive.
  • Perform random access on data stored in HBase.
  • Explore advanced concepts, including Oozie and Solr
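
As a rough illustration of the MapReduce objective above, the following is a minimal in-memory sketch of the map, shuffle, and reduce phases for a word count. It is plain Python rather than the Hadoop API, and it is not taken from the course material; the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) and the sample lines are assumptions made purely for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def map_phase(lines: Iterable[str]) -> List[Tuple[str, int]]:
    """Emit a (word, 1) pair for every word, as a mapper would."""
    pairs: List[Tuple[str, int]] = []
    for line in lines:
        for word in line.lower().split():
            pairs.append((word, 1))
    return pairs


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group all values by key, mimicking the shuffle/sort step."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


def reduce_phase(grouped: Dict[str, List[int]]) -> Dict[str, int]:
    """Sum the values for each key, as a reducer would."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Chain the three phases into a single in-memory job."""
    return reduce_phase(shuffle_phase(map_phase(lines)))


if __name__ == "__main__":
    # Hypothetical sample input, not course data.
    sample = ["big data big insights", "data at rest"]
    print(word_count(sample))
```

On a real cluster the map and reduce steps run in parallel on different nodes and the shuffle moves data over the network; this sketch only shows how data flows between the phases.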

Prerequisites

In order to benefit from this course, professionals are recommended to have prior knowledge in

  • Linux

Audience Profile

  • Big data engineers
  • Data scientists
  • Developers or programmers
  • Administrators who are interested in learning about IBM’s Open Platform with Apache Hadoop

Course Details

Module 1 DW6A1

  • Understand the purpose of big data and know why it is important
  • List the sources of data (data-at-rest vs data-in-motion)
  • Describe the IBM BigInsights offering
  • Utilize the various IBM BigInsights tools including Big SQL, BigSheets, Big R, Jaql and AQL for your big data needs.
  • List and describe the major components of the open-source Apache Hadoop stack and the approach taken by the Open Data Platform (ODP) initiative.
  • Manage and monitor Hadoop clusters with Apache Ambari and related components
  • Explore the Hadoop Distributed File System (HDFS) by running Hadoop commands.
  • Understand the differences between Hadoop 1 (with MapReduce 1) and Hadoop 2 (with YARN and MapReduce 2).
  • Create and run basic MapReduce jobs using command line.
  • Explain how Spark integrates into the Hadoop ecosystem.
  • Execute iterative algorithms using Spark’s RDD.
  • Explain the role of coordination, management, and governance in the Hadoop ecosystem using Apache Zookeeper, Apache Slider, and Apache Knox.
  • Explore common methods for performing data movement
  • Configure Flume for data loading of log files
  • Move data into HDFS from relational databases using Sqoop
  • Understand when to use various data storage formats (flat files, CSV/delimited, Avro/Sequence files, Parquet, etc.).
  • Review the differences between the available open-source programming languages typically used with Hadoop (Pig, Hive) and for Data Science (Python, R)
  • Query data from Hive.
  • Perform random access on data stored in HBase.
  • Explore advanced concepts, including Oozie and Solr

Module 2 DW6A1

  • Unit 1: Introduction to Big Data
  • Exercise 1: Setting up the lab environment
  • Unit 2: Introduction to IBM BigInsights
  • Exercise 2: Getting started with IBM BigInsights
  • Unit 3: IBM BigInsights for Analysts
  • Exercise 3: Working with Big SQL and BigSheets
  • Unit 4: IBM BigInsights for Data Scientists
  • Exercise 4: Analyzing data with Big R, Jaql, and AQL
  • Unit 5: IBM BigInsights for Enterprise Management

Module 3 DW6B1

  • Unit 1: IBM Open Platform with Apache Hadoop
  • Exercise 1: Exploring the HDFS
  • Unit 2: Apache Ambari
  • Exercise 2: Managing Hadoop clusters with Apache Ambari
  • Unit 3: Hadoop Distributed File System
  • Exercise 3: File access & basic commands with HDFS
  • Unit 4: MapReduce and YARN
  • Topic 1: Introduction to MapReduce based on MR1
  • Topic 2: Limitations of MR1
  • Topic 3: YARN and MR2
  • Exercise 4: Creating and coding a simple MapReduce job (Possibly a more complex second Exercise)
  • Unit 5: Apache Spark
  • Exercise 5: Working with Spark’s RDD to run a Spark job
  • Unit 6: Coordination, management, and governance
  • Exercise 6: Apache ZooKeeper, Apache Slider, Apache Knox
  • Unit 7: Data Movement
  • Exercise 7: Moving data into Hadoop with Flume and Sqoop
  • Unit 8: Storing and Accessing Data
  • Topic 1: Representing Data: CSV, XML, JSON, and YAML
  • Topic 2: Open Source Programming Languages: Pig, Hive, and Other [R, Python, etc]
  • Topic 3: NoSQL Concepts
  • Topic 4: Accessing Hadoop data using Hive
  • Exercise 8: Performing CRUD operations using the HBase shell
  • Topic 5: Querying Hadoop data using Hive
  • Exercise 9: Using Hive to Access Hadoop / HBase Data
  • Unit 9: Advanced Topics
  • Topic 1: Controlling job workflows with Oozie
  • Topic 2: Search using Apache Solr
  • No lab exercises
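
To make the data representation topic in Unit 8 a little more concrete, here is a small sketch that writes the same records as delimited CSV and as line-delimited JSON using only the Python standard library. The records, field names, and helper names (`to_csv`, `from_csv`, `to_json_lines`) are hypothetical and are not part of the course labs.

```python
import csv
import io
import json
from typing import Dict, List


def to_csv(records: List[Dict[str, str]]) -> str:
    """Serialize records as CSV with a header row taken from the first record."""
    if not records:
        raise ValueError("at least one record is required to derive the header")
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(records[0].keys()), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


def from_csv(text: str) -> List[Dict[str, str]]:
    """Parse CSV text back into a list of dictionaries (all values are strings)."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]


def to_json_lines(records: List[Dict[str, str]]) -> str:
    """Serialize records as one JSON object per line."""
    return "".join(json.dumps(record) + "\n" for record in records)


if __name__ == "__main__":
    # Hypothetical sample records, for illustration only.
    records = [
        {"id": "1", "source": "sensor"},
        {"id": "2", "source": "weblog"},
    ]
    print(to_csv(records))
    print(to_json_lines(records))
```

CSV keeps the field names once in the header and treats every value as text, while each JSON line repeats the field names and is self-describing; trade-offs of this kind are what the storage-format topics compare.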
