Cloudera Data Engineering: Developing Applications with Apache Spark

Course Description

This four-day hands-on training course teaches developers the fundamental concepts and knowledge required to use Apache Spark to build high-performance, parallel applications on the Cloudera Data Platform (CDP).

 

Through hands-on exercises, students practice writing Spark applications that integrate with CDP core components such as Hive & Kafka. Participants will learn how to query structured data with Spark SQL, how to perform real-time processing on streaming data with Spark Streaming, and how to work with "big data" stored in a distributed file system.

Prerequisites

All students are expected to have basic Linux experience & basic proficiency in either the Python or Scala programming language. Basic knowledge of SQL is helpful. Prior knowledge of Spark & Hadoop is not required.

Audience Profile

This training is designed for developers & data engineers.

 

Learning Objectives

Through instructor-led discussion & interactive, hands-on exercises, you will learn how to:

  •  Distribute, store, & process data in a CDP cluster
  •  Write, configure, & deploy Apache Spark applications
  •  Use Spark interpreters & Spark applications to explore, process, & analyze distributed data
  •  Query data using Spark SQL, DataFrames, & Hive tables
  •  Use Spark Streaming together with Kafka to process a data stream

Content Outline

Lessons 

  •  Why Notebooks?
  •  Zeppelin Notes
  •  Demo: Apache Spark In 5 Minutes

Lessons  

  •  HDFS Overview
  •  HDFS Components & Interactions
  •  Additional HDFS Interactions
  •  Ozone Overview
  •  Exercise: Working with HDFS 

Lessons 

  •  YARN Overview
  •  YARN Components & Interaction
  •  Working with YARN
  •  Exercise: Working with YARN

Lessons

  • The Disk Years: 2000 -> 2010
  • The Memory Years: 2010 -> 2020
  • The GPU Years: 2020 ->

Lessons

  • Introduction to DataFrames
  • Exercise: Introducing DataFrames
  • Exercise: Reading & Writing DataFrames
  • Exercise: Working with Columns
  • Exercise: Working with Complex Types
  • Exercise: Combining & Splitting DataFrames
  • Exercise: Summarizing & Grouping DataFrames
  • Exercise: Working with UDFs
  • Exercise: Working with Windows

Lessons

  • Apache Hive

Lessons

  • Hive & Spark Integration
  • Exercise: Spark Integration with Hive

Lessons

  • Introduction to Data Visualization with Zeppelin
  • Zeppelin Analytics
  • Zeppelin Collaboration
  • Exercise: AdventureWorks

Lessons

  • Shuffle
  • Skew
  • Order

Lessons

  • Spark Distributed Processing
  • Exercise: Explore Query Execution Order

Lessons

  • DataFrame & Dataset Persistence
  • Persistence Storage Levels
  • Viewing Persisted RDDs
  • Exercise: Persisting DataFrames

Lessons

  • Writing a Spark Application
  • Building & Running an Application
  • Application Deployment Mode
  • The Spark Application Web UI
  • Configuring Application Properties
  • Exercise: Writing, Configuring, & Running a Spark Application

Lessons

  • Introduction to Structured Streaming
  • Exercise: Processing Streaming Data

Lessons

  • What is Apache Kafka?
  • Apache Kafka Overview
  • Scaling Apache Kafka
  • Apache Kafka Cluster Architecture
  • Apache Kafka Command Line Tools

Lessons

  • Receiving Kafka Messages
  • Sending Kafka Messages
  • Exercise: Working with Kafka Streaming Messages

Lessons

  • Streaming Aggregation
  • Joining Streaming DataFrames
  • Exercise: Aggregating & Joining Streaming DataFrames

Lessons

  • Working with Datasets in Scala
  • Exercise: Using Datasets in Scala

Certification

CCA Spark & Hadoop Developer Exam (CCA175)

 

FAQs

A: To attend the training session, you need a working desktop or laptop that meets the required specifications & a good internet connection to access the labs.

A: We always recommend attending the live session so that you can practice, clear up doubts instantly, & get more value from your investment. However, if some contingency forces you to skip a class, Radiant Techlearning will provide the recorded session for that day. Those recordings are meant only for personal use & NOT for distribution or commercial use.

 

A: Radiant Techlearning has a data center containing a virtual training environment for participants' hands-on practice. Participants can easily access these labs over the cloud through a remote desktop connection. Radiant virtual labs allow you to learn from anywhere in the world & in any time zone.

 

A: We engage learners in real-world, industry-oriented projects during the training program. These projects will improve your skills & knowledge & give you practical experience that will help with your future tasks & assignments.

 

A: You can request a refund if you do not wish to enroll in the course.

 

A: Yes, you can.

 

A: We adhere to the highest Internet security standards. Any data that is kept is never shared with third parties.

 

A: It is recommended but optional. Being acquainted with the primary course material will enable students & the trainer to move at the desired pace during classes. You can access courseware for most vendors.

 

A: You can buy online from the page by clicking on "Buy Now." You can view alternate payment methods on the payment options page.

 

A: Yes, students can pay from the course page.

A: A course completion certificate is awarded to all professionals who complete the training program & the project assignment given by the instructor. You can present the certificate in future job interviews.

A: Radiant believes in a practical & creative approach to training & development, which distinguishes it from other training & development platforms. Moreover, training courses are delivered by experts with a range of experience in their domain.

A: Radiant's team of experts is available by e-mail at support@radianttechlearning.com to answer your technical queries after the training program.

 

A: Yes, Radiant will provide you with up-to-date, high-value, relevant real-time projects & case studies in each training program.

 

A: Technical issues are unpredictable & might occur on our side as well. Participants must ensure they have access to the required configuration & a good internet connection.

 

A: Radiant Techlearning offers training programs on weekdays, on weekends, & as a combination of weekdays & weekends. You are free to choose the schedule that suits your needs.

 

A: Radiant has rigorous selection criteria for the Technology Trainers & Consultants who deliver training programs. Our trainers & consultants undergo technical & behavioral interviews & assessment processes before they are onboarded.

Our Technology experts/trainers & consultants have in-depth knowledge of their technical subject & are certified by the OEM.

Our training programs are practically oriented, with 70% to 80% hands-on work with technology tools. Each program focuses on one-on-one interaction with each participant, the latest content in the curriculum, & real-time projects & case studies.

Our faculty will teach each course from the fundamental level in an easy-to-follow way, & you are free to ask your instructor questions at any time.

Our trainers have the patience & ability to explain complex concepts simply, with depth & breadth of knowledge.

To ensure quality learning, we provide a support session even after the training program.

 
