Building Batch Data Pipelines on Google Cloud

Course Overview

The Extract-Load (EL), Extract-Load-Transform (ELT), and Extract-Transform-Load (ETL) paradigms are the most common ones used in data pipelines. This course explains when to use each paradigm for batch data. Additionally, this course covers a number of Google Cloud data transformation technologies, including BigQuery, running pipeline graphs in Cloud Data Fusion, Spark on Dataproc, and serverless data processing with Dataflow. Students gain practical experience using Qwiklabs to build data pipeline components on Google Cloud.

The badge shown above can be yours once you've finished this course! Visit your profile page to see all the badges you have earned. Increase the visibility of your cloud career by showcasing your acquired knowledge.

Prerequisites

To benefit from this course, participants should have completed the course on "Google Cloud Big Data and Machine Learning Fundamentals" or have equivalent experience. The participant should also have the following: 

  • Basic proficiency with a standard query language such as SQL.
  • Experience with data modeling and ETL (extract, transform, load) activities.
  • Experience developing applications using a standard programming language such as Python.
  • Familiarity with machine learning and statistics.

Audience Profile

Developers responsible for designing pipelines and architectures for data processing.

Learning Objectives

  • Review the main data loading techniques (EL, ELT, and ETL) and when to use each.
  • Leverage Cloud Storage, run Hadoop on Dataproc, and optimize Dataproc jobs.
  • Use Dataflow to build data processing pipelines.
  • Use Data Fusion and Cloud Composer to manage data pipelines.

Content Outline

In this module, we provide an overview of the course and its agenda.

This module reviews the main methods of data loading (EL, ETL, and ELT) and when to use each.
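
The difference between these paradigms is mainly one of ordering. As a toy illustration (not course material; all names and data here are hypothetical), the same cleanup logic can run before loading (ETL) or after the raw rows are already in the warehouse (ELT):

```python
def extract():
    """Pull raw rows from a source system (hypothetical data)."""
    return [{"name": "ada"}, {"name": "grace"}]

def transform(rows):
    """Clean/reshape rows, before or after loading."""
    return [{"name": r["name"].upper()} for r in rows]

warehouse = []  # stand-in for a destination such as a BigQuery table

def load(rows):
    warehouse.extend(rows)

# ETL: transform in the pipeline, then load the cleaned rows.
load(transform(extract()))

# ELT: load the raw rows first, then transform them in place inside
# the warehouse (in BigQuery this would be a SQL statement over the
# already-loaded table rather than Python code).
raw = extract()
load(raw)
warehouse[-len(raw):] = transform(raw)
```

Plain EL, by contrast, would stop after `load(raw)`: the data is usable as-is and no transformation step is needed.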

This module shows how to use Hadoop on Dataproc, leverage Cloud Storage, and optimize your Dataproc jobs.

This module discusses using Dataflow to build your data processing pipelines.

This module explains how to manage data pipelines with Cloud Data Fusion and Cloud Composer.

FAQs

Q: What is Dataflow?

A: Dataflow is a fully managed service in the Google Cloud ecosystem for running data pipelines. It is focused entirely on transforming and enriching data in both batch (historical) and stream (real-time) modes.
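
Dataflow runs pipelines written with the Apache Beam SDK. As a rough sketch of how a batch job is submitted (the project, bucket, and region below are placeholder values, not part of this course), Beam's bundled word-count example can be run on Dataflow like this:

```shell
# Install the Beam SDK with the Google Cloud extras.
pip install 'apache-beam[gcp]'

# Submit Beam's word-count example to the Dataflow runner.
# my-project, my-bucket, and us-central1 are placeholders.
python -m apache_beam.examples.wordcount \
  --runner DataflowRunner \
  --project my-project \
  --region us-central1 \
  --temp_location gs://my-bucket/tmp \
  --output gs://my-bucket/results/output
```

The same pipeline runs locally if `--runner DataflowRunner` and the Google Cloud flags are omitted, which is a common way to test before submitting.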

Q: Does Google Cloud offer ETL tools?

A: Google Cloud offers various powerful ETL tools that save you from doing ETL manually and compromising the integrity of your data. These include tools for data preparation, pipeline building and management, and workflow orchestration.

Q: What should I look for in a data pipeline?

A: Characteristics to look for when considering a data pipeline include:

  • Continuous and extensible data processing
  • The elasticity and agility of the Cloud
  • Isolated and independent resources for data processing
  • Democratized data access and self-service management
  • High availability and disaster recovery

Q: Who delivers the training programs?

A: Radiant has highly selective criteria for the Technology Trainers & Consultants who deliver its training programs. Our trainers & consultants undergo a rigorous technical and behavioral interview and assessment process before they are onboarded in the company.

Our technology experts, trainers, and consultants have deep knowledge of the technical subject and are certified by the OEM.

Our training programs are practically oriented, with 70%–80% hands-on training on technology tools. The program focuses on one-on-one interaction with each participant, the latest curriculum content, and real-time projects and case studies during the training.

Our faculty will quickly build your knowledge of each course from the fundamentals, and you are free to ask your faculty questions at any time.

Our trainers have the patience and ability to explain complex concepts simply, with both depth and breadth of knowledge.

To ensure quality learning, we provide support sessions even after the training program.

Q: What do I need to attend the training sessions?

A: To attend the training sessions, you should have a working desktop or laptop that meets the required specifications and a good internet connection to access the labs.

Q: What if I miss a live session?

A: We always recommend attending the live sessions so you can practice, clarify doubts instantly, and get more value from your investment. However, if you have to skip a class due to some contingency, Radiant Techlearning will help you with the recorded session of that particular day. Note that these recorded sessions are meant only for personal consumption, NOT for distribution or any commercial use.

Q: How do I access the labs?

A: Radiant Techlearning has a data center containing a virtual training environment for participants' hands-on practice.

Participants can easily access these labs over the cloud with the help of a remote desktop connection.

Radiant virtual labs allow you to learn from anywhere and in any time zone. 

Q: Will I work on real projects during the training?

A: Yes. During the training program, learners work on real-world, industry-oriented projects. These projects will improve your skills and knowledge and give you practical experience, which will help you with future tasks and assignments.
