Radiant

Training Overview

This training is applicable to software version 10.5. Learn to accelerate Data Engineering Integration through mass ingestion, incremental loads, transformations, processing of complex files, creating dynamic mappings, & integrating data science using Python. Optimize the Data Engineering system performance through monitoring, troubleshooting, & best practices while gaining an understanding of how to reuse application logic for Data Engineering use cases.

Prerequisites

Informatica Developer Tool for Big Data Developers (Instructor Led )

Audience profile

Developer

Learning Objectives

After successfully completing this training, professionals should be able to:

Mass ingest data to Hive & HDFS
Perform incremental loads in Mass Ingestion
Perform initial & incremental loads
Integrate with relational databases using SQOOP
Perform transformations across various engines
Execute a mapping using JDBC in Spark mode
Perform stateful computing & windowing
Process complex files
Parse hierarchical data on the Spark engine
Run profiles & choose sampling options on the Spark engine
Execute Dynamic Mappings
Create Audits on Mappings
Monitor logs using REST Operations Hub
Monitor logs using Log Aggregation & troubleshoot
Run mappings in the Databricks environment
Create mappings to access Delta Lake tables
Tune performances of Spark & Databricks jobs

Content Outline

Module 1: Informatica Data Engineering Management Overview

Data Engineering concepts
Data Engineering Management features
Benefits of Data Engineering Management
Data Engineering Management architecture
Data Engineering Management developer tasks
Data Engineering Integration 10.4 new features

Module 2: Ingestion & Extraction in Hadoop

Integrating DEI with the Hadoop cluster
Hadoop file systems
Data Ingestion to HDFS & Hive using SQOOP
Mass Ingestion to HDFS & Hive – Initial load
Mass Ingestion to HDFS & Hive - Incremental load

Lab :

Configure SQOOP for Processing Data Between Oracle (SQOOP) to HDFS
Configure SQOOP for processing data between an Oracle database & Hive
Creating Mapping Specifications using Mass Ingestion Service

Module 3: Native & Hadoop Engine Strategy

Data Engineering Integration engine strategy
Hive Engine architecture
MapReduce
Tez
Spark architecture
Blaze architecture

Lab:

Executing a mapping in Spark mode
Connecting to a Deployed Application

Module 4: Data Engineering Development Process

Advanced Transformations in Data Engineering Integration Python & Update Strategy
Hive ACID Use Case
Stateful Computing & Windowing

Lab:

Creating a Reusable Python Transformation
Creating an Active Python Transformation
Performing Hive Upserts
Using Windowing Function LEAD
Using Windowing Function LAG
Creating a Macro Transformation

Module 5: Complex File Processing

Data Engineering file formats – Avro, Parquet, JSON
Complex file data types – Structs, Arrays, Maps
Complex Configuration, Operators & Functions

Lab:

Converting Flat File data object to an Avro file
Using complex data types - Arrays, Structs, & Maps in a mapping

Module 6: Hierarchical Data Processing

Hierarchical Data Processing
Flatten Hierarchical Data
Dynamic Flattening with Schema Changes
Hierarchical Data Processing with Schema Changes
Complex Configuration, Operators & Functions
Dynamic Ports
Dynamic Input Rules

Lab:

Flattening a complex port in a Mapping
Building dynamic mappings using dynamic ports
Building dynamic mappings using input rules
Performing Dynamic Flattening of complex ports
Parsing Hierarchical Data on the Spark Engine

Module 7: Mapping Optimization & Performance Tuning

Validation Environments
Execution Environment
Mapping Optimization
Mapping Recommendations & Insight
Scheduling, Queuing, & Node Labeling
Mapping Audits

Lab:

Implementing Recommendation
Implementing Insight
Implementing Mapping Audits

Module 8: Monitoring Logs & Troubleshooting in Hadoop

Hadoop Environment Logs
Spark Engine Monitoring
Blaze Engine Monitoring
REST Operations Hub
Log Aggregator
Troubleshooting

Lab:

Monitoring Mappings using REST Operations Hub
Viewing & analyzing logs using Log Aggregator

Module 9: Intelligent Structure Model

Intelligent Structure Discovery Overview
Intelligent Structure Model

Lab:

Use an Intelligent Structure Model in a Mapping

Module 10: Databricks Overview

Databricks Overview
Steps to configure Databricks
Databricks clusters
Notebooks, Jobs, & Data
Delta Lakes

Module 11: Databricks Integration

Databricks Integration
Components of the Informatica & the Databricks environments
A run-time process on the Databricks Spark Engine
Databricks Integration Task Flow
Prerequisites for Databricks integration
Cluster Workflows

FAQs

Q: What is Informatica Data Engineering Integration?

A: Informatica Data Engineering Integration provides optimized run-time processing & simplified monitoring across multiple engines for faster, more flexible, & repeatable development & processing.

Q: What is SQL to Mapping Conversion Informatica?

A: Data Engineering Integration can translate ANSI-compliant SQL scripts, Informatica PowerCenter® pre- or post-SQL, & SQL override queries into optimized Informatica big data mappings that execute on Hadoop, maximizing reuse, simplifying maintenance, & preserving end-to-end data lineage.

Q: Faster Mass Ingestion & Extraction Informatica?

A: Data Engineering Integration generates hundreds of run-time data flows based on just a handful of design patterns using mass ingestion & mapping templates. You can easily parameterize these data flows to handle dynamic schemas such as web & machine log files, which are common to big data projects. This means you can quickly build data flows that are easy to maintain & resilient to changing schemas.

Q: How do you ensure the quality of the training program?

A: Radiant has highly intensive selection criteria for Technology Trainers & Consultants who deliver training programs. Our trainers & consultants undergo rigorous technical & behavioral interviews & assessment processes before they are onboarded in the company.

Our Technology experts/trainers & consultants carry deep-dive knowledge in the technical subject & are certified by the OEM.

Our training programs are practically oriented with 70% – 80% hands-on training technology tools. Our training program focuses on one-on-one interaction with each participant, the latest content in the curriculum, real-time projects & case studies during the training program.

Our faculty will provide you the knowledge of each training from a fundamental level in an easy way & you are free to ask your doubts any time from your respective faculty.

Our trainers have the patience & ability to explain difficult concepts in a simplistic way with depth & width of knowledge.

To ensure quality learning, we provide support sessions even after the training program.

Q: What is the schedule of the training program?

A: Radiant Techlearning offers a training program on weekdays, weekends & a combination of weekdays & weekends. You can always choose the schedule that best suits your need.

Q: What if I miss a class on a particular day?

A: We would always recommend you attend the live session to practice & clarify the doubts instantly & get more value from your investment. However, due to some contingency, if you have to skip the class, Radiant Techlearning will help you with the recorded session of that particular day. However, those recorded sessions are not meant only for personal consumption & NOT for distribution or any commercial use.

Q: How will I be accessing the labs?

A: Radiant Techlearning has a data center containing the Virtual Training environment for the purpose of participant hand-on-practice.

Participants can easily access these labs over Cloud with the help of a remote desktop connection.

Radiant virtual labs provide you the flexibility to learn from anywhere in the world & in any time zone.

Send a Message.

Enroll

Data Engineering Integration for Developers

Training Overview

Prerequisites

Audience profile

Learning Objectives

Content Outline

FAQs

Send a Message.

Training Category