Data Engineering Integration for Developers

Training Overview

This training is applicable to software version 10.5. Learn to accelerate Data Engineering Integration through mass ingestion, incremental loads, transformations, complex file processing, dynamic mappings, & data science integration using Python. Optimize Data Engineering system performance through monitoring, troubleshooting, & best practices, while learning how to reuse application logic for Data Engineering use cases.

Prerequisites

Informatica Developer Tool for Big Data Developers (Instructor-Led)

Audience profile

Developer

Learning Objectives

After successfully completing this training, professionals should be able to:

  • Mass ingest data to Hive & HDFS
  • Perform incremental loads in Mass Ingestion
  • Perform initial & incremental loads
  • Integrate with relational databases using SQOOP
  • Perform transformations across various engines
  • Execute a mapping using JDBC in Spark mode
  • Perform stateful computing & windowing
  • Process complex files
  • Parse hierarchical data on the Spark engine
  • Run profiles & choose sampling options on the Spark engine
  • Execute Dynamic Mappings
  • Create Audits on Mappings
  • Monitor logs using REST Operations Hub
  • Monitor logs using Log Aggregation & troubleshoot
  • Run mappings in the Databricks environment
  • Create mappings to access Delta Lake tables
  • Tune the performance of Spark & Databricks jobs

Content Outline

  • Data Engineering concepts
  • Data Engineering Management features
  • Benefits of Data Engineering Management
  • Data Engineering Management architecture
  • Data Engineering Management developer tasks
  • Data Engineering Integration 10.4 new features
  • Integrating DEI with the Hadoop cluster
  • Hadoop file systems
  • Data Ingestion to HDFS & Hive using SQOOP (illustrated in the sketch below)
  • Mass Ingestion to HDFS & Hive – Initial load
  • Mass Ingestion to HDFS & Hive - Incremental load
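
As a point of reference for the SQOOP ingestion topics above: SQOOP itself is driven from the command line, but the same relational-to-Hive load can be sketched in PySpark, which the course also uses when running mappings with JDBC in Spark mode. This is a minimal illustration only; the host, credentials, database, & table names are hypothetical.

    # Minimal PySpark sketch of an Oracle-to-Hive initial load (hypothetical names).
    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("oracle_to_hive_ingest")
             .enableHiveSupport()          # allows writing to Hive-managed tables
             .getOrCreate())

    # Read the source table over JDBC; the Oracle driver JAR must be on the classpath.
    orders = (spark.read.format("jdbc")
              .option("url", "jdbc:oracle:thin:@//dbhost:1521/ORCLPDB1")  # hypothetical host
              .option("dbtable", "SALES.ORDERS")                          # hypothetical table
              .option("user", "etl_user")
              .option("password", "etl_password")
              .load())

    # Land the data as a Hive table on HDFS (initial load; an incremental load
    # would filter on a watermark column & append instead of overwrite).
    orders.write.mode("overwrite").saveAsTable("default.orders_staging")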

Lab:

  • Configure SQOOP for processing data between an Oracle database & HDFS
  • Configure SQOOP for processing data between an Oracle database & Hive
  • Creating Mapping Specifications using Mass Ingestion Service
  • Data Engineering Integration engine strategy
  • Hive Engine architecture
  • MapReduce
  • Tez
  • Spark architecture
  • Blaze architecture

Lab:

  • Executing a mapping in Spark mode
  • Connecting to a Deployed Application
  • Advanced Transformations in Data Engineering Integration – Python & Update Strategy
  • Hive ACID Use Case
  • Stateful Computing & Windowing

Lab:

  • Creating a Reusable Python Transformation
  • Creating an Active Python Transformation
  • Performing Hive Upserts
  • Using Windowing Function LEAD
  • Using Windowing Function LAG (LEAD & LAG are both illustrated in the sketch after this list)
  • Creating a Macro Transformation
  • Data Engineering file formats – Avro, Parquet, JSON
  • Complex file data types – Structs, Arrays, Maps
  • Complex Configuration, Operators & Functions
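
To give a feel for the LEAD & LAG windowing labs listed above, here is a small PySpark sketch of the same functions. It is a conceptual illustration only, not the Informatica Python transformation used in the lab, & the sample columns are made up.

    # Minimal PySpark sketch of LEAD/LAG window functions (hypothetical sample data).
    from pyspark.sql import SparkSession, Window
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("windowing_demo").getOrCreate()

    readings = spark.createDataFrame(
        [("sensor1", 1, 10.0), ("sensor1", 2, 12.5), ("sensor1", 3, 11.0)],
        ["device", "ts", "value"],
    )

    # Order rows per device, then compare each value with its neighbours.
    w = Window.partitionBy("device").orderBy("ts")

    readings.select(
        "device", "ts", "value",
        F.lag("value").over(w).alias("previous_value"),   # LAG: prior row's value
        F.lead("value").over(w).alias("next_value"),      # LEAD: next row's value
    ).show()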

Lab:

  • Converting Flat File data object to an Avro file
  • Using complex data types - Arrays, Structs, & Maps in a mapping
  • Hierarchical Data Processing
  • Flatten Hierarchical Data (see the sketch after this list)
  • Dynamic Flattening with Schema Changes
  • Hierarchical Data Processing with Schema Changes
  • Complex Configuration, Operators & Functions
  • Dynamic Ports
  • Dynamic Input Rules
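
The hierarchical-processing & flattening topics above can be previewed with a short PySpark sketch that promotes struct members to top-level columns & explodes an array into rows. The schema & sample rows are hypothetical; in the labs the equivalent work is done with complex ports inside a mapping.

    # Minimal PySpark sketch of flattening struct & array data (hypothetical schema).
    from pyspark.sql import SparkSession, Row
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("flatten_demo").getOrCreate()

    orders = spark.createDataFrame([
        Row(order_id=1, customer=Row(name="Ana", city="Lisbon"), amounts=[10.0, 20.0]),
        Row(order_id=2, customer=Row(name="Raj", city="Pune"), amounts=[5.5]),
    ])

    flat = (orders
            # Promote struct members to top-level columns.
            .withColumn("customer_name", F.col("customer.name"))
            .withColumn("customer_city", F.col("customer.city"))
            # Explode the array so each element becomes its own row.
            .withColumn("amount", F.explode("amounts"))
            .drop("customer", "amounts"))
    flat.show()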

Lab:

  • Flattening a complex port in a Mapping
  • Building dynamic mappings using dynamic ports
  • Building dynamic mappings using input rules
  • Performing Dynamic Flattening of complex ports
  • Parsing Hierarchical Data on the Spark Engine
  • Validation Environments
  • Execution Environment
  • Mapping Optimization
  • Mapping Recommendations & Insight
  • Scheduling, Queuing, & Node Labeling
  • Mapping Audits

Lab:

  • Implementing Recommendation
  • Implementing Insight
  • Implementing Mapping Audits
  • Hadoop Environment Logs
  • Spark Engine Monitoring
  • Blaze Engine Monitoring
  • REST Operations Hub
  • Log Aggregator
  • Troubleshooting

Lab:

  • Monitoring Mappings using REST Operations Hub
  • Viewing & analyzing logs using Log Aggregator
  • Intelligent Structure Discovery Overview
  • Intelligent Structure Model

Lab:

  • Use an Intelligent Structure Model in a Mapping
  • Databricks Overview
  • Steps to configure Databricks
  • Databricks clusters
  • Notebooks, Jobs, & Data
  • Delta Lakes (see the sketch after this outline)
  • Databricks Integration
  • Components of the Informatica & the Databricks environments
  • A run-time process on the Databricks Spark Engine
  • Databricks Integration Task Flow
  • Prerequisites for Databricks integration
  • Cluster Workflows
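
As a rough preview of the Delta Lake access covered in the Databricks topics above, the sketch below writes & reads back a managed Delta table with PySpark. It assumes it runs in a Databricks notebook, where the spark session object already exists & the Delta format is available; the table name is hypothetical.

    # Minimal Delta Lake sketch for a Databricks notebook (table name is hypothetical;
    # `spark` is the session Databricks provides automatically).
    from pyspark.sql import functions as F

    # Write a small DataFrame as a managed Delta table.
    sales = spark.range(0, 5).withColumn("amount", F.col("id") * 10.0)
    sales.write.format("delta").mode("overwrite").saveAsTable("default.sales_delta_demo")

    # Read the Delta table back & run a simple aggregation.
    spark.table("default.sales_delta_demo") \
         .agg(F.sum("amount").alias("total_amount")) \
         .show()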

FAQs

Q: What is Informatica Data Engineering Integration?

A: Informatica Data Engineering Integration provides optimized run-time processing & simplified monitoring across multiple engines for faster, more flexible, & repeatable development & processing.

Q: Can existing SQL code be reused with Data Engineering Integration?

A: Data Engineering Integration can translate ANSI-compliant SQL scripts, Informatica PowerCenter® pre- or post-SQL, & SQL override queries into optimized Informatica big data mappings that execute on Hadoop, maximizing reuse, simplifying maintenance, & preserving end-to-end data lineage.

Q: How does Data Engineering Integration handle large numbers of data flows & changing schemas?

A: Data Engineering Integration generates hundreds of run-time data flows from just a handful of design patterns using mass ingestion & mapping templates. You can easily parameterize these data flows to handle dynamic schemas, such as web & machine log files, which are common in big data projects. This means you can quickly build data flows that are easy to maintain & resilient to changing schemas.

Q: How experienced are the trainers who deliver this program?

A: Radiant has highly stringent selection criteria for the technology trainers & consultants who deliver its training programs. Our trainers & consultants undergo rigorous technical & behavioral interviews & assessments before they are onboarded into the company.

Our technology experts, trainers, & consultants have deep knowledge of the technical subject & are certified by the OEM.

Our training programs are practice-oriented, with 70%–80% of the time spent on hands-on work with technology tools. Each program focuses on one-on-one interaction with every participant, the latest curriculum content, & real-time projects & case studies during the training.

Our faculty teach each topic from the fundamentals in an easy-to-follow way, & you are free to raise your doubts with your respective faculty at any time.

Our trainers have the patience & ability to explain difficult concepts simply, with both depth & breadth of knowledge.

To ensure quality learning, we provide support sessions even after the training program.

Q: What schedules are available for the training program?

A: Radiant Techlearning offers training programs on weekdays, weekends, or a combination of weekdays & weekends. You can always choose the schedule that best suits your needs.

Q: What if I miss a live session?

A: We always recommend attending the live sessions to practice, clarify doubts instantly, & get more value from your investment. However, if a contingency forces you to miss a class, Radiant Techlearning will help you with the recorded session for that particular day. These recorded sessions are meant only for personal use & NOT for distribution or any commercial purpose.

Q: How will I do hands-on practice during the training?

A: Radiant Techlearning has a data center hosting the virtual training environment for participants' hands-on practice.

Participants can easily access these labs over the cloud through a remote desktop connection.

Radiant virtual labs give you the flexibility to learn from anywhere in the world, in any time zone.
