Data Warehousing Basics
- What is the definition of data warehousing?
ü A data warehouse is a subject-oriented, integrated, time-variant, non-volatile collection of data in support of management's decision-making process.
Subject Oriented
Data warehouses are designed to help you analyze data. For example, to learn more about your company's sales data, you can build a warehouse that concentrates on sales. Using this warehouse, you can answer questions like "Who was our best customer for this item last year?" This ability to define a data warehouse by subject matter (sales, in this case) makes the data warehouse subject oriented.
Integrated
Integration is closely
related to subject orientation. Data
warehouses must put data from disparate sources into a consistent format.
They must resolve such problems as naming conflicts and inconsistencies among
units of measure. When they achieve this, they are said to be integrated.
Nonvolatile
Nonvolatile means that, once entered into the warehouse, data
should not change. This is logical because the purpose of a warehouse is to enable
you to analyze what has occurred.
Time Variant
In order to discover trends in business, analysts need large amounts of historical data. This is very much in contrast to online transaction processing (OLTP) systems, where performance requirements demand that historical data be moved to an archive. A data warehouse's focus on change over time is what is meant by the term time variant.
- How many stages are there in data warehousing?
Data warehousing generally includes two stages:
ü ETL
ü Report generation
ETL
- Extract -- the process of reading data from a source database.
- Transform -- the process of converting the extracted data from its previous form into the required form.
- Load -- the process of writing the data into the target database.
ETL is used to migrate data from one database to another, to build data marts and data warehouses, and to convert databases from one format to another.
It is used to retrieve data from various operational databases, transform it into useful information, and finally load it into the data warehouse.
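As a rough sketch, the transform-and-load step can be pictured in plain SQL; the staging and warehouse table names below are assumptions for illustration, not tied to any particular tool:
-- Extract: raw rows have already been copied from the source system into a staging table.
-- Transform and load in one pass:
INSERT INTO dw_sales_fact (sale_date, product_id, amount_usd)
SELECT CAST(s.sale_ts AS DATE),        -- transform: truncate timestamp to date
       s.product_id,
       s.amount * x.usd_rate           -- transform: convert local currency to USD
FROM   stg_sales s
JOIN   stg_exchange_rate x
  ON   x.currency_code = s.currency_code;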
Common ETL tools include:
- Informatica
- Ab Initio
- DataStage
- BODI (BusinessObjects Data Integrator)
- Oracle Warehouse Builder
Report generation
In report generation, OLAP (online analytical processing) is used. OLAP is a set of specifications that allows client applications to retrieve data for analytical processing. An OLAP tool is a specialized tool that sits between the database and the user to provide various analyses of the stored data; it generates reports that support decision making by top-level management.
- Business Objects
- Cognos
- MicroStrategy
- Hyperion
- Oracle Express
- Microsoft Analysis Services
- Difference between OLTP and OLAP

| #  | OLTP | OLAP |
| 1  | Application oriented (e.g., a purchase order is functionality of an application) | Subject oriented (subject in the sense of customer, product, item, time) |
| 2  | Used to run business | Used to analyze business |
| 3  | Detailed data | Summarized data |
| 4  | Repetitive access | Ad-hoc access |
| 5  | Few records accessed at a time (tens), simple query | Large volumes accessed at a time (millions), complex query |
| 6  | Small database | Large database |
| 7  | Current data | Historical data |
| 8  | Clerical user | Knowledge user |
| 9  | Row-by-row loading | Bulk loading |
| 10 | Time invariant | Time variant |
| 11 | Normalized data | De-normalized data |
| 12 | E-R schema | Star schema |
- What are the types of data warehouses?
EDW (Enterprise Data Warehouse)
ü It provides a central database for decision support throughout the enterprise
ü It is a collection of data marts
DATA MART
ü It is a subset of a data warehouse
ü It is a subject-oriented database which supports the needs of individual departments in an organization
ü It is called a high-performance query structure
ü It supports a particular line of business, like sales, marketing, etc.
ODS (Operational Data Store)
ü It is defined as an integrated view of operational databases designed to support operational monitoring
ü It is a collection of operational data sources designed to support transaction processing
ü Data is refreshed near real time and used for business activity
ü It is an intermediate layer between OLTP and OLAP which helps to create instant reports
- What are the types of data modeling involved in data warehouse architecture?
- What are the types of approaches in DWH?
Bottom-up approach: first we develop the data marts, then we integrate these data marts into an EDW.
Top-down approach: first we develop the EDW, then from that EDW we develop the data marts.
Bottom up
OLTP -> ETL -> Data mart -> DWH -> OLAP
Top down
OLTP -> ETL -> DWH -> Data mart -> OLAP
Top down
ü Cost of initial planning & design is high
ü Takes a longer duration, of more than a year
Bottom up
ü Planning & designing the data marts without waiting for the global warehouse design
ü Immediate results from the data marts
ü Tends to take less time to implement
ü Errors in critical modules are detected earlier
ü Benefits are realized in the early phases
ü It is considered the best approach
Data Modeling Types:
ü Conceptual Data Modeling
ü Logical Data Modeling
ü Physical Data Modeling
ü Dimensional Data Modeling
1. Conceptual Data Modeling
ü A conceptual data model includes all major entities and relationships and does not contain much detailed information about attributes; it is often used in the INITIAL PLANNING PHASE.
ü The conceptual data model is created by gathering business requirements from various sources like business documents, discussions with functional teams, business analysts, subject matter experts and end users who do the reporting on the database. Data modelers create the conceptual data model and forward that model to the functional team for their review.
ü Conceptual data modeling gives the functional and technical teams an idea of how business requirements will be projected into the logical data model.
2. Logical Data Modeling
ü This is the actual implementation and extension of the conceptual data model. A logical data model includes all required entities, attributes, key groups, and relationships that represent business information and define business rules.
3. Physical Data Modeling
ü A physical data model includes all required tables, columns, relationships and database properties for the physical implementation of the database. Database performance, indexing strategy, physical storage and denormalization are important parameters of a physical model.
| Logical Data Model | Physical Data Model |
| Represents business information and defines business rules | Represents the physical implementation of the model in a database |
| Entity | Table |
| Attribute | Column |
| Primary Key | Primary Key Constraint |
| Alternate Key | Unique Constraint or Unique Index |
| Inversion Key Entry | Non-Unique Index |
| Rule | Check Constraint, Default Value |
| Relationship | Foreign Key |
| Definition | Comment |
4. Dimensional Data Modeling
ü A dimensional model consists of fact and dimension tables
ü It is the approach used to develop the schema designs:
ü Star schema
ü Snowflake schema
ü Starflake schema (or) hybrid schema
ü Multi-star schema
Star schema
ü The star schema is a logical database design which contains a centrally located fact table surrounded by one or more dimension tables
ü Since the database design looks like a star, it is called a star schema
ü A dimension table contains a primary key and the textual descriptions; it contains de-normalized business information
ü A fact table contains a composite key and measures
ü The measures are key performance indicators which are used to evaluate the enterprise performance in the form of success and failure
ü Eg: total revenue, product sales, discount given, number of customers
ü To generate a meaningful report, the report should contain at least one dimension and one fact table
The advantages of a star schema
ü Fewer joins
ü Improved query performance
ü Easy slicing down (drill-down)
ü Easy understanding of data
Disadvantage:
ü Requires more storage space
Example of Star Schema:
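A minimal star-schema sketch in SQL, using hypothetical dimension and fact tables (names and columns are illustrative only):
CREATE TABLE dim_product (
    product_key  INTEGER PRIMARY KEY,   -- surrogate key
    product_name VARCHAR(100),
    category     VARCHAR(50)            -- hierarchy kept inside the dimension itself
);
CREATE TABLE dim_date (
    date_key      INTEGER PRIMARY KEY,
    calendar_date DATE,
    month_name    VARCHAR(20),
    year_no       INTEGER
);
CREATE TABLE fact_sales (
    product_key   INTEGER REFERENCES dim_product (product_key),
    date_key      INTEGER REFERENCES dim_date (date_key),
    sales_amount  DECIMAL(12,2),        -- measure
    quantity_sold INTEGER,              -- measure
    PRIMARY KEY (product_key, date_key) -- composite key
);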
Snowflake schema
ü In a snowflake schema, the dimension tables are split into one or more dimension tables
ü That is, the de-normalized dimension tables are split into normalized dimension tables
ü For example, a snowflake schema might have 4 dimension tables, 4 lookup tables and 1 fact table, because the hierarchies (category, branch, state, and month) are broken out of the dimension tables (PRODUCT, ORGANIZATION, LOCATION, and TIME) respectively and stored separately
ü It increases the number of joins and gives poorer performance when retrieving data
ü In a few organizations, they try to normalize the dimension tables to save space
ü Since dimension tables hold relatively little space, the snowflake schema approach may be avoided
ü Bitmap indexes cannot be effectively utilized
Important aspects of Star Schema & Snowflake Schema
ü In a star schema every dimension will have a primary key.
ü In a star schema, a dimension table will not have any parent table.
ü Whereas in a snowflake schema, a dimension table will have one or more parent tables.
ü Hierarchies for the dimensions are stored in the dimension table itself in a star schema.
ü Whereas hierarchies are broken into separate tables in a snowflake schema. These hierarchies help to drill down the data from the topmost level to the lowermost level.
Starflake schema (or) Hybrid schema
ü A hybrid schema is a combination of star and snowflake schemas
Multi-star schema
ü Multiple fact tables share a set of dimension tables
Conformed dimensions
ü Conformed dimensions are nothing but reusable dimensions
ü They are the dimensions which you use multiple times or in multiple data marts
ü They are common across different data marts
Measure Types (or) Types of Facts
- Additive - measures that can be summed up across all dimensions.
- Ex: sales revenue
- Semi-additive - measures that can be summed up across some dimensions but not others.
- Ex: current balance
- Non-additive - measures that cannot be summed up across any of the dimensions.
- Ex: student attendance
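To make the distinction concrete, a hedged SQL sketch (fact_sales is the hypothetical fact table sketched earlier; fact_account_balance is an assumed semi-additive fact):
-- Additive: sales revenue can be summed across every dimension
SELECT SUM(sales_amount) AS total_revenue
FROM   fact_sales;

-- Semi-additive: balances can be summed across accounts for a single day,
-- but summing the same account across days is meaningless
SELECT date_key, SUM(balance_amount) AS total_balance
FROM   fact_account_balance
GROUP  BY date_key;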
Surrogate Key
ü Joins between fact and dimension tables should be based on surrogate keys
ü Users should not obtain any information by looking at these keys
ü These keys should be simple integers
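For example, a fact-to-dimension join on a surrogate key could look like this (reusing the hypothetical fact_sales and dim_product tables sketched above):
SELECT d.product_name,
       SUM(f.sales_amount) AS total_revenue
FROM   fact_sales  f
JOIN   dim_product d
  ON   d.product_key = f.product_key   -- simple integer surrogate key with no business meaning
GROUP  BY d.product_name;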
A sample data warehouse schema
Why do we need a staging area for a DWH?
ü The staging area is needed to clean operational data before loading it into the data warehouse.
ü Cleaning here means merging data which comes from different sources.
ü It is the area where most of the ETL work is done.
Data Cleansing
ü It is used to remove duplicates
ü It is used to correct wrong email addresses
ü It is used to identify missing data
ü It is used to convert data types
ü It is used to capitalize names & addresses
Types of Dimensions:
There are four types of dimensions:
ü Conformed dimensions
ü Junk dimensions (garbage dimensions)
ü Degenerate dimensions
ü Slowly changing dimensions
ü A conformed dimension is something which can be shared by multiple fact tables or multiple data marts.
ü A junk (garbage) dimension is a grouping of flag values.
ü A degenerate dimension is something dimensional in nature but existing in the fact table (e.g. invoice number). It is neither a fact nor strictly a dimension attribute, but it is useful for some kinds of analysis, so it is kept as an attribute in the fact table and called a degenerate dimension.
Degenerate dimension: a column in the key section of the fact table that does not have an associated dimension table but is used for reporting and analysis; such a column is called a degenerate dimension or line item dimension.
For example, we have a fact table with customer_id, product_id, branch_id, employee_id, bill_no, and date in the key section and price, quantity, and amount in the measure section. In this fact table, bill_no in the key section is a single value with no associated dimension table. Instead of creating a separate dimension table for that single value, we can include it in the fact table to improve performance. So here the column bill_no is a degenerate dimension or line item dimension.
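A hedged sketch of such a fact table, keeping bill_no as a degenerate dimension column instead of creating a one-column dimension table (names are illustrative):
CREATE TABLE fact_billing (
    customer_id INTEGER,
    product_id  INTEGER,
    branch_id   INTEGER,
    employee_id INTEGER,
    bill_no     VARCHAR(20),   -- degenerate dimension: no dimension table behind it
    date_key    INTEGER,
    price       DECIMAL(12,2),
    quantity    INTEGER,
    amount      DECIMAL(12,2)
);
-- bill_no can still be used directly for reporting, e.g. line items per bill:
SELECT bill_no, COUNT(*) AS line_items
FROM fact_billing
GROUP BY bill_no;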
Informatica Architecture
The PowerCenter Domain
It is the primary unit of administration. An installation can have a single domain or multiple domains. A domain is a collection of nodes and services.
Nodes
A node is the logical representation of a machine in a domain. One node in the domain acts as a gateway node to receive service requests from clients and route them to the appropriate service and node.
Integration Service:
The Integration Service does the real work. It extracts data from sources, processes it as per the business logic and loads data to targets.
Repository Service:
The Repository Service fetches data from the repository and sends it back to the requesting components (mostly the client tools and the Integration Service).
PowerCenter Repository:
The repository is a relational database which stores all the metadata created in PowerCenter.
PowerCenter Client Tools:
The PowerCenter client consists of multiple tools.
PowerCenter Administration Console:
This is simply a web-based administration tool you can use to administer the PowerCenter installation.
Q. How can
you define a transformation? What are different types of transformations available
in Informatica?
A. A transformation
is a repository object that generates, modifies, or passes data. The Designer provides
a set of transformations that perform specific functions. For example, an
Aggregator transformation performs calculations on groups of data. Below are
the various transformations available in Informatica:
• Aggregator
• Custom
• Expression
• External Procedure
• Filter
• Input
• Joiner
• Lookup
• Normalizer
• Rank
• Router
• Sequence Generator
• Sorter
• Source Qualifier
• Stored Procedure
• Transaction Control
• Union
• Update Strategy
• XML Generator
• XML Parser
• XML Source Qualifier
Q. What is a
source qualifier? What is meant by Query Override?
A. Source Qualifier
represents the rows that the PowerCenter Server reads from a relational or flat
file source when it runs a session. When a relational or a flat file source
definition is added to a mapping, it is connected to a Source Qualifier
transformation.
The PowerCenter Server generates a query for each Source Qualifier transformation whenever it runs a session. The default query is a SELECT statement containing all the source columns. The Source Qualifier has the capability to override this default query by changing the default settings of the transformation properties. The list of selected ports and the order in which they appear in the default query should not be changed in the overridden query.
Q. What is
aggregator transformation?
A. The Aggregator transformation allows you to perform aggregate calculations, such as averages and sums. Unlike the Expression transformation, the Aggregator transformation can be used to perform calculations on groups; the Expression transformation permits calculations on a row-by-row basis only.
The Aggregator transformation contains group by ports that indicate how to group the data. While grouping the data, the Aggregator transformation outputs the last row of each group unless otherwise specified in the transformation properties.
The various group by functions available in Informatica are: AVG, COUNT, FIRST, LAST, MAX, MEDIAN, MIN, PERCENTILE, STDDEV, SUM, VARIANCE.
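In SQL terms, the behaviour the Aggregator reproduces is roughly a GROUP BY; the table and columns below are assumed for illustration:
SELECT department_id,
       AVG(salary) AS avg_salary,
       SUM(salary) AS total_salary,
       COUNT(*)    AS head_count
FROM   employees
GROUP  BY department_id;   -- department_id plays the role of the group by port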
Q. What is
Incremental Aggregation?
A. Whenever a session is created for a mapping that contains an Aggregator transformation, the session option for Incremental Aggregation can be enabled. When PowerCenter performs incremental aggregation, it passes new source data through the mapping and uses historical cache data to perform the aggregation calculations incrementally.
Q. How Union
Transformation is used?
A. The Union transformation is a multiple input group transformation that can be used to merge data from various sources (or pipelines). This transformation works just like the UNION ALL statement in SQL, which is used to combine the result sets of two SELECT statements.
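The SQL it mimics is UNION ALL, which keeps duplicate rows (table names here are assumptions):
SELECT customer_id, customer_name FROM customers_north
UNION ALL
SELECT customer_id, customer_name FROM customers_south;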
Q. Can two
flat files be joined with Joiner Transformation?
A. Yes, joiner
transformation can be used to join data from two flat file sources.
Q. What is a
look up transformation?
A. This
transformation is used to lookup data in a flat file or a relational table,
view or synonym. It compares lookup transformation ports (input ports) to the
source column values based on the lookup condition. Later returned values can
be passed to other transformations.
Q. Can a
lookup be done on Flat Files?
A. Yes.
Q. What is a
mapplet?
A. A mapplet is a
reusable object that is created using mapplet designer. The mapplet contains set
of transformations and it allows us to reuse that transformation logic in
multiple mappings.
Q. What does
reusable transformation mean?
A. Reusable transformations can be used in multiple mappings. A reusable transformation is stored as metadata, separate from any mapping that uses it. Whenever any changes are made to a reusable transformation, all the mappings where the transformation is used become invalid.
Q. What is
update strategy and what are the options for update strategy?
A. Informatica processes the source data row by row. By default every row is marked to be inserted into the target table. If the row has to be updated or inserted based on some logic, the Update Strategy transformation is used. The condition can be specified in the Update Strategy to mark the processed row for update or insert.
Following options
are available for update strategy:
• DD_INSERT: If this is used
the Update Strategy flags the row for insertion. Equivalent numeric value of
DD_INSERT is 0.
• DD_UPDATE: If this is used
the Update Strategy flags the row for update. Equivalent numeric value of
DD_UPDATE is 1.
• DD_DELETE: If this is used
the Update Strategy flags the row for deletion. Equivalent numeric value of
DD_DELETE is 2.
• DD_REJECT: If this is used
the Update Strategy flags the row for rejection. Equivalent numeric value of
DD_REJECT is 3.
Q. What are the types of loading in Informatica?
There are two types of loading: 1. normal loading and 2. bulk loading.
In normal loading, records are loaded one by one and a database log is written for each. It takes comparatively longer to load data to the target.
In bulk loading, a number of records are loaded to the target database at a time, bypassing the database log. It takes less time to load data to the target.
Q. What is aggregate cache in aggregator transformation?
The aggregator stores data in the aggregate cache until it
completes aggregate calculations. When you run a session that uses an
aggregator transformation, the informatica server creates index and data caches
in memory to process the transformation. If the informatica server requires
more space, it stores overflow values in cache files.
Q. What type
of repositories can be created using Informatica Repository Manager?
A. Informatica
PowerCenter includes following type of repositories:
• Standalone Repository:
A repository that functions individually and this is unrelated to any other repositories.
• Global Repository:
This is a centralized repository in a domain. This repository can
contain shared
objects across the repositories in a domain. The objects are shared through global
shortcuts.
• Local Repository:
A local repository is within a domain and is not a global repository. A local repository can connect to a global repository using global shortcuts and can use objects in its shared folders.
• Versioned Repository:
This can either be local or global repository but it allows version control for
the repository. A versioned repository can store multiple copies, or versions
of an object. This feature allows efficiently developing, testing and deploying
metadata in the production environment.
Q. What is a
code page?
A. A code page
contains encoding to specify characters in a set of one or more languages. The code
page is selected based on source of the data. For example if source contains
Japanese text then the code page should be selected to support Japanese text.
When a code page is
chosen, the program or application for which the code page is set, refers to a
specific set of data that describes the characters the application recognizes.
This influences the way that application stores, receives, and sends character
data.
Q. Which all
databases PowerCenter Server on Windows can connect to?
A. PowerCenter
Server on Windows can connect to following databases:
• IBM DB2
• Informix
• Microsoft Access
• Microsoft Excel
• Microsoft SQL Server
• Oracle
• Sybase
• Teradata
Q. Which all
databases PowerCenter Server on UNIX can connect to?
A. PowerCenter
Server on UNIX can connect to following databases:
• IBM DB2
• Informix
• Oracle
• Sybase
• Teradata
Q. How to
execute PL/SQL script from Informatica mapping?
A. Stored Procedure
(SP) transformation can be used to execute PL/SQL Scripts. In SP
Transformation PL/SQL procedure name can be specified. Whenever
the session is executed, the session will call the pl/sql procedure.
Q. What is Data Driven?
The informatica server follows instructions coded into update strategy
transformations within the session mapping which determine how to flag records
for insert, update, delete or reject. If we do not choose data driven option
setting, the informatica server ignores all update strategy transformations in
the mapping.
Q. What
are the types of mapping wizards that are provided in Informatica?
The Designer provides two mapping wizards:
1. Getting Started Wizard - Creates mapping
to load static facts and dimension tables as well as slowly growing dimension
tables.
2. Slowly Changing Dimensions Wizard -
Creates mappings to load slowly changing dimension tables based on the amount
of historical dimension data we want to keep and the method we choose to handle
historical dimension data.
Q.
What is Load Manager?
A. While running a Workflow, the
PowerCenter Server uses the Load
Manager
process and the Data Transformation Manager Process (DTM) to run the workflow and
carry out workflow tasks. When the PowerCenter Server runs a workflow, the Load Manager performs the following
tasks:
1. Locks the
workflow and reads workflow properties.
2. Reads the
parameter file and expands workflow variables.
3. Creates the
workflow log file.
4. Runs
workflow tasks.
5. Distributes
sessions to worker servers.
6. Starts the
DTM to run sessions.
7. Runs
sessions from master servers.
8. Sends
post-session email if the DTM terminates abnormally.
When the
PowerCenter Server runs a session, the DTM
performs the following tasks:
1. Fetches
session and mapping metadata from the repository.
2. Creates and
expands session variables.
3. Creates the
session log file.
4. Validates
session code pages if data code page validation is enabled. Checks
Query conversions
if data code page validation is disabled.
5. Verifies
connection object permissions.
6. Runs
pre-session shell commands.
7. Runs
pre-session stored procedures and SQL.
8. Creates and
runs mappings, reader, writer, and transformation threads to extract,
transform, and
load data.
9. Runs
post-session stored procedures and SQL.
10. Runs
post-session shell commands.
11. Sends
post-session email.
Q.
What is Data Transformation Manager?
A. After the
load manager performs validations
for the session, it creates the DTM
process. The
DTM process is the second process associated with the session run. The
primary
purpose of the DTM process is to create and manage threads that carry out
the session
tasks.
• The DTM allocates process memory for the session and divides it into buffers. This is also known as buffer memory. It creates the main thread, which is called the master thread. The master thread creates and manages all other threads.
• If we partition a session, the DTM creates a set of threads for each partition to allow concurrent processing. When the Informatica server writes messages to the session log, it includes the thread type and thread ID.
Following are
the types of threads that DTM creates:
Master thread - main thread of the DTM process; creates and manages all other threads.
Mapping thread - one thread for each session; fetches session and mapping information.
Pre- and post-session thread - one thread each to perform pre- and post-session operations.
Reader thread - one thread for each partition for each source pipeline.
Writer thread - one thread for each partition, if a target exists in the source pipeline, to write to the target.
Transformation thread - one or more transformation threads for each partition.
Q.
What is Session and Batches?
Session - a session is a set of instructions that tells the Informatica Server how and when to move data from sources to targets. After creating the session, we can use either the server manager or the command line program pmcmd to start or stop the session.
Batches - batches provide a way to group sessions for either serial or parallel execution by the Informatica Server. There are two types of batches:
1. Sequential - runs sessions one after the other.
2. Concurrent - runs sessions at the same time.
Q. How many ways
you can update a relational source definition and what
are they?
A. Two ways
1. Edit the definition
2. Reimport the definition
Q. What is a
transformation?
A. It is a repository
object that generates, modifies or passes data.
Q. What are the
designer tools for creating transformations?
A. Mapping designer
Transformation developer
Mapplet designer
Q. In how many
ways can you create ports?
A. Two ways
1. Drag the port from
another transformation
2. Click the add button on the ports tab.
Q. What are
reusable transformations?
A. A transformation that
can be reused is called a reusable transformation
They can be created using
two methods:
1. Using transformation
developer
2. Create normal one and promote it to reusable
Q. What are the settings used to configure the Joiner transformation?
· Master and detail source
· Type of join
· Condition of the join
Q. What are the join types in the Joiner transformation?
A. Normal (default) -- only matching rows from both master and detail
Master outer -- all detail rows and only matching rows from master
Detail outer -- all master rows and only matching rows from detail
Full outer -- all rows from both master and detail (matching or non-matching)
Q.
What are the joiner caches?
A. When a
Joiner transformation occurs in a session, the Informatica Server reads all the
records from the master source and builds index and data caches based on the
master rows. After building the caches, the Joiner transformation reads records
from the detail source and performs
joins.
Q. What are the types of lookup caches?
· Static cache: You can configure a static or read-only cache for any lookup table. By default the Informatica server creates a static cache. It caches the lookup table and lookup values in the cache for each row that comes into the transformation. When the lookup condition is true, the Informatica server does not update the cache while it processes the lookup transformation.
·
Dynamic cache: If you want to cache the target
table and insert new rows into cache and the target, you can create a look up
transformation to use dynamic cache. The Informatica server dynamically inserts
data to the target table.
·
Persistent cache: You can save the lookup cache files
and reuse them the next time the Informatica server processes a lookup transformation
configured to use the cache.
·
Recache from database: If the persistent cache is not
synchronized with the lookup table, you can configure the lookup transformation
to rebuild the lookup cache.
· Shared cache: You can share the lookup cache between multiple transformations. You can share an unnamed cache between transformations in the same mapping.
Q. What is
Transformation?
A: A transformation is a repository object that generates, modifies, or passes data. A transformation performs a specific function. There are two types of transformations:
1. Active
Affects the rows during the transformation, i.e. it can change the number of rows that pass through it. Eg: Aggregator, Filter, Joiner, Normalizer, Rank, Router, Source Qualifier, Update Strategy, ERP Source Qualifier, Advanced External Procedure.
2. Passive
Does not change the number of rows that pass through it. Eg: Expression, External Procedure, Input, Lookup, Stored Procedure, Output, Sequence Generator, XML Source Qualifier.
Q. What are
Options/Type to run a Stored Procedure?
A: Normal:
During a session, the stored procedure runs where the
transformation exists in
the mapping on a row-by-row basis. This is useful for calling the stored
procedure for each row of data that passes through the mapping, such as running
a calculation against an input port. Connected stored procedures run only in
normal mode.
Pre-load of the
Source. Before the session retrieves data from the source, the stored
procedure runs. This is useful for verifying the existence of tables or
performing joins of data in a temporary table.
Post-load of the
Source. After the session retrieves data from the source, the stored procedure
runs. This is useful for removing temporary tables.
Pre-load of the
Target. Before the session sends data to the target, the stored procedure
runs. This is useful for verifying target tables or disk space on the target
system.
Post-load of the
Target. After the session sends data to the target, the stored procedure runs.
This is useful for re-creating indexes on the database. It must contain at
least one Input and one Output port.
Q. What kinds of
sources and of targets can be used in Informatica?
·
Sources may be Flat file, relational db or XML.
·
Target may be relational tables, XML or flat
files.
Q: What is Session
Process?
A: The Load Manager process. Starts the
session, creates the DTM process, and
sends post-session email
when the session completes.
Q. What is DTM
process?
A: The DTM process creates threads to
initialize the session, read, write, transform
data and handle pre and post-session operations.
Q. What is the
different type of tracing levels?
Tracing level
represents the amount of information
that Informatica Server writes in a log file. Tracing levels store
information about mapping and transformations. There are 4 types of tracing
levels supported
1. Normal: It specifies the initialization
and status information and summarization of the success rows and target rows
and the information about the skipped rows due to transformation errors.
2. Terse: Specifies initialization information, error messages, and notification of rejected data only (less detail than Normal).
3. Verbose
Initialization: In addition to the Normal tracing, specifies the location
of the data cache files and index cache files that are treated and detailed
transformation statistics for each and every transformation within the mapping.
4. Verbose Data: Along with verbose
initialization records each and every record processed by the informatica server.
Q. Types of Dimensions?
A
dimension table consists of the attributes about the facts. Dimensions store
the textual descriptions of the business.
Conformed
Dimension:
Conformed
dimensions mean the exact same thing with every possible fact table to which
they are joined.
Eg:
The date dimension table connected to the sales facts is identical to the date
dimension connected to the inventory facts.
Junk
Dimension:
A
junk dimension is a collection of random transactional codes flags and/or text
attributes that are unrelated to any particular dimension. The junk dimension
is simply a structure that provides a convenient place to store the junk
attributes.
Eg:
Assume that we have a gender dimension and marital status dimension. In the
fact table we need to maintain two keys referring to these dimensions. Instead
of that create a junk dimension which has all the combinations of gender and
marital status (cross join gender and marital status table and create a junk
table). Now we can maintain only one key in the fact table.
Degenerated
Dimension:
A
degenerate dimension is a dimension which is derived from the fact table and
doesn’t have its own dimension table.
Eg:
A transactional code in a fact table.
Slowly changing
dimension:
Slowly changing dimensions
are dimension tables that have slowly increasing
data as well as updates to existing data.
Q. What are the
output files that the Informatica server creates during the
session running?
Informatica server log: Informatica server (on UNIX) creates a log for
all status and
error messages (default
name: pm.server.log). It also creates an error log for error
messages. These files will
be created in Informatica home directory
Session log file: Informatica server creates session log file for
each session. It writes
information about session
into log files such as initialization process, creation of sql
commands for reader and
writer threads, errors encountered and load summary. The
amount of detail in
session log file depends on the tracing level that you set.
Session detail file: This file contains load statistics for each
target in mapping.
Session detail includes
information such as table name, number of rows written or
rejected. You can view
this file by double clicking on the session in monitor window.
Performance detail file: This file contains information known as session
performance
details which helps you
where performance can be improved. To generate this file
select the performance
detail option in the session property sheet.
Reject file: This file contains the rows of data that the
writer does not write to
targets.
Control file: Informatica server creates control file and a
target file when you run a
session that uses the
external loader. The control file contains the information about
the target flat file such
as data format and loading instructions for the external
loader.
Post session email: Post session email allows you to automatically
communicate
information about a
session run to designated recipients. You can create two
different messages. One if
the session completed successfully the other if the session
fails.
Indicator file: If you use the flat file as a target, you can
configure the Informatica
server to create indicator
file. For each target row, the indicator file contains a
number to indicate whether
the row was marked for insert, update, delete or reject.
Output file: If session writes to a target file, the
Informatica server creates the
target file based on file
properties entered in the session property sheet.
Cache files: When the Informatica server creates memory
cache it also creates cache
files.
For the following
circumstances Informatica server creates index and data cache
files:
Aggregator transformation
Joiner transformation
Rank transformation
Lookup transformation
Q. What is meant
by lookup caches?
A. The Informatica server
builds a cache in memory when it processes the first row
of a data in a cached look
up transformation. It allocates memory for the cache
based on the amount you
configure in the transformation or session properties. The
Informatica server stores
condition values in the index cache and output values in
the data cache.
Q. How do you
identify existing rows of data in the target table using lookup
transformation?
A. There are two ways to look up the target table to verify whether a row exists or not:
1. Use a connected lookup with dynamic cache and then check the value of the NewLookupRow output port to decide whether the incoming record already exists in the table / cache or not.
2. Use an unconnected lookup, call it from an Expression transformation, and check the lookup condition port value (Null / Not Null) to decide whether the incoming record already exists in the table or not.
Q. What are Aggregate tables?
An aggregate table contains the summary of existing warehouse data, grouped to certain levels of dimensions. Retrieving the required data from the actual table, which may have millions of records, takes more time and also affects server performance. To avoid this we can aggregate the table to the required level and use it. These tables reduce the load on the database server, increase query performance and return results very quickly.
Q. What is a level of Granularity of a fact table?
Level of granularity means the level of detail that you put into the fact table in a data warehouse. For example, based on the design you can decide to store the sales data for each transaction. Level of granularity then means how much detail you are willing to keep for each transactional fact: product sales recorded for every minute, or aggregated up to a coarser grain (for example, the day) before the data is stored.
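As a sketch, the same sales data can be kept at transaction grain or rolled up to a daily grain before loading (table and column names are assumed):
-- Transaction grain: one row per individual sale
SELECT product_id, sale_ts, amount
FROM   fact_sales_txn;

-- Daily grain: aggregated before loading, one row per product per day
SELECT product_id,
       CAST(sale_ts AS DATE) AS sale_date,
       SUM(amount)           AS daily_amount
FROM   fact_sales_txn
GROUP  BY product_id, CAST(sale_ts AS DATE);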
Q. What is session?
A
session is a set of instructions to move data from sources to targets.
Q. What is worklet?
A worklet is an object that represents a set of workflow tasks, allowing a set of workflow logic to be reused in several workflows.
Use of a worklet: you can bind many tasks in one place so that they can easily be identified and can serve a specific purpose.
Q. What is workflow?
A
workflow is a set of instructions that tells the Informatica server how to
execute the tasks.
Q. Why cannot we use sorted input option for incremental aggregation?
In
incremental aggregation, the aggregate calculations are stored in historical
cache on the server. In this historical cache the data need not be in sorted
order. If you give sorted input, the
records come as presorted for that particular run but in the historical cache
the data may not be in the sorted order. That is why this option is not
allowed.
Q. What is target load
order plan?
You specify the target load order based on the source qualifiers in a mapping. If you have multiple source qualifiers connected to multiple targets, you can designate the order in which the Informatica server loads data into the targets.
The target load plan defines the order in which data is extracted from the source qualifier transformations. It is set under Mappings (tab) –> Target Load Plan.
Q. What is constraint based loading?
Constraint-based load ordering defines the order of loading the data into multiple targets based on primary and foreign key constraints.
To set the option: double-click the session –> Config Object –> check Constraint Based Load Ordering.
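Constraint-based loading only matters when the targets are related by keys, for example (hypothetical DDL):
CREATE TABLE customer_dim (
    customer_key  INTEGER PRIMARY KEY,
    customer_name VARCHAR(100)
);
CREATE TABLE sales_fact (
    customer_key INTEGER REFERENCES customer_dim (customer_key),  -- FK forces the parent to load first
    amount       DECIMAL(12,2)
);
-- With constraint-based load ordering enabled, rows for customer_dim are written before
-- rows for sales_fact in the same commit cycle, so the foreign key is never violated.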
Q. What is the status code in stored procedure transformation?
A status code provides error handling for the Informatica server during the session. The stored procedure issues a status code that notifies whether or not the stored procedure completed successfully. This value cannot be seen by the user; it is used only by the Informatica server to determine whether to continue running the session or to stop.
Q. Define Informatica Repository?
The
Informatica repository is a relational database that stores information, or
metadata, used by the Informatica Server and Client tools. Metadata can include
information such as mappings describing how to transform source data, sessions
indicating when you want the Informatica Server to perform the transformations,
and connect strings for sources and targets.
The
repository also stores administrative information such as usernames and
passwords, permissions and privileges, and product version.
Use the Repository Manager to create the repository. The Repository Manager connects to the repository database and runs the code needed to create the repository tables. These tables store metadata in a specific format that the Informatica Server and client tools use.
Q. What is metadata?
Designing a data mart involves writing and storing a complex set
of instructions. You need to know where to get data (sources), how to change
it, and where to write the information (targets). PowerMart and PowerCenter
call this set of instructions metadata. Each piece of metadata (for example,
the description of a source table in an operational database) can contain
comments about it.
In summary, Metadata can include information such as mappings describing how to transform source data, sessions indicating when you want the Informatica Server to perform the transformations, and connect strings for sources and targets.
Q. What is metadata reporter?
It is a web-based application that enables you to run reports against repository metadata. With the Metadata Reporter you can access information about your repository without knowledge of SQL, the transformation language or the underlying tables in the repository.
Q. What are the types of metadata that stores in repository?
Source definitions. Definitions of
database objects (tables, views, synonyms) or files that provide source data.
Target definitions. Definitions of database objects or files that contain the target data. Multi-dimensional metadata. Target definitions that are configured as cubes and dimensions.
Mappings. A set of source and target definitions along with transformations containing business logic that you build into the transformation. These are the instructions that the Informatica Server uses to transform and move data.
Reusable transformations. Transformations that you can use in multiple mappings.
Mapplets. A set of transformations that you can use in multiple mappings.
Sessions and workflows. Sessions and workflows store information about how and when the Informatica Server moves data. A workflow is a set of instructions that describes how and when to run tasks related to extracting, transforming, and loading data. A session is a type of task that you can put in a workflow. Each session corresponds to a single mapping.
Following are the types of metadata stored in the repository:
· Database connections
· Global objects
· Multidimensional metadata
· Reusable transformations
· Shortcuts
· Transformations
Q. How can we store previous session logs?
Go to Session Properties –> Config Object –> Log Options
Select the properties as follows:
Save session log by –> Session runs
Save session log for these runs –> change the number to how many log files you want to keep (default is 0)
If you want to save all of the log files created by every run, then select the option:
Save session log for these runs –> Session timestamp
You can find these properties in the session/workflow properties.
Q. What is Changed Data Capture?
Changed Data Capture (CDC) helps
identify the data in the source system that has changed since the last
extraction. With CDC data extraction takes place at the same time the insert
update or delete operations occur in the source tables and the change data is
stored inside the database in change tables.
The change data thus captured is then
made available to the target systems in a controlled manner.
Q. What is an indicator file? and how it can be used?
Indicator
file is used for Event Based Scheduling when you don’t know when the Source
Data is available. A shell command, script or a batch file creates and send
this indicator file to the directory local to the Informatica Server. Server
waits for the indicator file to appear before running the session.
Q. What is audit table? and What are the columns in it?
An audit table is a table which contains details about your workflow names and session names. It contains information about workflow and session status and their details.
·
WKFL_RUN_ID
·
WKFL_NME
·
START_TMST
·
END_TMST
·
ROW_INSERT_CNT
·
ROW_UPDATE_CNT
·
ROW_DELETE_CNT
·
ROW_REJECT_CNT
Q. If a session fails after loading 10,000 records into the target, how can we start loading from the 10,001st record when we run the session the next time?
Select the recovery strategy in the session properties as "Resume from the last checkpoint". Note - set this property before running the session.
Q. Informatica Reject File – How to identify rejection reason
D - Valid data or Good
Data. Writer passes
it to the target database. The target accepts it unless a database error
occurs, such as finding a duplicate key while inserting.
O - Overflowed Numeric
Data. Numeric data
exceeded the specified precision or scale for the column. Bad data, if you
configured the mapping target to reject overflow or truncated data.
N - Null Value. The column contains a null value.
Good data. Writer passes it to the target, which rejects it if the target
database does not accept null values.
T - Truncated String
Data. String data
exceeded a specified precision for the column, so the Integration Service
truncated it. Bad data, if you configured the mapping target to reject overflow
or truncated data.
Also
to be noted that the second column contains column indicator flag value ‘D’
which signifies that the Row Indicator is valid.
Now
let us see how Data in a Bad File looks like:
0,D,7,D,John,D,5000.375,O,,N,BrickLand
Road Singapore,T
Q. What is “Insert Else Update” and “Update Else Insert”?
These
options are used when dynamic cache is enabled.
·
Insert Else Update option applies to rows entering the
lookup transformation with the row type of insert. When this option is enabled
the integration service inserts new rows in the cache and updates existing
rows. When disabled, the Integration Service does not update existing rows.
·
Update Else Insert option applies to rows entering the
lookup transformation with the row type of update. When this option is enabled,
the Integration Service updates existing rows, and inserts a new row if it is
new. When disabled, the Integration Service does not insert new rows.
Q. What are the Different methods of loading Dimension tables?
Conventional Load - Before loading the data, all the
Table constraints will be checked against the data.
Direct load (Faster Loading) - All the
Constraints will be disabled. Data will be loaded directly. Later the data will
be checked against the table constraints and the bad data won’t be indexed.
Q. What are the different types of Commit intervals?
The
different commit intervals are:
·
Source-based commit. The Informatica Server commits
data based on the number of source rows. The commit point is the commit
interval you configure in the session properties.
·
Target-based commit. The Informatica Server commits
data based on the number of target rows and the key constraints on the target
table. The commit point also depends on the buffer block size and the commit
interval.
Q.
How to add source flat file header into
target file?
Edit Task-->Mapping-->Target-->Header
Options--> Output field names
Q.
How to load name of the file into relation target?
Source Definition-->Properties-->Add
currently processed file name port
Q. How to return multiple columns through an unconnected lookup?
Suppose your lookup table has f_name, m_name and l_name and you are using an unconnected lookup. In the lookup's SQL override, concatenate the columns as f_name||'~'||m_name||'~'||l_name so the lookup returns a single delimited string. You can then use substring functions in an Expression transformation to separate these three columns and make them individual ports for the downstream transformation/target.
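A hedged example of such a lookup SQL override (table and column names are assumed):
SELECT f_name || '~' || m_name || '~' || l_name AS full_name,
       emp_id
FROM   employee_lkp;
-- The concatenated string comes back through the lookup's return port;
-- SUBSTR/INSTR in a downstream Expression transformation splits it back into three ports.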
-----------------------------------------------------------------------------------------
Q. What is a factless fact table? For which purpose do we use it in our DWH projects?
It is a fact table which does not contain any measurable data.
EX: a student attendance fact (it records only whether a student attended a class or not - yes or no).
A factless fact table contains only the keys; there are no measures, or in other words it contains no facts. Generally it is used to integrate fact tables.
A factless fact table contains only foreign keys. We can apply two kinds of aggregate functions to a factless fact: count and distinct count.
Two purposes of a factless fact:
1. Coverage: to indicate what did NOT happen.
Like: which product did not sell well in a particular region?
2. Event tracking: to know whether an event took place or not.
Like: a fact for tracking students' attendance will not contain any measures.
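A minimal sketch of a factless attendance fact table (illustrative names; the *_key columns are assumed to point at student, class and date dimensions):
CREATE TABLE fact_attendance (
    student_key INTEGER,   -- foreign key to the student dimension
    class_key   INTEGER,   -- foreign key to the class dimension
    date_key    INTEGER    -- foreign key to the date dimension
    -- no measure columns at all
);
-- The only meaningful aggregates are counts, e.g. students present per class:
SELECT class_key, COUNT(*) AS students_present
FROM fact_attendance
GROUP BY class_key;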
Q.
What is staging area?
The staging area is where we apply our logic after extracting the data from the source: we cleanse the data and shape it into a meaningful, summarized form for the data warehouse.
Q. What is constraint based loading
Constraint based load order defines the order of loading the data into the multiple targets based on primary and foreign keys constraints.
Q. Why union transformation is active transformation?
The condition for a transformation to become active is that the row number changes. There are two ways a row number can change:
1. The number of rows coming in and going out differs.
Eg: in the case of a Filter we have data like
id name dept row_num
1 aa 4 1
2 bb 3 2
3 cc 4 3
and a filter condition like dept=4, then the output would be
id name dept row_num
1 aa 4 1
3 cc 4 2
So the row number changed and it is an active transformation.
2. The order of the rows changes.
Eg: when the Union transformation pulls in data, suppose we have 2 sources:
source1:
id name dept row_num
1 aa 4 1
2 bb 3 2
3 cc 4 3
source2:
id name dept row_num
4 aaa 4 4
5 bbb 3 5
6 ccc 4 6
It never restricts the data from any source, so the data can come in any order:
id name dept row_num old_row_num
1 aa 4 1 1
4 aaa 4 2 4
5 bbb 3 3 5
2 bb 3 4 2
3 cc 4 5 3
6 ccc 4 6 6
So the row numbers change; thus we say that Union is an active transformation.
Q. What is use of batch file in informatica? How many types of batch file in informatica?
With a batch, we can run sessions either sequentially or concurrently.
A grouping of sessions is known as a batch.
Two types of batches:
1) Sequential: runs sessions one after another.
2) Concurrent: runs the sessions at the same time.
If you have sessions with source-target dependencies you have to go for a sequential batch to start the sessions one after another. If you have several independent sessions you can use concurrent batches, which run all the sessions at the same time.
Q. What is joiner cache?
When we use the Joiner transformation, the Integration Service maintains a cache; all the records are stored in the joiner cache. Joiner caches are of 2 types: 1. index cache 2. data cache.
The index cache stores all the port values which participate in the join condition and the data cache stores all the ports which do not participate in the join condition.
Q. What is the location of parameter file in Informatica?
$PMBWPARAM
Q. How can you display only hidden files in UNIX
$ ls -a | grep "^\."
ls -a lists all entries, including hidden ones; piping the output to grep "^\." keeps only the names that begin with a dot, i.e. the hidden files.
Q. How to delete the data in the target table after loaded.
SQ ---> Properties tab ---> Post SQL:
delete from target_tablename
Post SQL statements are executed using that database connection after the pipeline is run. Alternatively, to clear the target before loading, use the "Truncate target table option" in the session properties.
Q. What is polling in Informatica?
It displays the updated information about the session in the Workflow Monitor window. The Monitor window displays the status of each session when you poll the Informatica server.
Q. How will I stop my workflow after 10 errors?
Set the session-level error handling property "Stop on errors" to 10:
Config Object (tab) –> Error Handling –> Stop on errors = 10
Q. How can we calculate fact table size?
A fact table's row count is a multiple of the combinations of its dimension values. For example, to estimate the fact table size for 3 years of history with 200 products and 200 stores:
3 * 365 * 200 * 200 = 43,800,000 rows; multiply the row count by the average row width to estimate the storage size.
Q. Without using an Email task, how will you send a mail from Informatica?
By using the 'mailx' command in a UNIX shell script invoked from the workflow (for example, through a command task).
Q. How will you compare two mappings in two different repositories?
In the Designer client, go to the Mappings menu and use the 'Compare' option; with it we can compare two mappings in two different repositories.
In Informatica Designer: Mappings tab ---> Compare...
We can compare 2 folders within the same repository, and we can compare 2 folders within different repositories.
Q. What is constraint based load order
Constraint based load order defines the order in which data loads into the multiple targets based on primary key and foreign key relationship.
Q. What is a target load plan?
Suppose I have 3 pipelines in a single mapping in the Designer:
emp source ---> SQ ---> tar1
dept source ---> SQ ---> tar2
bonus source ---> SQ ---> tar3
My requirement is to load tar2 first, then tar1, and finally tar3. For this type of loading, to control the extraction of data from the source by the source qualifier, we use the target load plan.
Q. What is meant by data driven? In which scenario do we use it?
Data driven is available at the session level. It means that when we are using an Update Strategy transformation, the Integration Service follows the row-level flags set by that transformation when writing to the database.
In other words, data driven instructs the server what action should be taken for each source row on the target (update, delete, reject, insert). If we use the Update Strategy transformation in a mapping, then we select the data driven option in the session.
Q. How to run a workflow in UNIX?
Syntax: pmcmd startworkflow -sv <service name> -d <domain name> -u <user name> -p <password> -f <folder name> <workflow name>
Example (using environment variables for the connection details):
pmcmd startworkflow -sv ${INFA_SERVICE} -d ${INFA_DOMAIN} -uv USER_VAR -pv PSWD_VAR -f ${ETLFolder} -wait ${ETLWorkflow}
Q. What is the main difference between
a Joiner Transformation and Union Transformation?
Joiner Transformation merge horizontally
Union Transformation merge vertically
A Joiner transformation is used to join data from heterogeneous sources (e.g. an SQL database and a flat file), whereas a Union transformation is used to combine data from similar relational sources (e.g. one Oracle table and another Oracle table).
Join Transformation combines data record horizontally based on join condition.
And combine data from two different sources having different metadata.
Join transformation supports heterogeneous, homogeneous data source.
Union Transformation combines data record vertically from multiple sources, having same metadata.
Union transformation also support heterogeneous data source.
Union transformation functions as UNION ALL set operator.
Q. What is constraint based loading exactly?
And how to do this? I think it is when we have primary key-foreign key
relationship. Is it correct?
Constraint-based load order defines loading the data into multiple targets depending on the primary key - foreign key relationship.
To set the option: double-click the session –> Config Object –> check Constraint Based Load Ordering.
Q. Difference between top-down (W. H. Inmon) and bottom-up (Ralph Kimball) approaches?
Top-down approach:
As per W. H. Inmon, first we need to build the data warehouse and after that we build the data marts; however, this is somewhat difficult to maintain.
Bottom-up approach:
As per Ralph Kimball, first we need to build up the data marts and then we build up the data warehouse. This approach is the most used in real time while creating a data warehouse.
Q. What are the different caches used
in informatica?
· Static cache
· Dynamic cache
· Shared cache
· Persistent cache
Q. What is the command to get the list
of files in a directory in unix?
$ls -lrt
Q. How to import multiple flat files in
to single target where there is no common column in the flat files
In the workflow session properties, on the Mapping tab, under the source properties, choose Source filetype - Indirect.
Give the Source filename: <file_path>
This <file_path> file should contain the names of all the files you want to load.
Q. How to connect two or more table
with single source qualifier?
Create an Oracle source with however many columns you want and write the join query in the SQL query override. The column order and data types must be the same as in the SQL query.
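For instance, an assumed SQL override joining the EMP and DEPT sources through one Source Qualifier (the SQ ports must match this column order and these data types):
SELECT e.empno,
       e.ename,
       e.sal,
       d.dname
FROM   emp e
JOIN   dept d
  ON   d.deptno = e.deptno;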
Q. How to call unconnected lookup in
expression transformation?
:LKP.LKP_NAME(PORTS)
Q. What is diff between connected and
unconnected lookup?
Connected lookup:
It is used to join two tables.
It can return multiple columns per row.
It must be connected in the mapping pipeline.
You can implement a lookup condition.
Using a connected lookup you can generate sequence numbers by enabling the dynamic lookup cache.
Unconnected lookup:
It returns a single output value through the return port.
It acts as a lookup function (:LKP).
It is called from another transformation.
It is not connected to either a source or a target.
------
CONNECTED LOOKUP:
>> It participates in the data pipeline.
>> It can have multiple inputs and multiple outputs.
>> It supports both static and dynamic cache.
UNCONNECTED LOOKUP:
>> It does not participate in the data pipeline.
>> It can have multiple inputs but only a single output (the return port).
>> It supports static cache only.
Q. Types of partitioning in
Informatica?
Partition 5 types
1. Simple pass through
2. Key range
3. Hash
4. Round robin
5. Database
Q. Which transformation uses cache?
1. Lookup transformation
2. Aggregator transformation
3. Rank transformation
4. Sorter transformation
5. Joiner transformation
Q. Explain about union transformation?
A Union transformation is a multiple input group transformation, which is used to merge data from multiple sources, similar to the UNION ALL SQL statement that combines the results of two or more SELECT statements.
Like UNION ALL, the Union transformation does not remove duplicate rows. It is an active transformation.
Q. Explain about Joiner transformation?
Joiner transformation is used to join source data from two related heterogeneous sources. However this can also be used to join data from the same source. Joiner t/r join sources with at least one matching column. It uses a condition that matches one or more pair of columns between the 2 sources.
To configure a Joiner t/r various settings that we do are as below:
1) Master and detail source
2) Types of join
3) Condition of the join
Q. Explain about Lookup transformation?
Lookup t/r is used in a mapping to look up data in a relational table, flat file, view or synonym.
The informatica server queries the look up source based on the look up ports in the transformation. It compares look up t/r port values to look up source column values based on the look up condition.
Look up t/r is used to perform the below mentioned tasks:
1) To get a related value.
2) To perform a calculation.
3) To update SCD tables.
Q. How to identify this row for insert
and this row for update in dynamic lookup cache?
Based on NEW LOOKUP ROW.. Informatica server indicates which one is insert and which one is update.
Newlookuprow- 0...no change
Newlookuprow- 1...Insert
Newlookuprow- 2...update
Q. How many ways can we implement SCD2?
1) Date range
2) Flag
3) Versioning
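For illustration, here is a hedged SQL sketch of the date-range approach listed above; the DIM_CUSTOMER table, its columns and the sequence name are made-up for this example (an open-ended EFF_END_DT marks the current version):
-- Close the current version of the changed customer (hypothetical table/columns)
UPDATE dim_customer
   SET eff_end_dt = SYSDATE
 WHERE customer_id = 1001
   AND eff_end_dt IS NULL;
-- Insert the new version as the current (open) row
INSERT INTO dim_customer (customer_key, customer_id, city, eff_start_dt, eff_end_dt)
VALUES (dim_customer_seq.NEXTVAL, 1001, 'Chennai', SYSDATE, NULL);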
Q. How will you check the bottle necks in informatica? From where do you start checking?
You start as per this order
1. Target
2. Source
3. Mapping
4. Session
5. System
Q. What is incremental aggregation?
When the aggregator transformation executes all the output data will get stored in the temporary location called aggregator cache. When the next time the mapping runs the aggregator transformation runs for the new records loaded after the first run. These output values will get incremented with the values in the aggregator cache. This is called incremental aggregation. By this way we can improve performance...
---------------------------
Incremental aggregation means applying only the captured changes in the source to aggregate calculations in a session.
When the source changes only incrementally and if we can capture those changes, then we can configure the session to process only those changes. This allows informatica server to update target table incrementally, rather than forcing it to process the entire source and recalculate the same calculations each time you run the session. By doing this obviously the session performance increases.
Q. How can i explain my project architecture in interview..? Tell me your project flow from source to target..?
Project architecture is like
1. Source Systems: Like Mainframe,Oracle,People soft,DB2.
2. Landing tables: These are tables act like source. Used for easy to access, for backup purpose, as reusable for other mappings.
3. Staging tables: From landing tables we extract the data into staging tables after all validations done on the data.
4. Dimension/Facts: These are the tables those are used for analysis and make decisions by analyzing the data.
5. Aggregation tables: These tables have summarized data useful for managers who wants to view monthly wise sales, year wise sales etc.
6. Reporting layer: 4 and 5 phases are useful for reporting developers to generate reports. I hope this answer helps you.
Q. What type of transformation is not supported by mapplets?
· Normalizer transformation
· COBOL sources, joiner
· XML source qualifier transformation
· XML sources
· Target definitions
· Pre & Post Session stored procedures
· Other mapplets
Q. How informatica recognizes mapping?
All are organized by Integration service.
Power center talks to Integration Service and Integration service talk to session. Session has mapping Structure. These are flow of Execution.
Q. Can every transformation reusable? How?
Except source qualifier transformation, all transformations support reusable property. Reusable transformation developed in two ways.
1. In mapping which transformation do you want to reuse, select the transformation and double click on it, there you got option like make it as reusable transformation
option. There you need to check the option for converting non reusable to reusable transformation. but except for source qualifier trans.
2. By using transformation developer
Q. What is Pre Sql and Post Sql?
Pre SQL means that the integration service runs SQL commands against the source database before it reads the data from source.
Post SQL means integration service runs SQL commands against target database after it writes to the target.
Q. Insert else update option in which situation we will use?
This option applies when the Lookup transformation uses a dynamic cache: if the Integration Service does not find the row in the lookup cache, it inserts the row into the cache and flags it for insert; if it does find the row, it updates the cached row when the data in the associated ports has changed.
----------------------
We set this property when the lookup TRFM uses dynamic cache and the session property TREAT SOURCE ROWS AS "Insert" has been set.
--------------------
This option we use when we want to maintain the history.
If records are not available in target table then it inserts the records in to target and records are available in target table then it updates the records.
Q. What is an incremental loading? in which situations we will use incremental loading?
Incremental loading is an approach for loading only new or changed data. Suppose you have a mapping that loads data from an EMPLOYEE table to an EMPLOYEE_TARGET table based on hire date, and you have already loaded all employees hired up to 31-12-2009. When the load runs again, the target already contains those employees, so you only pick up the source rows for employees hired from 1-1-2010 onwards. Reloading data that already exists in the target would just be extra overhead. So in the Source Qualifier you filter the records by hire date, and you can also parameterize the hire date to control from which date data is loaded into the target.
This is the concept of Incremental loading.
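As a hedged sketch of the Source Qualifier filter described above, assuming a mapping parameter named $$LAST_LOAD_DATE (an illustrative name) that holds the previous load's cut-off date:
-- SQL override in the Source Qualifier; $$LAST_LOAD_DATE is expanded from the parameter file
SELECT emp_id, emp_name, hire_date, sal
FROM   employee
WHERE  hire_date > TO_DATE('$$LAST_LOAD_DATE', 'DD-MM-YYYY')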
Q. What is target update override?
By Default the integration service updates the target based on key columns. But we might want to update non-key columns also, at that point of time we can override the
UPDATE statement for each target in the mapping. The target override affects only when the source rows are marked as update by an update strategy in the mapping.
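A minimal sketch of such an override, assuming a hypothetical target T_EMP whose non-key columns SAL and DEPTNO should be updated by matching on EMP_NAME; :TU references the target ports:
UPDATE T_EMP
SET    SAL    = :TU.SAL,
       DEPTNO = :TU.DEPTNO
WHERE  EMP_NAME = :TU.EMP_NAME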
Q. What is the Mapping parameter and Mapping variable?
Mapping parameter: A mapping parameter is a constant value defined before the mapping runs. Mapping parameters let you reuse the same mapping with different constant values.
Mapping variable: A mapping variable represents a value that can change during the mapping run. Its latest value is stored in the repository, and the Integration Service retrieves it from the repository and uses the incremented value in the next run.
Q. What is rank and dense rank in informatica with any examples and give sql query for this both ranks
for eg: the file contains the records with column
100
200(repeated rows)
200
300
400
500
the rank function gives output as
1
2
2
4
5
6
and dense rank gives
1
2
2
3
4
5
for eg: the file contains the records with column
empno sal
100 1000
200(repeated rows) 2000
200 3000
300 4000
400 5000
500 6000
Rank:
select empno, sal, rank() over (order by empno) rnk from emp;
1
2
2
4
5
6
Dense Rank:
select empno, sal, dense_rank() over (order by empno) drnk from emp;
1
2
2
3
4
5
Q. What is the incremental aggregation?
The first time you run an upgraded session using incremental aggregation, the Integration Service upgrades the index and data cache files. If you want to partition a session using a mapping with incremental aggregation, the Integration Service realigns the index and data cache files.
Q. What is session parameter?
Session parameters are defined in a parameter file (a text file) and are typically used to assign values such as database connections and source/target file names at run time.
Q. What is mapping parameter?
A mapping parameter represents a constant value that is defined before the mapping runs. Its value is supplied through a parameter file (saved with a .prm extension), which lets the same mapping be reused with different constant values.
Q. What is parameter file?
A parameter file can be a text file. Parameter file is to define the values for parameters and variables used in a session. A parameter file is a file created by text editor such as word pad or notepad. You can define the following values in parameter file
· Mapping parameters
· Mapping variables
· Session parameters
Q. What is session override?
Session override is an option in informatica at session level. Here we can manually give a sql query which is issued to the database when the session runs. It is nothing but over riding the default sql which is generated by a particular transformation at mapping level.
Q. What are the diff. b/w informatica versions 8.1.1 and 8.6.1?
Mainly changes in the Administration Console. In 8.1.1 the Admin Console is used for creating the Integration Service, Repository Service, web service, domain, nodes and grid (if you have a licensed version). In 8.6.1 the Informatica Admin Console manages both a Domain page and a Security page: the Domain page covers the same objects as above (Integration Service, Repository Service, web service, domain, nodes, grid), while the Security page covers creation of users, privileges, LDAP configuration, and export/import of users and privileges.
Q. What are the uses of a Parameter file?
Parameter file is one which contains the values of mapping variables.
type this in notepad.save it .
foldername.sessionname
$$inputvalue1=
---------------------------------
Parameter files are created with an extension of .PRM
These are created to pass values those can be changed for Mapping Parameter and Session Parameter during mapping run.
Mapping Parameters:
A parameter value is defined in the parameter file for a mapping parameter that has already been created in the mapping with its data type, precision and scale.
The mapping parameter file syntax (xxxx.prm):
[FolderName.WF:WorkFlowName.ST:SessionName]
$$ParameterName1=Value
$$ParameterName2=Value
After that we have to select the properties Tab of Session and Set Parameter file name including physical path of this xxxx.prm file.
Session Parameters:
The Session Parameter files syntax (yyyy.prm).
[FolderName.SessionName]
$InputFileValue1=Path of the source Flat file
After that we have to select the properties Tab of Session and Set Parameter file name including physical path of this yyyy.prm file.
Do following changes in Mapping Tab of Source Qualifier's
Properties section
Attributes values
Source file Type ---------> Direct
Source File Directory --------> Empty
Source File Name --------> $InputFileValue1
Q. What is the default data driven operation in informatica?
This is default option for update strategy transformation.
The Integration Service follows the instructions coded in the Update Strategy transformations in the session's mapping to determine how to flag rows for insert, delete, update or reject. If you do not choose the Data Driven option, the Integration Service ignores the Update Strategy transformations in the mapping.
Q. What is threshold error in informatica?
When the Update Strategy marks rows as DD_REJECT or DD_UPDATE and the session is configured with an error limit, the session ends with a failed status once the number of rejected/error records exceeds that limit. This is called a threshold error.
Q. SO many times i saw "$PM parser error ". What is meant by PM?
PM: PowerMart (the earlier name of the product); the $PM prefix is retained in built-in variables and messages.
1) Parsing error will come for the input parameter to the lookup.
2) Informatica is not able to resolve the input parameter CLASS for your lookup.
3) Check the Port CLASS exists as either input port or a variable port in your expression.
4) Check data type of CLASS and the data type of input parameter for your lookup.
Q. What is a candidate key?
A candidate key is a column or minimal combination of columns that uniquely identifies a database record without any extraneous data. Each table may have one or more candidate keys. One of these candidate keys is selected as the table's primary key; the others are called alternate keys.
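For example (hypothetical table): in the EMPLOYEE table below, both EMP_ID and SSN are candidate keys; EMP_ID is chosen as the primary key, so SSN becomes an alternate key.
CREATE TABLE employee (
    emp_id   NUMBER        PRIMARY KEY,       -- candidate key chosen as primary key
    ssn      VARCHAR2(11)  UNIQUE NOT NULL,   -- remaining candidate (alternate) key
    emp_name VARCHAR2(50)
);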
Q. What is the difference between Bitmap and Btree index?
A bitmap index is used for columns with few distinct (repeating) values,
e.g. Gender: male/female, Account status: Active/Inactive.
A B-tree index is used for columns with many distinct (mostly unique) values,
e.g. empid.
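A hedged sketch of how each would be created in Oracle (the table and column names are illustrative):
-- Bitmap index on a low-cardinality column
CREATE BITMAP INDEX idx_acct_status ON accounts (account_status);
-- B-tree index (the default index type) on a high-cardinality column
CREATE INDEX idx_emp_empid ON emp (empid);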
Q. What is ThroughPut in Informatica?
Throughput is the rate at which the PowerCenter server reads rows (in bytes) from the source, or writes rows (in bytes) into the target, per second.
You can find this in the Workflow Monitor: right-click the session, choose Properties, and the Source/Target Statistics tab shows the throughput details for each source and target instance.
Q. What are set operators in Oracle
UNION
UNION ALL
MINUS
INTERSECT
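A quick illustration of the set operators, assuming two tables emp_2022 and emp_2023 with the same structure (made-up names):
-- Employees present in either year, duplicates removed (UNION ALL would keep duplicates)
SELECT emp_id FROM emp_2022
UNION
SELECT emp_id FROM emp_2023;
-- Employees present in 2022 but not in 2023
SELECT emp_id FROM emp_2022
MINUS
SELECT emp_id FROM emp_2023;
-- Employees present in both years
SELECT emp_id FROM emp_2022
INTERSECT
SELECT emp_id FROM emp_2023;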
Q. How i can Schedule the Informatica job in "Unix Cron scheduling tool"?
Crontab
The crontab command (cron derives from chronos, Greek for time; tab stands for table), found in Unix and Unix-like operating systems, is used to schedule commands to be executed periodically. To see which cron jobs are currently scheduled on your system, you can open a terminal and run:
sudo crontab -l
To edit the list of cron jobs you can run:
sudo crontab -e
This will open the default editor (could be vi or pico; you can change the default editor if you want) to let us manipulate the crontab. If you save and exit the editor, all your cron jobs are saved into the crontab. Cron jobs are written in the following format:
* * * * * /bin/execute/this/script.sh
Scheduling explained
As you can see there are 5 stars. The stars represent different date parts in the following order:
1. minute (from 0 to 59)
2. hour (from 0 to 23)
3. day of month (from 1 to 31)
4. month (from 1 to 12)
5. day of week (from 0 to 6) (0=Sunday)
Execute every minute
If you leave a field as a star, or asterisk, it means "every". Maybe that's a bit unclear, so let's use the previous example again:
* * * * * /bin/execute/this/script.sh
They are all still asterisks! So this means
execute /bin/execute/this/script.sh:
1. every minute
2. of every hour
3. of every day of the month
4. of every month
5. and every day in the week.
In short: This script is being executed every minute.
Without exception.
Execute every Friday 1AM
So if we want to schedule the script to run at 1AM every
Friday, we would need the following cronjob:
0 1 * * 5 /bin/execute/this/script.sh
Get it? The script is now being executed when the system
clock hits:
1. minute: 0
2. of hour: 1
3. of day of month: * (every day of month)
4. of month: * (every month)
5. and weekday: 5 (=Friday)
Execute on weekdays at 1AM
So if we want to schedule the script to run at 1AM on every weekday (Monday to Friday), we would need the following cronjob:
0 1 * * 1-5 /bin/execute/this/script.sh
Get it? The script is now being executed when the system
clock hits:
1. minute: 0
2. of hour: 1
3. of day of month: * (every day of month)
4. of month: * (every month)
5. and weekday: 1-5 (=Monday til Friday)
Execute at 10 minutes past every hour on the 1st of every month
Here's another one, just for practice:
10 * 1 * * /bin/execute/this/script.sh
Fair enough, it takes some getting used to, but it offers great flexibility.
Q. Can anyone tell me the difference between persistence and dynamic caches? On which conditions we are using these caches?
Dynamic cache:
1) When you use a dynamic cache, the Informatica Server updates the lookup cache as it passes rows to the target.
2) With a dynamic cache, the cache can also be updated with new data during the session.
3) A dynamic cache is not reusable across sessions.
(Use a dynamic cache when you need the cache to reflect the latest data.)
Persistent cache:
1) A Lookup transformation can be configured to use a non-persistent or persistent cache. The PowerCenter Server saves or deletes the lookup cache files after a successful session based on the Lookup Cache Persistent property.
2) A persistent cache is not updated with new data during the session.
3) A persistent cache is reusable across session runs.
(Use a persistent cache when you want to reuse the cache data from a previous run.)
----------------------------------
A few more additions to the above answer:
1. A dynamic lookup allows modifying the cache, whereas a persistent lookup cache is not modified during the run.
2. A dynamic lookup uses NewLookupRow, a default port in the cache, but a persistent cache does not add any default ports.
3. When the session completes, a dynamic cache is removed, but a persistent cache is saved on the Informatica PowerCenter server.
Q. How to obtain performance data for individual transformations?
There is a session-level property "Collect Performance Data"; if you select it, you get performance details for all the transformations.
Q. List of Active and Passive Transformations in Informatica?
Active Transformation - An active transformation changes the
number of rows that pass through the mapping.
- Source Qualifier Transformation
- Sorter Transformations
- Aggregator Transformations
- Filter Transformation
- Union Transformation
- Joiner Transformation
- Normalizer Transformation
- Rank Transformation
- Router Transformation
- Update Strategy Transformation
- Advanced External Procedure Transformation
Passive Transformation - Passive transformations do not change
the number of rows that pass through the mapping.
- Expression Transformation
- Sequence Generator Transformation
- Lookup Transformation
- Stored Procedure Transformation
- XML Source Qualifier Transformation
- External Procedure Transformation
Q. Eliminating of duplicate records without
using dynamic lookups?
You can identify duplicate records with a simple one-line SQL query:
Select id, count(*) from seq1 group by id having count(*) > 1;
(This query only reports the duplicated ids; the duplicates still have to be filtered out in the mapping or deleted in the database.)
Below are the ways to eliminate the duplicate records:
1. By enabling the Select Distinct option in the Source Qualifier transformation.
2. By enabling the Distinct option in the Sorter transformation.
3. By marking all ports as Group By in an Aggregator transformation.
Q. Can anyone give idea on how do we
perform test load in informatica? What do we test as part of test load in
informatica?
With a test load, the Informatica Server reads and transforms data without writing to targets. The Informatica Server does everything as if running the full session: it writes data to relational targets, but rolls back the data when the session completes. So you can enable the Collect Performance Details property and analyze how efficient your mapping is. If the session runs for a long time, you may want to find out the existing bottlenecks; they may be of type target, source, mapping, etc.
The basic idea behind a test load is to see the behavior of the Informatica Server with your session.
Q. What is ODS (Operational Data Store)?
A collection of operational or base data that is extracted from operational databases and standardized, cleansed, consolidated, transformed, and loaded into an enterprise data architecture.
An ODS is used to support data mining of operational data, or as the store for base data that is summarized for a data warehouse.
The ODS may also be used to audit the data warehouse to assure that summarized and derived data is calculated properly. The ODS may further become the enterprise shared operational database, allowing operational systems that are being re-engineered to use the ODS as their operational database.
Q. How many tasks are there in informatica?
· Session task
· Email task
· Command task
· Assignment task
· Control task
· Decision task
· Event-Raise task
· Event-Wait task
· Timer task
· Link task
Q. What are business components in Informatica?
· Domains
· Nodes
· Services
Q. What is versioning?
It is used to keep a history of the changes made to mappings and workflows.
1. Check in: you check in when you are done with your changes so that everyone can see those changes.
2. Check out: you check out from the main stream when you want to make any change to a mapping/workflow.
3. Version history: it shows all the changes made and who made them.
Q. Diff between $$$sessstarttime and sessstarttime?
$$$SessStartTime - returns the session start time as a string value (String datatype). The format of the string depends on the database you are using.
SESSSTARTTIME - returns the session start date along with the timestamp (Date datatype).
1. $ refers to system variables / session parameters, e.g. $BadFile, $InputFile, $OutputFile, $DBConnection, $Source, $Target, etc.
2. $$ refers to user-defined variables / mapping parameters, e.g. $$State, $$Time, $$Entity, $$Business_Date, $$SRC, etc.
3. $$$ refers to system parameters, e.g. $$$SessStartTime.
Q. Finding Duplicate Rows based on
Multiple Columns?
SELECT firstname, COUNT(firstname), surname, COUNT(surname), email, COUNT(email)
FROM employee
GROUP BY firstname, surname, email
HAVING COUNT(firstname) > 1 AND COUNT(surname) > 1 AND COUNT(email) > 1;
Q. Finding Nth Highest Salary in Oracle?
Pick out the Nth highest salary, say the 4th highest salary:
select * from
  (select ename, sal, dense_rank() over (order by sal desc) emp_rank from emp)
where emp_rank = 4;
Q. Find out the third highest salary?
SELECT MIN(sal) FROM emp
WHERE sal IN (SELECT DISTINCT TOP 3 sal FROM emp ORDER BY sal DESC);
(Note: TOP is SQL Server syntax; in Oracle, use the dense_rank() query above with emp_rank = 3.)
Q. How do you handle error logic in Informatica? What are the transformations that you used while handling errors? How did you reload those error records in target?
Row indicator: It generally happens when working with update strategy transformation. The writer/target rejects the rows going to the target
Column indicator:
D -Valid
o - Overflow
n - Null
t - Truncate
When a column value is null, overflows, or is truncated, the row can be rejected instead of being written to the target.
The reject data is stored on reject files. You can check the data and reload the data in to the target using reject reload utility.
Q. What is the difference between Stop and Abort?
Stop - If the Integration Service is executing a Session task when you issue the stop command, the Integration Service stops reading data. It continues processing, writing and committing data to targets. If the Integration Service cannot finish processing and committing data, you can issue the abort command.
Abort - The Integration Service handles the abort command for the Session task like the stop command, except it has a timeout period of 60 seconds. If the Integration Service cannot finish processing and committing data within the timeout period, it kills the DTM process and terminates the session.
Q. What is inline view?
An inline view is the term given to a subquery in the FROM clause of a query, which can then be used as a table. An inline view is effectively a named subquery.
Ex:
Select Tab1.col1, Tab1.col2, Inview.col1, Inview.col2
From Tab1, (Select statement) Inview
Where Tab1.col1 = Inview.col1
SELECT D.DNAME, E.ENAME, E.SAL
FROM EMP E, (SELECT DNAME, DEPTNO FROM DEPT) D
WHERE E.DEPTNO = D.DEPTNO
In the above query, (SELECT DNAME, DEPTNO FROM DEPT) D is the inline view.
Inline views are evaluated at runtime and, in contrast to normal views, they are not stored in the data dictionary.
Disadvantages of using a normal view:
1. A separate view needs to be created, which is an overhead.
2. Extra time is taken in parsing the view.
These problems are avoided with an inline view, by using a select statement in the subquery and using it as a table.
Advantages of using inline views:
1. Better query performance.
2. Better visibility of code.
Practical uses of inline views:
1. Joining grouped data with non-grouped data.
2. Getting data to use in another query.
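As a hedged illustration of use case 1 (joining grouped data with non-grouped data), the sketch below assumes the standard EMP sample table:
-- Compare each employee's salary with the average salary of their department.
-- The inline view A holds grouped data; EMP supplies the non-grouped rows.
SELECT e.ename,
       e.sal,
       a.avg_sal,
       e.sal - a.avg_sal AS diff_from_avg
FROM   emp e,
       (SELECT deptno, AVG(sal) AS avg_sal
        FROM   emp
        GROUP  BY deptno) a
WHERE  e.deptno = a.deptno;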
Q. What is generated key and generated column id in normalizer transformation?
· The Integration Service increments the generated key (GK) sequence number each time it processes a source row. When the source row contains a multiple-occurring column or a multiple-occurring group of columns, the Normalizer transformation returns a row for each occurrence. Each of those rows contains the same generated key value.
· The Normalizer transformation has a generated column ID (GCID) port for each multiple-occurring column. The GCID is an index for the instance of the multiple-occurring data. For example, if a column occurs 3 times in a source record, the Normalizer returns a value of 1, 2 or 3 in the generated column ID.
Q. What is difference between SUBSTR and INSTR?
The INSTR function searches a string for a substring and returns an integer indicating the position within the string of the first character of that occurrence.
The SUBSTR function returns a portion of a string, beginning at a given character position and substring_length characters long. SUBSTR calculates lengths using characters as defined by the input character set.
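A small sketch showing both functions on a literal value (using Oracle's DUAL table):
SELECT INSTR('informatica', 'mat')  AS pos_of_mat,   -- returns 6
       SUBSTR('informatica', 3, 4)  AS middle_part   -- returns 'form'
FROM   dual;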
Q. What are different Oracle database objects?
· TABLES
· VIEWS
· INDEXES
· SYNONYMS
· SEQUENCES
· TABLESPACES
Q. What is @@ERROR?
The @@ERROR automatic variable returns the error code of the last Transact-SQL statement. If there was no error, @@ERROR returns zero. Because @@ERROR is reset after each Transact-SQL statement, it must be saved to a variable if it needs to be processed further after checking it.
Q. What is difference between co-related sub query and nested sub query?
A correlated subquery runs once for each row selected by the outer query. It contains a reference to a value from the row selected by the outer query.
A nested subquery runs only once for the entire nesting (outer) query. It does not contain any reference to the outer query row.
For example,
Correlated subquery:
Select e1.empname, e1.basicsal, e1.deptno from emp e1
where e1.basicsal = (select max(basicsal) from emp e2 where e2.deptno = e1.deptno)
Nested subquery:
Select empname, basicsal, deptno from emp
where (deptno, basicsal) in (select deptno, max(basicsal) from emp group by deptno)
Q. How does one escape special characters when building SQL queries?
The LIKE keyword allows for string searches. The '_' wildcard character is used to match exactly one character, while '%' is used to match zero or more occurrences of any characters. These characters can be escaped in SQL. Example:
SELECT name FROM emp WHERE id LIKE '%\_%' ESCAPE '\';
Use two quotes for every one displayed. Example:
SELECT 'Frank''s Oracle site' FROM DUAL;
SELECT 'A ''quoted'' word.' FROM DUAL;
SELECT 'A ''''double quoted'''' word.' FROM DUAL;
Q. Difference between Surrogate key and Primary key?
Surrogate key:
1. Query processing is fast.
2. It is numeric only.
3. The developer generates the surrogate key, e.g. using a Sequence Generator transformation.
4. Eg: 12453
Primary key:
1. Query processing is slower.
2. It can be alphanumeric.
3. The source system supplies the primary key.
4. Eg: C10999
Q. How does one eliminate duplicate rows in an Oracle Table?
Method 1:
DELETE from table_name A
where rowid > (select min(rowid) from table_name B where A.key_values = B.key_values);
Method 2:
Create table table_name2 as select distinct * from table_name1;
drop table table_name1;
rename table_name2 to table_name1;
In this method, all the indexes, constraints, triggers etc. have to be re-created.
Method 3:
DELETE from table_name t1
where exists (select 'x' from table_name t2 where t1.key_value = t2.key_value and t1.rowid > t2.rowid);
Method 4:
DELETE from table_name
where rowid not in (select max(rowid) from table_name group by key_value);
Q. Query to retrieve Nth row from an Oracle table?
The query is as follows:
select * from my_table where rownum <= n
MINUS
select * from my_table where rownum < n;
Q. How does the
server recognize the source and target databases?
If it is a relational database - by using an ODBC connection.
If it is a flat file - by using an FTP connection.
Q. What are the different types of indexes supported by Oracle?
1. B-tree index
2. B-tree cluster index
3. Hash cluster index
4. Reverse key index
5. Bitmap index
6. Function Based index
Q. Types of Normalizer transformation?
There are two types of Normalizer transformation.
VSAM Normalizer transformation
A non-reusable transformation that is a Source Qualifier transformation for a COBOL source. The Mapping Designer creates VSAM Normalizer columns from a COBOL source in a mapping. The column attributes are read-only. The VSAM Normalizer receives a multiple-occurring source column through one input port.
Pipeline Normalizer transformation
A transformation that processes multiple-occurring data from relational tables or flat files. You might choose this option when you want to process multiple-occurring data from another transformation in the mapping.
A VSAM Normalizer transformation has one input port for a multiple-occurring column. A pipeline Normalizer transformation has multiple input ports for a multiple-occurring column.
When you create a Normalizer transformation in the Transformation Developer, you create a pipeline Normalizer transformation by default. When you create a pipeline Normalizer transformation, you define the columns based on the data the transformation receives from another type of transformation such as a Source Qualifier transformation.
The Normalizer transformation has one output port for each single-occurring input port.
Q. What are all the transformation you used if source as XML file?
· XML Source Qualifier
· XML Parser
· XML Generator
Some useful Unix commands:
ls -lt (sort by last modified date)
ls -ltr (reverse order)
ls -lS (sort by file size)
grep -v "^$" filename (print the file without blank lines)
Email variables: %a (attach a file), %g (attach the session log file)
$ ps -ef (list running processes)
Q. How can I display only and only hidden files in the current directory?
ls -a | grep "^\."
# head -10 logfile (first 10 lines of the file)
# tail -10 logfile (last 10 lines of the file)
Workflow scheduler options:
1. Run once - set the date and time when the session should start.
2. Run every - the Informatica server runs the session at the regular interval configured; parameters: days, hours, minutes, end on, end after, forever.
3. Customized repeat - repeat every 2 days, daily frequency in hours/minutes, every week, every month.
Lookup SQL Override: this feature is similar to entering a custom query in a Source Qualifier transformation. When entering a Lookup SQL Override, you can enter the entire override, or generate and edit the default SQL statement. The lookup query override can include a WHERE clause.
Source Qualifier SQL Query: the Source Qualifier provides the SQL Query option to override the default query. You can enter any SQL statement supported by your source database. You might enter your own SELECT statement, have the database perform aggregate calculations, or call a stored procedure or stored function to read the data and perform some tasks.
Generating a sequence number in an Expression transformation (variable port v_temp, output port o_seq):
v_temp = v_temp + 1
o_seq = IIF(ISNULL(v_temp), 0, v_temp)
Identifying duplicate rows with sorted input:
Source -> SQ -> Sorter -> Expression -> Filter or Router -> Target
In the Expression transformation:
flag = DECODE(TRUE, eid = pre_eid, 'Y', 'N')
flag_out = flag
pre_eid = eid
Q. What are the different transaction levels available in transaction control transformation (TCL)?
The following are the transaction control built-in variables:
· TC_CONTINUE_TRANSACTION: The Integration Service does not perform any transaction change for this row. This is the default value of the expression.
· TC_COMMIT_BEFORE: The Integration Service commits the transaction, begins a new transaction, and writes the current row to the target. The current row is in the new transaction.
· TC_COMMIT_AFTER: The Integration Service writes the current row to the target, commits the transaction, and begins a new transaction. The current row is in the committed transaction.
· TC_ROLLBACK_BEFORE: The Integration Service rolls back the current transaction, begins a new transaction, and writes the current row to the target. The current row is in the new transaction.
· TC_ROLLBACK_AFTER: The Integration Service writes the current row to the target, rolls back the transaction, and begins a new transaction. The current row is in the rolled back transaction.
Q. What is difference between grep and find?
grep is used for finding a string inside files.
Syntax - grep <string> <filename>
Example - grep 'compu' details.txt
Displays every line in which the string 'compu' is found.
find is used to find files or directories under a given path.
Syntax - find <path> -name <pattern>
Example - find . -name "compu*"
Displays all file names starting with "compu".
Q. What are the difference between DDL, DML and DCL commands?
DDL (Data Definition Language) statements:
· CREATE - create objects in the database
· ALTER - alter the structure of the database
· DROP - delete objects from the database
· TRUNCATE - remove all records from a table, including all space allocated for the records
· COMMENT - add comments to the data dictionary
DML (Data Manipulation Language) statements:
· SELECT - retrieve data from the database
· INSERT - insert data into a table
· UPDATE - update existing data within a table
· DELETE - delete records from a table (the space for the records remains)
· CALL - call a PL/SQL or Java subprogram
· EXPLAIN PLAN - explain the access path to data
· LOCK TABLE - control concurrency
DCL (Data Control Language) statements:
· GRANT - give users access privileges to the database
· REVOKE - withdraw access privileges given with the GRANT command
Transaction control statements (often grouped with DCL):
· COMMIT - save work done
· SAVEPOINT - identify a point in a transaction to which you can later roll back
· ROLLBACK - restore the database to its state as of the last COMMIT
· SET TRANSACTION - change transaction options such as which rollback segment to use
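A short illustrative sequence showing one statement from each category; the demo_emp table and the some_user grantee are made-up names for this sketch:
-- DDL: define an object
CREATE TABLE demo_emp (empno NUMBER PRIMARY KEY, ename VARCHAR2(30), sal NUMBER);
-- DML: manipulate data
INSERT INTO demo_emp (empno, ename, sal) VALUES (1, 'SMITH', 3000);
UPDATE demo_emp SET sal = 3500 WHERE empno = 1;
-- Transaction control: make the changes permanent
COMMIT;
-- DCL: control access (some_user is a hypothetical user)
GRANT SELECT ON demo_emp TO some_user;
-- DDL again: remove the object
DROP TABLE demo_emp;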
Q. What
is Stored Procedure?
A stored procedure is a
named group of SQL statements that have been previously created and stored in
the server database. Stored procedures accept input parameters so that a single
procedure can be used over the network by several clients using different input
data. And when the procedure is modified, all clients automatically get the new
version. Stored procedures reduce network traffic and improve performance.
Stored procedures can be used to help ensure the integrity of the database.
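As a minimal sketch (Oracle PL/SQL, with a made-up procedure name against the usual EMP sample table), a procedure that raises the salary of one employee:
CREATE OR REPLACE PROCEDURE raise_salary (
    p_empno IN NUMBER,   -- input parameter: employee id
    p_pct   IN NUMBER    -- input parameter: percentage increase
) AS
BEGIN
    UPDATE emp
       SET sal = sal * (1 + p_pct / 100)
     WHERE empno = p_empno;
    COMMIT;
END raise_salary;
/
-- Called from SQL*Plus as: EXEC raise_salary(7369, 10);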
Q. What is View?
A view is a tailored presentation of the data contained in one or more tables (or other views). Unlike a table, a view is not allocated any storage space, nor does a view actually contain data; rather, a view is defined by a query that extracts or derives data from the tables the view references. These tables are called base tables.
Views present a different representation of the data that resides within the base tables. Views are very powerful because they allow you to tailor the presentation of data to different types of users.
Views are often used to:
· Provide an additional level of table security by restricting access to a predetermined set of rows and/or columns of a table
· Hide data complexity
· Simplify commands for the user
· Present the data in a different perspective from that of the base table
· Isolate applications from changes in definitions of base tables
· Express a query that cannot be expressed without using a view
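A small sketch of the security use case above, assuming the usual EMP sample table (the view name is made up):
-- Expose only non-sensitive columns of department 10 to reporting users.
CREATE OR REPLACE VIEW emp_dept10_v AS
SELECT empno, ename, job, deptno
FROM   emp
WHERE  deptno = 10;
-- Users query the view exactly like a table:
SELECT * FROM emp_dept10_v;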
Q. What
is Trigger?
A trigger is a SQL
procedure that initiates an action when an event (INSERT, DELETE or UPDATE)
occurs. Triggers are stored in and managed by the DBMS. Triggers are used to
maintain the referential integrity of data by changing the data in a systematic
fashion. A trigger cannot be called or executed; the DBMS automatically fires
the trigger as a result of a data modification to the associated table.
Triggers can be viewed as similar to stored procedures in that both consist of
procedural logic that is stored at the database level. Stored procedures,
however, are not event-driven and are not attached to a specific table as
triggers are. Stored procedures are explicitly executed by invoking a CALL to
the procedure while triggers are implicitly executed. In addition, triggers can
also execute stored Procedures.
Nested Trigger: A
trigger can also contain INSERT, UPDATE and DELETE logic within itself, so when
the trigger is fired because of data modification it can also cause another
data modification, thereby firing another trigger. A trigger that contains data
modification logic within itself is called a nested trigger.
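A minimal sketch in Oracle syntax of a trigger that fires after an update; the audit table and trigger names are hypothetical:
-- Assumes an audit table created as:
-- CREATE TABLE emp_sal_audit (empno NUMBER, old_sal NUMBER, new_sal NUMBER, changed_on DATE);
CREATE OR REPLACE TRIGGER trg_emp_sal_audit
AFTER UPDATE OF sal ON emp
FOR EACH ROW
BEGIN
    INSERT INTO emp_sal_audit (empno, old_sal, new_sal, changed_on)
    VALUES (:OLD.empno, :OLD.sal, :NEW.sal, SYSDATE);
END;
/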
Q. What
is View?
A simple view can be
thought of as a subset of a table. It can be used for retrieving data, as well
as updating or deleting rows. Rows updated or deleted in the view are updated
or deleted in the table the view was created with. It should also be noted that
as data in the original table changes, so does data in the view, as views are
the way to look at part of the original table. The results of using a view are
not permanently stored in the database. The data accessed through a view is
actually constructed using standard T-SQL select command and can come from one
to many different base tables or even other views.
Q. What
is Index?
An index is a physical
structure containing pointers to the data. Indices are created in an existing
table to locate rows more quickly and efficiently. It is possible to create an
index on one or more columns of a table, and each index is given a name. The
users cannot see the indexes; they are just used to speed up queries. Effective
indexes are one of the best ways to improve performance in a database
application. A table scan happens when there is no index available to help a
query. In a table scan SQL Server examines every row in the table to satisfy
the query results. Table scans are sometimes unavoidable, but on large tables,
scans have a terrific impact on performance. Clustered indexes define the
physical sorting of a database table’s rows in the storage media. For this
reason, each database table may
have only one clustered
index. Non-clustered indexes are created outside of the database table and
contain a sorted list of references to the table itself.
Q. What
is the difference between clustered and a non-clustered index?
A clustered index is a
special type of index that reorders the way records in the table are physically
stored. Therefore table can have only one clustered index. The leaf nodes of a
clustered index contain the data pages. A nonclustered index is a special type
of index in which the logical order of the index does not match the physical
stored order of the rows on disk. The leaf node of a nonclustered index does
not consist of the data pages. Instead, the leaf nodes contain index rows.
Q. What
is Cursor?
Cursor is a database
object used by applications to manipulate data in a set on a row-by row basis,
instead of the typical SQL commands that operate on all the rows in the set at
one time.
In order to work with a cursor we need to perform some steps in the following order:
· Declare the cursor
· Open the cursor
· Fetch a row from the cursor
· Process the fetched row
· Close the cursor
· Deallocate the cursor
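A minimal T-SQL sketch of those steps (the SQL Server terminology used above), assuming an emp table with ename and sal columns:
DECLARE @name VARCHAR(50), @sal NUMERIC(10,2);
-- 1. Declare the cursor
DECLARE emp_cursor CURSOR FOR
    SELECT ename, sal FROM emp;
-- 2. Open the cursor
OPEN emp_cursor;
-- 3. Fetch the first row
FETCH NEXT FROM emp_cursor INTO @name, @sal;
WHILE @@FETCH_STATUS = 0
BEGIN
    -- 4. Process the fetched row
    PRINT @name + ' earns ' + CAST(@sal AS VARCHAR(20));
    FETCH NEXT FROM emp_cursor INTO @name, @sal;
END;
-- 5. Close and 6. deallocate the cursor
CLOSE emp_cursor;
DEALLOCATE emp_cursor;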
Q. What
is the difference between a HAVING CLAUSE and a WHERE CLAUSE?
1. Specifies a search
condition for a group or an aggregate. HAVING can be used only with the SELECT
statement.
2. HAVING is typically
used in a GROUP BY clause. When GROUP BY is not used, HAVING behaves like a
WHERE clause.
3. Having Clause is basically
used only with the GROUP BY function in a query. WHERE Clause is applied to
each row before they are part of the GROUP BY function in a query.
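A quick illustration with the EMP sample table: WHERE filters rows before grouping, HAVING filters the groups afterwards.
SELECT   deptno, AVG(sal) AS avg_sal
FROM     emp
WHERE    job <> 'PRESIDENT'       -- row-level filter, applied before GROUP BY
GROUP BY deptno
HAVING   AVG(sal) > 2000;         -- group-level filter, applied after GROUP BY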
RANK CACHE
Sample Rank Mapping
When the PowerCenter Server runs a session with a Rank transformation, it compares an input row with rows in the data cache. If the input row out-ranks a stored row, the PowerCenter Server replaces the stored row with the input row.
Example: PowerCenter caches the first 5 rows if we are finding the top 5 salaried employees. When the 6th row is read, it is compared with the 5 rows in the cache and placed in the cache if needed.
1) RANK INDEX CACHE:
The index cache holds group information from the group by ports. If we are using Group By on DEPTNO, then this cache stores values 10, 20, 30 etc.
All Group By columns are in the RANK INDEX CACHE. Ex: DEPTNO
2) RANK DATA CACHE:
It holds row data until the PowerCenter Server completes the ranking and is generally larger than the index cache. To reduce the data cache size, connect only the necessary input/output ports to subsequent transformations.
All variable ports (if any), the rank port, and all ports going out from the Rank transformation are stored in the RANK DATA CACHE.
Example: all ports except DEPTNO in our mapping example.
Aggregator Caches
1. The PowerCenter Server stores data in the aggregate cache until it completes the aggregate calculations.
2. It stores group values in an index cache and row data in the data cache. If the PowerCenter Server requires more space, it stores overflow values in cache files.
Note: The PowerCenter Server uses memory to process an Aggregator transformation with sorted ports. It does not use cache memory. We do not need to configure cache memory for Aggregator transformations that use sorted ports.
1) Aggregator Index Cache:
The index cache holds group information from the group by ports. If we are using Group By on DEPTNO, then this cache stores values 10, 20, 30 etc.
· All Group By columns are in the AGGREGATOR INDEX CACHE. Ex: DEPTNO
2) Aggregator Data Cache:
The DATA CACHE is generally larger than the AGGREGATOR INDEX CACHE.
Columns in the data cache:
· Variable ports, if any
· Non-group-by input/output ports
· Non-group-by input ports used in non-aggregate output expressions
· Ports containing aggregate functions
JOINER CACHES
The Joiner always caches the MASTER table. We cannot disable caching. It builds the index cache and data cache based on the MASTER table.
1) Joiner Index Cache:
All columns of the MASTER table used in the join condition are in the JOINER INDEX CACHE.
Example: DEPTNO in our mapping.
2) Joiner Data Cache:
Master columns not in the join condition and used for output to other transformations or the target table are in the data cache.
Example: DNAME and LOC in our mapping example.
Lookup Cache Files
1. Lookup Index Cache:
Stores data for the columns used in the lookup condition.
2. Lookup Data Cache:
· For a connected Lookup transformation, stores data for the connected output ports, not including ports used in the lookup condition.
· For an unconnected Lookup transformation, stores data from the return port.
OLTP and OLAP
Logical Data Modeling vs Physical Data Modeling
Router Transformation and Filter Transformation
Source Qualifier and Lookup Transformation
Mapping and Mapplet
Joiner Transformation and Lookup Transformation
Dimension Table and Fact Table
Connected Lookup and Unconnected Lookup
Connected Lookup | Unconnected Lookup
Receives input values directly from the pipeline. | Receives input values from the result of a :LKP expression in another transformation.
We can use a dynamic or static cache. | We can use a static cache only.
Cache includes all lookup columns used in the mapping. | Cache includes all lookup/output ports in the lookup condition and the lookup/return port.
If there is no match for the lookup condition, the PowerCenter Server returns the default value for all output ports. | If there is no match for the lookup condition, the PowerCenter Server returns NULL.
If there is a match for the lookup condition, the PowerCenter Server returns the result of the lookup condition for all lookup/output ports. | If there is a match for the lookup condition, the PowerCenter Server returns the result of the lookup condition into the return port.
Passes multiple output values to another transformation. | Passes one output value to another transformation.
Supports user-defined default values. | Does not support user-defined default values.
Cache Comparison
Persistent and Dynamic Caches
Dynamic
1) When you use a dynamic cache, the Informatica Server updates the lookup cache as it passes rows to the target.
2) With a dynamic cache, the cache can also be updated with new data during the session.
3) A dynamic cache is not reusable.
(Use a dynamic cache when you need the cache to reflect the latest data.)
Persistent
1) A Lookup transformation can use a non-persistent or persistent cache. The PowerCenter Server saves or deletes the lookup cache files after a successful session based on the Lookup Cache Persistent property.
2) A persistent cache is not updated with new data during the session.
3) A persistent cache is reusable.
(Use a persistent cache when you want to reuse the cache data from a previous run.)
View and Materialized View
Star Schema and Snowflake Schema
Informatica - Transformations
In Informatica, Transformations help to transform the source
data according to the requirements of target system and it ensures the quality
of the data being loaded into target.
Transformations are of two types: Active and Passive.
Active Transformation
An active transformation can change the number of rows that pass through it from source to target. (i.e) It eliminates rows that do not meet the condition in transformation.
Passive Transformation
A passive transformation does not change the number of rows that pass through it (i.e) It passes all rows through the transformation.
Transformations can be Connected or Unconnected.
Connected Transformation
Connected transformation is connected to other transformations or directly to target table in the mapping.
Unconnected Transformation
An unconnected transformation is not connected to other transformations in the mapping. It is called within another transformation, and returns a value to that transformation.
Following are the list of Transformations available in Informatica:
Aggregator Transformation
Expression Transformation
Filter Transformation
Joiner Transformation
Lookup Transformation
Normalizer Transformation
Rank Transformation
Router Transformation
Sequence Generator Transformation
Stored Procedure Transformation
Sorter Transformation
Update Strategy Transformation
XML Source Qualifier Transformation
In the following pages, we will explain all the above Informatica Transformations and their significances in the ETL process in detail.
==============================================================================
Aggregator Transformation
Aggregator transformation is an Active and Connected transformation.
This transformation is useful to perform calculations such as averages and sums (mainly to perform calculations on multiple rows or groups).
For example, to calculate total of daily sales or to calculate average of monthly or yearly sales. Aggregate functions such as AVG, FIRST, COUNT, PERCENTILE, MAX, SUM etc. can be used in aggregate transformation.
==============================================================================
Expression Transformation
Expression transformation is a Passive and Connected transformation.
This can be used to calculate values in a single row before writing to the target.
For example, to calculate discount of each product
or to concatenate first and last names
or to convert date to a string field.
==============================================================================
Filter Transformation
Filter transformation is an Active and Connected transformation.
This can be used to filter rows in a mapping that do not meet the condition.
For example,
To know all the employees who are working in Department 10 or
To find out the products that fall within the price range of $500 to $1000.
==============================================================================
Joiner Transformation
Joiner Transformation is an Active and Connected transformation. This can be used to join two sources coming from two different locations or from same location. For example, to join a flat file and a relational source or to join two flat files or to join a relational source and a XML source.
In order to join two sources, there must be at least one matching port. While joining two sources it is a must to specify one source as master and the other as detail.
The Joiner transformation supports the following types of joins:
1)Normal
2)Master Outer
3)Detail Outer
4)Full Outer
Normal join discards all the rows of data from the master and detail source that do not match, based on the condition.
Master outer join discards all the unmatched rows from the master source and keeps all the rows from the detail source and the matching rows from the master source.
Detail outer join keeps all rows of data from the master source and the matching rows from the detail source. It discards the unmatched rows from the detail source.
Full outer join keeps all rows of data from both the master and detail sources.
==============================================================================
Lookup transformation
Lookup transformation is Passive and it
can be both Connected and UnConnected as well. It is used to look up data in a
relational table, view, or synonym. Lookup definition can be imported either
from source or from target tables.
For example, if we want to retrieve all the sales of a product with an ID 10 and assume that the sales data resides in another table. Here instead of using the sales table as one more source, use Lookup transformation to lookup the data for the product, with ID 10 in sales table.
Connected lookup receives input values directly from mapping pipeline whereas
Unconnected lookup receives values from a :LKP expression in another transformation.
Connected lookup returns multiple columns from the same row whereas
Unconnected lookup has one return port and returns one column from each row.
Connected lookup supports user-defined default values whereas
Unconnected lookup does not support user defined values.
==============================================================================
Normalizer Transformation
Normalizer Transformation is an Active and Connected transformation.
It is used mainly with COBOL sources where most of the time data is stored in de-normalized format.
Also, Normalizer transformation can be used to create multiple rows from a single row of data.
==============================================================================
Rank Transformation
Rank transformation is an Active and Connected transformation.
It is used to select the top or bottom rank of data.
For example,
To select top 10 Regions where the sales volume was very high
or
To select 10 lowest priced products.
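In SQL, the same two examples could be written roughly as follows (table and column names are illustrative):
-- Top 10 regions by sales volume
SELECT * FROM (SELECT region, SUM(amt) AS total_sales
               FROM region_sales GROUP BY region ORDER BY total_sales DESC)
WHERE ROWNUM <= 10;
-- 10 lowest priced products
SELECT * FROM (SELECT product, price FROM products ORDER BY price)
WHERE ROWNUM <= 10;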
==============================================================================
Router Transformation
Router is an Active and Connected transformation. It is similar to the Filter transformation.
The only difference is that the Filter transformation drops the rows that do not meet the condition, whereas the Router has an option to capture the rows that do not meet any condition. It is useful for testing multiple conditions.
It has input, output and default groups.
For example, if we want to split data by State=Michigan, State=California, State=New York and all other states, it is easy to route the rows to different target tables.
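As a rough SQL analogy, the Router produces the same result as running one filter per group over the source (table and column names are illustrative), except that the Router reads the source only once:
SELECT * FROM customers WHERE state = 'Michigan';
SELECT * FROM customers WHERE state = 'California';
SELECT * FROM customers WHERE state = 'New York';
-- default group: rows that matched none of the conditions
SELECT * FROM customers WHERE state NOT IN ('Michigan', 'California', 'New York');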
==============================================================================
Sequence Generator Transformation
Sequence Generator transformation is a Passive and Connected transformation. It is used to create unique primary key values or cycle through a sequential range of numbers or to replace missing keys.
It has two output ports, NEXTVAL and CURRVAL, which you connect to downstream transformations or targets (you cannot add ports to this transformation).
The NEXTVAL port generates a sequence of numbers when connected to a transformation or target. CURRVAL is NEXTVAL plus the Increment By value (with the default increment of 1, CURRVAL is NEXTVAL plus one).
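The behaviour is close to an Oracle sequence (sequence name illustrative):
CREATE SEQUENCE seq_emp_key START WITH 1 INCREMENT BY 1;
SELECT seq_emp_key.NEXTVAL FROM dual;   -- next surrogate key value
Note that Oracle's CURRVAL returns the value just generated, whereas the transformation's CURRVAL port is NEXTVAL plus the Increment By value.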
==============================================================================
Sorter Transformation
Sorter transformation is a Connected and an Active transformation.
It allows sorting data either in ascending or descending order according to a specified field.
It can also be configured for case-sensitive sorting and to specify whether the output rows should be distinct.
==============================================================================
Source Qualifier Transformation
Source Qualifier transformation is an Active and Connected transformation. When you add a relational or flat file source definition to a mapping, you must connect it to a Source Qualifier transformation.
The Source Qualifier performs tasks such as:
Overriding the default SQL query,
Filtering records,
Joining data from two or more tables, etc.
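A typical SQL override combining a filter and a join might look like this (a sketch, assuming EMP and DEPT relational sources like those used in the examples later in this document):
SELECT emp.empno, emp.ename, emp.sal, dept.dname
FROM   emp, dept
WHERE  emp.deptno = dept.deptno
AND    emp.sal > 1000;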
==============================================================================
Stored Procedure Transformation
Stored Procedure transformation is a Passive transformation that can be used as Connected or Unconnected. It is useful for automating time-consuming tasks and is also used for error handling, dropping and recreating indexes, determining free space in the database, specialized calculations, etc.
The stored procedure must exist in the database before you create a Stored Procedure transformation, and it can exist in the source, target, or any database with a valid connection to the Informatica Server. A stored procedure is an executable script containing SQL statements, control statements, user-defined variables and conditional statements.
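A minimal Oracle procedure of the kind such a transformation might call, here for the drop-and-recreate-index use case (the procedure, index and table names are illustrative):
CREATE OR REPLACE PROCEDURE drop_and_create_emp_index IS
BEGIN
  EXECUTE IMMEDIATE 'DROP INDEX idx_emp_deptno';
  EXECUTE IMMEDIATE 'CREATE INDEX idx_emp_deptno ON emp(deptno)';
END;
/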
==============================================================================
Update Strategy Transformation
Update strategy transformation is an Active and Connected transformation.
It is used to control how rows are written to the target table, either to maintain a full history of the data or to keep only the recent changes.
You can specify how to treat source rows: insert, update, delete, or data driven.
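Data-driven loading is conceptually close to a SQL MERGE, except that in the mapping each row is flagged individually (with DD_INSERT, DD_UPDATE, DD_DELETE or DD_REJECT). A MERGE sketch with illustrative table and column names:
MERGE INTO customers_dim t
USING customers_stg s
ON (t.custcode = s.custcode)
WHEN MATCHED THEN UPDATE SET t.city = s.city, t.bal = s.bal
WHEN NOT MATCHED THEN INSERT (custcode, name, city, bal)
                      VALUES (s.custcode, s.name, s.city, s.bal);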
==============================================================================
XML Source Qualifier Transformation
XML Source Qualifier is a Passive and Connected transformation.
XML Source Qualifier is used only with an XML source definition.
It represents the data elements that the Informatica Server reads when it executes a session with XML sources.
==============================================================================
Constraint-Based Loading
In the Workflow Manager, you can specify constraint-based loading for a session. When you select this option, the Integration Service orders the target load on a row-by-row basis. For every row generated by an active source, the Integration Service loads the corresponding transformed row first to the primary key table, then to any foreign key tables. Constraint-based loading depends on the following requirements:
Active source: Related target tables must have the same active source.
Key relationships: Target tables must have key relationships.
Target connection groups: Targets must be in one target connection group.
Treat rows as insert: Use this option when you insert into the target. You cannot use updates with constraint-based loading.
Active Source:
When target
tables receive rows from different active sources, the Integration Service
reverts to normal loading for those tables, but loads all other targets in the
session using constraint-based loading when possible. For example, a mapping
contains three distinct pipelines. The first two contain a source, source
qualifier, and target. Since these two targets receive data from different
active sources, the Integration Service reverts to normal loading for both
targets. The third pipeline contains a source, Normalizer, and two targets.
Since these two targets share a single active source (the Normalizer), the
Integration Service performs constraint-based loading: loading the primary key
table first, then the foreign key table.
Key Relationships:
When target
tables have no key relationships, the Integration Service does not perform
constraint-based loading.
Similarly,
when target tables have circular key relationships, the Integration Service
reverts to a normal load. For example, you have one target containing a primary
key and a foreign key related to the primary key in a second target. The second
target also contains a foreign key that references the primary key in the first
target. The Integration Service cannot enforce constraint-based loading for
these tables. It reverts to a normal load.
Target Connection Groups:
The
Integration Service enforces constraint-based loading for targets in the same
target connection group. If you want to specify constraint-based loading for
multiple targets that receive data from the same active source, you must verify
the tables are in the same target connection group. If the tables with the
primary key-foreign key relationship are in different target connection groups,
the Integration Service cannot enforce constraint-based loading when you run
the workflow. To verify that all targets are in the same target connection
group, complete the following tasks:
· Verify all targets are in the same target load order group and receive data from the same active source.
· Use the default partition properties and do not add partitions or partition points.
· Define the same target type for all targets in the session properties.
· Define the same database connection name for all targets in the session properties.
· Choose normal mode for the target load type for all targets in the session properties.
Treat Rows as Insert:
Use
constraint-based loading when the session option Treat Source Rows As is set to
insert. You might get inconsistent data if you select a different Treat Source
Rows As option and you configure the session for constraint-based loading.
When the
mapping contains Update Strategy transformations and you need to load data to a
primary key table first, split the mapping using one of the following options:
· Load the primary key table in one mapping and the dependent tables in another mapping. Use constraint-based loading to load the primary table.
· Perform inserts in one mapping and updates in another mapping.
Constraint-based
loading does not affect the target load ordering of the mapping. Target load
ordering defines the order the Integration Service reads the sources in each
target load order group in the mapping. A target load order group is a
collection of source qualifiers, transformations, and targets linked together
in a mapping. Constraint based loading establishes the order in which the
Integration Service loads individual targets within a set of targets receiving
data from a single source qualifier.
Example
The
following mapping is configured to perform constraint-based loading:
In the
first pipeline, target T_1 has a primary key, T_2 and T_3 contain foreign keys
referencing the T1 primary key. T_3 has a primary key that T_4 references as a
foreign key.
Since these
tables receive records from a single active source, SQ_A, the Integration Service loads rows to the
target in the following order:
1. T_1
2. T_2 and
T_3 (in no particular order)
3. T_4
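The key relationships that force this order could be declared like this (a sketch; the column names are illustrative):
CREATE TABLE t_1 (pk1 NUMBER PRIMARY KEY);
CREATE TABLE t_2 (t1_key NUMBER REFERENCES t_1);
CREATE TABLE t_3 (pk3 NUMBER PRIMARY KEY, t1_key NUMBER REFERENCES t_1);
CREATE TABLE t_4 (t3_key NUMBER REFERENCES t_3);
With these constraints in place, rows for T_2, T_3 and T_4 can only be inserted after their parent rows exist, which is exactly the order the Integration Service follows.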
The
Integration Service loads T_1 first because it has no foreign key dependencies
and contains a primary key referenced by T_2 and T_3. The Integration Service
then loads T_2 and T_3, but since T_2 and T_3 have no dependencies, they are
not loaded in any particular order. The Integration Service loads T_4 last, because
it has a foreign key that references a primary key in T_3.After loading the
first set of targets, the Integration Service begins reading source B. If there
are no key relationships between T_5 and T_6, the Integration Service reverts
to a normal load for both targets.
If T_6 has
a foreign key that references a primary key in T_5, since T_5 and T_6 receive
data from a single active source, the Aggregator AGGTRANS, the Integration
Service loads rows to the tables in the following order:
T_5
T_6
T_1, T_2,
T_3, and T_4 are in one target connection group if you use the same database
connection for each target, and you use the default partition properties. T_5
and T_6 are in another target connection group together if you use the same
database connection for each target and you use the default partition
properties. The Integration Service includes T_5 and T_6 in a different target
connection group because they are in a different target load order group from
the first four targets.
Enabling Constraint-Based Loading:
When you
enable constraint-based loading, the Integration Service orders the target load
on a row-by-row basis. To enable constraint-based loading:
1. In the General Options settings of
the Properties tab, choose Insert for the Treat Source Rows As property.
2. Click the Config Object tab. In the
Advanced settings, select Constraint Based Load Ordering.
3. Click OK.
Target Load Plan
When you
use a mapplet in a mapping, the Mapping Designer lets you set the target load
plan for sources within the mapplet.
Setting the Target Load Order
You can
configure the target load order for a mapping containing any type of target
definition. In the Designer, you can set the order in which the Integration
Service sends rows to targets in different target load order groups in a
mapping. A target load order group is the collection of source qualifiers,
transformations, and targets linked together in a mapping. You can set the
target load order if you want to maintain referential integrity when inserting,
deleting, or updating tables that have the primary key and foreign key
constraints.
The
Integration Service reads sources in a target load order group concurrently,
and it processes target load order groups sequentially.
To specify
the order in which the Integration Service sends data to targets, create one
source qualifier for each target within a mapping. To set the target load
order, you then determine in which order the Integration Service reads each
source in the mapping.
The following figure shows two
target load order groups in one mapping:
In this
mapping, the first target load order group includes ITEMS, SQ_ITEMS, and
T_ITEMS. The second target load order group includes all other objects in the
mapping, including the TOTAL_ORDERS target. The Integration Service processes
the first target load order group, and then the second target load order group.
When it
processes the second target load order group, it reads data from both sources
at the same time.
To set the target load order:
1. Create a mapping that contains multiple target load order groups.
2. Click Mappings > Target Load Plan.
3. The Target Load Plan dialog box lists all Source Qualifier transformations in the mapping and the targets that receive data from each source qualifier.
4. Select a source qualifier from the list.
5. Click the Up and Down buttons to move the source qualifier within the load order.
6. Repeat steps 4 and 5 for other source qualifiers you want to reorder. Click OK.
Mapping Parameters & Variables
Mapping
parameters and variables represent values in mappings and mapplets.
When we use
a mapping parameter or variable in a mapping, first we declare the mapping
parameter or variable for use in each mapplet or mapping. Then, we define a
value for the mapping parameter or variable before we run the session.
Mapping Parameters
A mapping
parameter represents a constant value that we can define before running a
session.
A mapping
parameter retains the same value throughout the entire session.
Example: When we want to extract records of a particular month during the ETL process, we can create a Mapping Parameter of date/time data type and use it in the query to compare against the timestamp field in the SQL override.
After we
create a parameter, it appears in the Expression Editor.
We can then
use the parameter in any expression in the mapplet or mapping.
We can also
use parameters in a source qualifier filter, user-defined join, or extract
override, and in the Expression Editor of reusable transformations.
Mapping Variables
Unlike
mapping parameters, mapping variables are values that can change between
sessions.
· The Integration Service saves the latest value of a mapping variable to the repository at the end of each successful session.
· We can override a saved value with the parameter file.
· We can also clear all saved values for the session in the Workflow Manager.
We might
use a mapping variable to perform an incremental read of the source. For
example, we have a source table containing time stamped transactions and we
want to evaluate the transactions on a daily basis. Instead of manually
entering a session override to filter source data each time we run the session,
we can create a mapping variable, $$IncludeDateTime. In the source qualifier,
create a filter to read only rows whose transaction date equals
$$IncludeDateTime, such as:
TIMESTAMP = $$IncludeDateTime
In the
mapping, use a variable function to set the variable value to increment one day
each time the session runs. If we set the initial value of $$IncludeDateTime to
8/1/2004, the first time the Integration Service runs the session, it reads
only rows dated 8/1/2004. During the session, the Integration Service sets
$$IncludeDateTime to 8/2/2004. It saves 8/2/2004 to the repository at the end
of the session. The next time it runs the session, it reads only rows from
August 2, 2004.
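The one-day increment described here can be written as a variable function call in an Expression transformation output port (a sketch; the surrounding port setup is assumed):
SETVARIABLE($$IncludeDateTime, ADD_TO_DATE($$IncludeDateTime,'DD',1))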
Used in the following transformations:
Expression
Filter
Router
Update Strategy
Initial and Default Value:
When we
declare a mapping parameter or variable in a mapping or a mapplet, we can enter
an initial value. When the Integration Service needs an initial value, and we
did not declare an initial value for the parameter or variable, the Integration
Service uses a default value based on the data type of the parameter or
variable.
Data type -> Default Value
Numeric -> 0
String -> Empty string
Datetime -> 1/1/1
Variable Values: Start value and current value of a mapping variable.
Start Value:
The start
value is the value of the variable at the start of the session. The Integration
Service looks for the start value in the following order:
1. Value in the parameter file
2. Value saved in the repository
3. Initial value
4. Default value
Current Value:
The current
value is the value of the variable as the session progresses. When a session
starts, the current value of a variable is the same as the start value. The
final current value for a variable is saved to the repository at the end of a
successful session. When a session fails to complete, the Integration Service
does not update the value of the variable in the repository.
Note: If a variable function is not
used to calculate the current value of a mapping variable, the start value of
the variable is saved to the repository.
Variable Data Type and Aggregation Type:
When we declare a mapping variable in a mapping, we need to configure the data type and aggregation type for the variable. The Integration Service uses the aggregation type of a mapping variable to determine the final current value of the mapping variable.
Aggregation types are:
Count: Only integer and small integer data types are valid.
Max: All transformation data types
except binary data type are valid.
Min: All transformation data types
except binary data type are valid.
Variable Functions
Variable
functions determine how the Integration Service calculates the current value of
a mapping variable in a pipeline.
SetMaxVariable: Sets the variable to the
maximum value of a group of values. It ignores rows marked for update, delete,
or reject. Aggregation type set to Max.
SetMinVariable: Sets the variable to the
minimum value of a group of values. It ignores rows marked for update, delete, or
reject. Aggregation type set to Min.
SetCountVariable: Increments the variable value by one. It adds one to the variable value when a row is marked for insertion, and subtracts one when a row is marked for deletion. It ignores rows marked for update or reject. Aggregation type set to Count.
SetVariable: Sets the variable to the
configured value. At the end of a session, it compares the final current value
of the variable to the start value of the variable. Based on the aggregate type
of the variable, it saves a final value to the repository.
Creating Mapping Parameters and Variables
· Open the folder where we want to create the parameter or variable.
· In the Mapping Designer, click Mappings > Parameters and Variables. -or- In the Mapplet Designer, click Mapplet > Parameters and Variables.
· Click the Add button.
· Enter the name. Do not remove $$ from the name.
· Select Type and Data type. Select Aggregation type for mapping variables.
· Give the Initial Value. Click OK.
Example: Use of Mapping Parameters and Variables
· EMP will be the source table.
· Create a target table MP_MV_EXAMPLE having columns: EMPNO, ENAME, DEPTNO, TOTAL_SAL, MAX_VAR, MIN_VAR, COUNT_VAR and SET_VAR.
· TOTAL_SAL = SAL + COMM + $$BONUS ($$BONUS is a mapping parameter that changes every month)
· SET_VAR: We will add one month to the HIREDATE of every employee.
· Create shortcuts as necessary.
Creating Mapping
1. Open folder where we want to create
the mapping.
2. Click Tools -> Mapping Designer.
3. Click Mapping-> Create-> Give
name. Ex: m_mp_mv_example
4. Drag EMP and target table.
5. Transformation -> Create ->
Select Expression for list -> Create –> Done.
6. Drag EMPNO, ENAME, HIREDATE, SAL,
COMM and DEPTNO to Expression.
7. Create Parameter $$Bonus and Give
initial value as 200.
8. Create variable $$var_max of MAX
aggregation type and initial value 1500.
9. Create variable $$var_min of MIN
aggregation type and initial value 1500.
10. Create variable $$var_count of COUNT aggregation type and initial value 0. COUNT is visible when the datatype is INT or SMALLINT.
11. Create variable $$var_set of MAX aggregation type.
12. Create 5 output ports: out_TOTAL_SAL, out_MAX_VAR, out_MIN_VAR, out_COUNT_VAR and out_SET_VAR.
13. Open the expression editor for out_TOTAL_SAL. Do the same as we did earlier for SAL + COMM. To add $$BONUS to it, select the Variables tab and pick the parameter from Mapping Parameters: SAL + COMM + $$Bonus
14. Open the expression editor for out_MAX_VAR.
15. Select the variable function SETMAXVARIABLE from the left-side pane. Select $$var_max from the Variables tab and SAL from the Ports tab: SETMAXVARIABLE($$var_max,SAL)
16. Open the expression editor for out_MIN_VAR and write the following expression: SETMINVARIABLE($$var_min,SAL). Validate the expression.
17. Open the expression editor for out_COUNT_VAR and write the following expression: SETCOUNTVARIABLE($$var_count). Validate the expression.
18. Open the expression editor for out_SET_VAR and write the following expression: SETVARIABLE($$var_set,ADD_TO_DATE(HIREDATE,'MM',1)). Validate.
19. Click OK.
20. Link all ports from the Expression transformation to the target, then validate the mapping and save it.
PARAMETER FILE
A parameter
file is a list of parameters and associated values for a workflow, worklet, or
session.
Parameter
files provide flexibility to change these variables each time we run a workflow
or session.
We can
create multiple parameter files and change the file we use for a session or
workflow. We can create a parameter file using a text editor such as WordPad or
Notepad.
Enter the
parameter file name and directory in the workflow or session properties.
A parameter
file contains the following types of parameters and variables:
Workflow variable: References values and records
information in a workflow.
Worklet variable: References values and records
information in a worklet. Use predefined worklet variables in a parent
workflow, but we cannot use workflow variables from the parent workflow in a
worklet.
Session parameter: Defines a value that can
change from session to session, such as a database connection or file name.
Mapping
parameter and Mapping variable
USING A PARAMETER FILE
Parameter
files contain several sections preceded by a heading. The heading identifies
the Integration Service, Integration Service process, workflow, worklet, or
session to which we want to assign parameters or variables.
· Make a session and workflow.
· Give connection information for the source and target tables.
· Run the workflow and see the result.
Sample Parameter File for Our
example:
In the
parameter file, folder and session names are case sensitive.
Create a
text file in notepad with name Para_File.txt
[Practice.ST:s_m_MP_MV_Example]
$$Bonus=1000
$$var_max=500
$$var_min=1200
$$var_count=0
CONFIGURING THE PARAMETER FILE
We can
specify the parameter file name and directory in the workflow or session
properties.
To enter a
parameter file in the workflow properties:
1. Open a
Workflow in the Workflow Manager.
2. Click
Workflows > Edit.
3. Click
the Properties tab.
4. Enter
the parameter directory and name in the Parameter Filename field.
5. Click
OK.
To enter a parameter file in the
session properties:
1. Open a
session in the Workflow Manager.
2. Click
the Properties tab and open the General Options settings.
3. Enter
the parameter directory and name in the Parameter Filename field.
4. Example:
D:\Files\Para_File.txt or $PMSourceFileDir\Para_File.txt
5. Click
OK.
Mapplet
A mapplet
is a reusable object that we create in the Mapplet Designer.
It contains
a set of transformations and lets us reuse that transformation logic in multiple
mappings.
Created in Mapplet Designer in
Designer Tool.
Suppose we need to use the same set of 5 transformations in, say, 10 mappings. Instead of building the 5 transformations in each of the 10 mappings, we create a mapplet of these 5 transformations and use this mapplet in all 10 mappings. Example: to create a surrogate key in the target, we create a mapplet that uses a stored procedure to create the primary key for the target table. We give the target table name and key column name as input to the mapplet and get the surrogate key as output.
Mapplets help simplify mappings in the following ways:
Include source definitions: Use multiple source definitions and source qualifiers to provide source data for a mapping.
Accept data from sources in a mapping.
Include multiple transformations: A mapplet can contain as many transformations as we need.
Pass data to multiple transformations: We can create a mapplet to feed data to multiple transformations. Each Output transformation in a mapplet represents one output group in a mapplet.
Contain unused ports: We do not have to connect all mapplet input and output ports in a mapping.
Mapplet Input:
Mapplet
input can originate from a source definition and/or from an Input
transformation in the mapplet. We can create multiple pipelines in a mapplet.
We use
Mapplet Input transformation to give input to mapplet.
Use of
Mapplet Input transformation is optional.
Mapplet Output:
The output
of a mapplet is not connected to any target table.
We must use
Mapplet Output transformation to store mapplet output.
A mapplet
must contain at least one Output transformation with at least one connected
port in the mapplet.
Example 1: We will join the EMP and DEPT tables, then calculate total salary and give the output to a Mapplet Output transformation.
· EMP and DEPT will be the source tables.
· Output will be given to the transformation Mapplet_Out.
Steps:
· Open the folder where we want to create the mapplet.
· Click Tools -> Mapplet Designer.
· Click Mapplets -> Create -> Give name. Ex: mplt_example1
· Drag the EMP and DEPT tables.
· Use a Joiner transformation as described earlier to join them.
· Transformation -> Create -> Select Expression from list -> Create -> Done.
· Pass all ports from the Joiner to the Expression and then calculate total salary as described in the Expression transformation.
· Now Transformation -> Create -> Select Mapplet Output from list -> Create -> Give name and then Done.
· Pass all ports from the Expression to the Mapplet Output.
· Mapplet -> Validate
· Repository -> Save
Use of a mapplet in a mapping:
We can use a mapplet in a mapping by just dragging the mapplet from the mapplet folder in the left pane, the same way we drag source and target tables.
When we use the mapplet in a mapping, the mapplet object displays only the ports from the Input and Output transformations. These are referred to as the mapplet input and mapplet output ports.
Make sure to give the correct connection information in the session.
Making a mapping: We will use mplt_example1, and then create a Filter transformation to filter records whose Total Salary is >= 1500.
· mplt_example1 will be the source.
· Create a target table the same as the Mapplet_Out transformation.
Creating Mapping:
· Open the folder where we want to create the mapping.
· Click Tools -> Mapping Designer.
· Click Mapping -> Create -> Give name. Ex: m_mplt_example1
· Drag mplt_example1 and the target table.
· Transformation -> Create -> Select Filter from list -> Create -> Done.
· Drag all ports from mplt_example1 to the Filter and give the filter condition.
· Connect all ports from the Filter to the target. We can add more transformations after the Filter if needed.
· Validate the mapping and save it.
Make a session and workflow.
Give connection information for the mapplet source tables.
Give connection information for the target table.
Run the workflow and see the result.
Indirect Loading For Flat Files
Suppose you have 10 flat files of the same structure: all the flat files have the same number of columns and the same data types. Now we need to transfer all 10 files to the same target.
The names of the files are, say, EMP1, EMP2 and so on.
Solution1:
1. Import
one flat file definition and make the mapping as per need.
2. Now in
session give the Source File name and Source File Directory location of one
file.
3. Make
workflow and run.
4. Now open
session after workflow completes. Change the Filename and Directory to give
information of second file. Run workflow again.
5. Do the
above for all 10 files.
Solution2:
1. Import
one flat file definition and make the mapping as per need.
2. Now in
session give the Source Directory location of the files.
3. Now in the Source Filename field, use $InputFileName. This is a session parameter.
4. Now make
a parameter file and give the value of $InputFileName.
$InputFileName=EMP1.txt
5. Run the
workflow
6. Now edit
parameter file and give value of second file. Run workflow again.
7. Do same
for remaining files.
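For Solution 2, the parameter file would contain the session heading and the file-name parameter; the folder and session names below follow the format of the earlier example and are only placeholders:
[Practice.ST:s_m_EMP_FlatFile]
$InputFileName=EMP1.txt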
Solution3:
1. Import
one flat file definition and make the mapping as per need.
2. Now make
a notepad file that contains the location and name of each 10 flat files.
Sample:
D:\EMP1.txt
E:\EMP2.txt
E:\FILES\DWH\EMP3.txt
and so on
3. Now make
a session and in Source file name and Source File Directory location fields,
give the name and location of above created file.
4. In
Source file type field, select Indirect.
5. Click
Apply.
6. Validate
Session
7. Make
Workflow. Save it to repository and run.
Incremental Aggregation
When we enable the session option Incremental Aggregation, the Integration Service performs incremental aggregation: it passes source data through the mapping and uses historical cache data to perform aggregation calculations incrementally.
When using
incremental aggregation, you apply captured changes in the source to aggregate
calculations in a session. If the source changes incrementally and you can
capture changes, you can configure the session to process those changes. This
allows the Integration Service to update the target incrementally, rather than
forcing it to process the entire source and recalculate the same data each time
you run the session.
For
example, you might have a session using a source that receives new data every
day. You can capture those incremental changes because you have added a filter
condition to the mapping that removes pre-existing data from the flow of data.
You then enable incremental aggregation.
When the
session runs with incremental aggregation enabled for the first time on March
1, you use the entire source. This allows the Integration Service to read and
store the necessary aggregate data. On March 2, when you run the session again,
you filter out all the records except those time-stamped March 2. The
Integration Service then processes the new data and updates the target accordingly.
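A source filter of the kind described might look like the following (the timestamp column name is an assumption):
TRANS_TIMESTAMP >= TRUNC(SYSDATE)
With such a filter in the Source Qualifier, each run reads only the rows stamped with the current date, and incremental aggregation folds them into the saved aggregate cache.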
Consider using incremental aggregation in the following circumstances:
You can
capture new source data. Use incremental aggregation when you can capture
new source data each time you run the session. Use a Stored Procedure or Filter
transformation to process new data.
Incremental
changes do not significantly change the target. Use incremental
aggregation when the changes do not significantly change the target. If
processing the incrementally changed source alters more than half the existing
target, the session may not benefit from using incremental aggregation. In this
case, drop the table and recreate the target with complete source data.
Note: Do not use incremental
aggregation if the mapping contains percentile or median functions. The
Integration Service uses system memory to process these functions in addition
to the cache memory you configure in the session properties. As a result, the
Integration Service does not store incremental aggregation values for
percentile and median functions in disk caches.
Integration Service Processing for
Incremental Aggregation
(i)The
first time you run an incremental aggregation session, the Integration Service
processes the entire source. At the end of the session, the Integration Service
stores aggregate data from that session run in two files, the index file and
the data file. The Integration Service creates the files in the cache directory
specified in the Aggregator transformation properties.
(ii)Each
subsequent time you run the session with incremental aggregation, you use the
incremental source changes in the session. For each input record, the
Integration Service checks historical information in the index file for a
corresponding group. If it finds a corresponding group, the Integration Service
performs the aggregate operation incrementally, using the aggregate data for
that group, and saves the incremental change. If it does not find a
corresponding group, the Integration Service creates a new group and saves the
record data.
(iii)When
writing to the target, the Integration Service applies the changes to the
existing target. It saves modified aggregate data in the index and data files
to be used as historical data the next time you run the session.
(iv) If the
source changes significantly and you want the Integration Service to continue
saving aggregate data for future incremental changes, configure the Integration
Service to overwrite existing aggregate data with new aggregate data.
Each
subsequent time you run a session with incremental aggregation, the Integration
Service creates a backup of the incremental aggregation files. The cache
directory for the Aggregator transformation must contain enough disk space for
two sets of the files.
(v)When you
partition a session that uses incremental aggregation, the Integration Service
creates one set of cache files for each partition.
The
Integration Service creates new aggregate data, instead of using historical
data, when you perform one of the following tasks:
· Save a new version of the mapping.
· Configure the session to reinitialize the aggregate cache.
· Move the aggregate files without correcting the configured path or directory for the files in the session properties.
· Change the configured path or directory for the aggregate files without moving the files to the new location.
· Delete cache files.
· Decrease the number of partitions.
When the Integration Service rebuilds incremental aggregation files, the data in the previous files is lost.
Note: To
protect the incremental aggregation files from file corruption or disk failure,
periodically back up the files.
Preparing for Incremental
Aggregation:
When you
use incremental aggregation, you need to configure both mapping and session
properties:
· Implement mapping logic or a filter to remove pre-existing data.
· Configure the session for incremental aggregation and verify that the file directory has enough disk space for the aggregate files.
Configuring the Mapping
Before
enabling incremental aggregation, you must capture changes in source data. You
can use a Filter or Stored Procedure transformation in the mapping to remove
pre-existing source data during a session.
Configuring the Session
Use the
following guidelines when you configure the session for incremental
aggregation:
(i) Verify the location where
you want to store the aggregate files.
The index
and data files grow in proportion to the source data. Be sure the cache
directory has enough disk space to store historical data for the session.
When you
run multiple sessions with incremental aggregation, decide where you want the
files stored. Then, enter the appropriate directory for the process variable,
$PMCacheDir, in the Workflow Manager. You can enter session-specific
directories for the index and data files. However, by using the process
variable for all sessions using incremental aggregation, you can easily change
the cache directory when necessary by changing $PMCacheDir.
Changing
the cache directory without moving the files causes the Integration Service to
reinitialize the aggregate cache and gather new aggregate data.
In a grid,
Integration Services rebuild incremental aggregation files they cannot find.
When an Integration Service rebuilds incremental aggregation files, it loses
aggregate history.
(ii) Verify the incremental
aggregation settings in the session properties.
You can
configure the session for incremental aggregation in the Performance settings
on the Properties tab.
You can
also configure the session to reinitialize the aggregate cache. If you choose
to reinitialize the cache, the Workflow Manager displays a warning indicating
the Integration Service overwrites the existing cache and a reminder to clear
this option after running the session.
Scenarios for choosing hash partitioning:
• Not enough knowledge about how much data maps into a given range.
• Sizes of range partitions differ substantially, or are difficult to balance manually.
• Range partitioning would cause data to be clustered undesirably.
• Features such as parallel DML, partition pruning, joins, etc. are important.
You Can Define Following Partition Types In Workflow Manager:
1) Database Partitioning
The Integration Service queries the IBM DB2 or Oracle system for table partition information. It reads partitioned data from the corresponding nodes in the database. Use database partitioning with Oracle or IBM DB2 source instances on a multi-node tablespace. Use database partitioning with DB2 targets.
2) Hash Partitioning
Use hash partitioning when you want the Integration Service to distribute rows to the partitions by group. For example, you need to sort items by item ID, but you do not know how many items have a particular ID number.
3) Key Range
You specify one or more ports to form a compound partition key. The Integration Service passes data to each partition depending on the ranges you specify for each port. Use key range partitioning where the sources or targets in the pipeline are partitioned by key range.
4) Simple Pass-Through
The Integration Service passes all rows at one partition point to the next partition point without redistributing them. Choose pass-through partitioning where you want to create an additional pipeline stage to improve performance, but do not want to change the distribution of data across partitions.
5) Round-Robin
The Integration Service distributes data evenly among all partitions. Use round-robin partitioning where you want each partition to process approximately the same number of rows.
Partition Types Overview
Creating Partition Tables
To create a partitioned table, give the following statement:
Create table sales (year number(4),
product varchar2(10),
amt number(10))
partition by range (year)
(
partition p1 values less than (1992) ,
partition p2 values less than (1993),
partition p5 values less than (MAXVALUE)
);
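Rows are placed in partitions according to the year value, and queries that filter on the partition key can be pruned to the relevant partition. A couple of illustrative statements against the sales table created above:
insert into sales values (1991, 'Widget', 500);   -- stored in partition p1
select * from sales partition (p1);               -- read one partition directly
select * from sales where year < 1992;            -- the optimizer prunes to p1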
The following example creates a table with list partitioning:
Create table customers
  (custcode number(5),
   Name varchar2(20),
   Addr varchar2(10),
   City varchar2(20),
   Bal number(10,2))
Partition by list (city)
  (Partition north_india values ('DELHI','CHANDIGARH'),
   Partition east_india values ('KOLKOTA','PATNA'),
   Partition south_india values ('HYDERABAD','BANGALORE','CHENNAI'),
   Partition west_india values ('BOMBAY','GOA'));
alter table sales add partition p6 values less than (1996);
alter table customers add partition central_india values ('BHOPAL','NAGPUR');
Alter table sales drop partition p5;
Alter table sales merge partitions p2, p3 into partition p23;
The following statement adds a new set of cities ('KOCHI', 'MANGALORE') to an existing partition list:
ALTER TABLE customers MODIFY PARTITION south_india
ADD VALUES ('KOCHI', 'MANGALORE');
The statement below drops a set of cities ('KOCHI' and 'MANGALORE') from an existing partition value list:
ALTER TABLE customers MODIFY PARTITION south_india
DROP VALUES ('KOCHI','MANGALORE');
SPLITTING PARTITIONS
You can split a single partition into two partitions. For example, to split the partition p5 of the sales table into two partitions, give the following command:
Alter table sales split partition p5 at (1996)
into (Partition p6, Partition p7);
TRUNCATING PARTITIONS
Truncating a partition will delete all rows from the partition.
To truncate a partition give the following statement
Alter table sales truncate partition p5;
LISTING INFORMATION ABOUT PARTITION TABLES
To see how many partitioned tables are there in your schema give the following statement
Select * from user_part_tables;
To see on partition level partitioning information
Select * from user_tab_partitions;
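To narrow this down to one table, the partition names and their upper bounds can be listed as follows (using the sales table created above):
Select partition_name, high_value
from user_tab_partitions
where table_name = 'SALES';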
TASKS
The
Workflow Manager contains many types of tasks to help you build workflows and
worklets. We can create reusable tasks in the Task Developer.
Types of tasks:
Task Type | Tool where task can be created | Reusable or not
Session | Task Developer | Yes
Email | Workflow Designer | Yes
Command | Worklet Designer | Yes
Event-Raise | Workflow Designer | No
Event-Wait | Worklet Designer | No
Timer | - | No
Decision | - | No
Assignment | - | No
Control | - | No
SESSION TASK
A session
is a set of instructions that tells the Power Center Server how and when to
move data from sources to targets.
To run a
session, we must first create a workflow to contain the Session task.
We can run
as many sessions in a workflow as we need. We can run the Session tasks
sequentially or concurrently, depending on our needs.
The Power
Center Server creates several files and in-memory caches depending on the
transformations and options used in the session.
EMAIL TASK
The
Workflow Manager provides an Email task that allows us to send email during a
workflow.
Created by
Administrator usually and we just drag and use it in our mapping.
Steps:
1. In the Task Developer or Workflow
Designer, choose Tasks-Create.
2. Select an Email task and enter a
name for the task. Click Create.
3. Click Done.
4. Double-click the Email task in the
workspace. The Edit Tasks dialog box appears.
5. Click the Properties tab.
6. Enter the fully qualified email
address of the mail recipient in the Email User Name field.
7. Enter the subject of the email in
the Email Subject field. Or, you can leave this field blank.
8. Click the Open button in the Email
Text field to open the Email Editor.
9. Click OK twice to save your changes.
Example: To send an email when
a session completes:
Steps:
1. Create a workflow wf_sample_email
2. Drag any session task to workspace.
3. Edit Session task and go to
Components tab.
4. See On Success Email Option there
and configure it.
5. In Type select reusable or
Non-reusable.
6. In Value, select the email task to
be used.
7. Click Apply -> Ok.
8. Validate workflow and Repository
-> Save
9. We can also drag the email task and
use as per need.
10. We can set the option to send email on success or failure in the Components tab of a session task.
COMMAND TASK
The Command
task allows us to specify one or more shell commands in UNIX or DOS commands in
Windows to run during the workflow.
For
example, we can specify shell commands in the Command task to delete reject
files, copy a file, or archive target files.
Ways of
using command task:
1. Standalone
Command task: We can use a Command task anywhere in the workflow or
worklet to run shell commands.
2. Pre- and
post-session shell command: We can call a Command task as the pre- or
post-session shell command for a Session task. This is done in COMPONENTS TAB
of a session. We can run it in Pre-Session Command or Post Session Success
Command or Post Session Failure Command. Select the Value and Type option as we
did in Email task.
Example: to copy a file sample.txt from D
drive to E.
Command: COPY
D:\sample.txt E:\ in windows
Steps for creating command task:
1. In the Task Developer or Workflow
Designer, choose Tasks-Create.
2. Select Command Task for the task
type.
3. Enter a name for the Command task.
Click Create. Then click done.
4. Double-click the Command task. Go to
commands tab.
5. In the Commands tab, click the Add
button to add a command.
6. In the Name field, enter a name for
the new command.
7. In the Command field, click the Edit
button to open the Command Editor.
8. Enter only one command in the
Command Editor.
9. Click OK to close the Command
Editor.
10. Repeat steps 5-9 to add more commands in the task.
11. Click OK.
Steps to
create the workflow using command task:
1. Create a task using the above steps
to copy a file in Task Developer.
2. Open Workflow Designer. Workflow
-> Create -> Give name and click ok.
3. Start is displayed. Drag session say
s_m_Filter_example and command task.
4. Link Start to Session task and
Session to Command Task.
5. Double click link between Session
and Command and give condition in editor as
6. $S_M_FILTER_EXAMPLE.Status=SUCCEEDED
7. Workflow-> Validate
8. Repository –> Save
WORKING WITH EVENT TASKS
We can
define events in the workflow to specify the sequence of task execution.
Types of Events:
Pre-defined event: A pre-defined event is a
file-watch event. This event Waits for a specified file to arrive at a given
location.
User-defined event: A user-defined event is a
sequence of tasks in the Workflow. We create events and then raise them as per
need.
Steps for
creating User Defined Event:
1. Open any workflow where we want to
create an event.
2. Click Workflow-> Edit ->
Events tab.
3. Click to Add button to add events
and give the names as per need.
4. Click Apply -> Ok. Validate the
workflow and Save it.
Types of Events Tasks:
EVENT RAISE: Event-Raise task represents a
user-defined event. We use this task to raise a user defined event.
EVENT WAIT: Event-Wait task waits for a
file watcher event or user defined event to occur before executing the next
session in the workflow.
Example1: Use
an event wait task and make sure that session s_filter_example runs when
abc.txt file is present in D:\FILES folder.
Steps for
creating workflow:
1. Workflow -> Create -> Give
name wf_event_wait_file_watch -> Click ok.
2. Task -> Create -> Select Event
Wait. Give name. Click create and done.
3. Link Start to Event Wait task.
4. Drag s_filter_example to workspace
and link it to event wait task.
5. Right click on event wait task and
click EDIT -> EVENTS tab.
6. Select the Pre Defined option there. In the blank space, give the directory and filename to watch. Example: D:\FILES\abc.txt
7. Workflow validate and Repository
Save.
Example
2: Raise a user defined event when session s_m_filter_example succeeds.
Capture this event in event wait task and run session S_M_TOTAL_SAL_EXAMPLE
Steps for
creating workflow:
1. Workflow -> Create -> Give
name wf_event_wait_event_raise -> Click ok.
2. Workflow -> Edit -> Events Tab
and add events EVENT1 there.
3. Drag s_m_filter_example and link it
to START task.
4. Click Tasks -> Create ->
Select EVENT RAISE from list. Give name
5. ER_Example. Click Create and then
done. Link ER_Example to s_m_filter_example.
6. Right click ER_Example -> EDIT
-> Properties Tab -> Open Value for User Defined Event and Select EVENT1
from the list displayed. Apply -> OK.
7. Click link between ER_Example and
s_m_filter_example and give the condition $S_M_FILTER_EXAMPLE.Status=SUCCEEDED
8. Click Tasks -> Create ->
Select EVENT WAIT from list. Give name EW_WAIT. Click Create and then done.
9. Link EW_WAIT to START task.
10. Right click EW_WAIT -> EDIT -> EVENTS tab.
11. Select User Defined there. Select EVENT1 by clicking the Browse Events button.
12. Apply -> OK.
13. Drag S_M_TOTAL_SAL_EXAMPLE and link it to EW_WAIT.
14. Mapping -> Validate
15. Repository -> Save.
Run
workflow and see.
TIMER TASK
The Timer
task allows us to specify the period of time to wait before the Power Center
Server runs the next task in the workflow. The Timer task has two types of
settings:
Absolute time: We specify the exact date and
time or we can choose a user-defined workflow variable to specify the exact
time. The next task in workflow will run as per the date and time specified.
Relative time: We instruct the Power Center
Server to wait for a specified period of time after the Timer task, the parent
workflow, or the top-level workflow starts.
Example: Run
session s_m_filter_example relative to 1 min after the timer task.
Steps for
creating workflow:
1. Workflow -> Create -> Give
name wf_timer_task_example -> Click ok.
2. Click Tasks -> Create ->
Select TIMER from list. Give name TIMER_Example. Click Create and then done.
3. Link TIMER_Example to START task.
4. Right click TIMER_Example-> EDIT
-> TIMER tab.
5. Select Relative Time Option and Give
1 min and Select ‘From start time of this task’ Option.
6. Apply -> OK.
7. Drag s_m_filter_example and link it
to TIMER_Example.
8. Workflow-> Validate and
Repository -> Save.
DECISION TASK
The
Decision task allows us to enter a condition that determines the execution of
the workflow, similar to a link condition.
The
Decision task has a pre-defined variable called $Decision_task_name.condition
that represents the result of the decision condition.
The Power
Center Server evaluates the condition in the Decision task and sets the
pre-defined condition variable to True (1) or False (0).
We can
specify one decision condition per Decision task.
Example: Command
Task should run only if either s_m_filter_example or
S_M_TOTAL_SAL_EXAMPLE
succeeds. If any of s_m_filter_example or
S_M_TOTAL_SAL_EXAMPLE
fails then S_m_sample_mapping_EMP should run.
Steps for
creating workflow:
1. Workflow -> Create -> Give
name wf_decision_task_example -> Click ok.
2. Drag s_m_filter_example and
S_M_TOTAL_SAL_EXAMPLE to workspace and link both of them to START task.
3. Click Tasks -> Create ->
Select DECISION from list. Give name DECISION_Example. Click Create and then
done. Link DECISION_Example to both s_m_filter_example and
S_M_TOTAL_SAL_EXAMPLE.
4. Right click DECISION_Example->
EDIT -> GENERAL tab.
5. Set ‘Treat Input Links As’ to OR.
Default is AND. Apply and click OK.
6. Now edit decision task again and go
to PROPERTIES Tab. Open the Expression editor by clicking the VALUE section of
Decision Name attribute and enter the following condition:
$S_M_FILTER_EXAMPLE.Status = SUCCEEDED OR $S_M_TOTAL_SAL_EXAMPLE.Status =
SUCCEEDED
7. Validate the condition -> Click
Apply -> OK.
8. Drag command task and
S_m_sample_mapping_EMP task to workspace and link them to DECISION_Example
task.
9. Double click link between
S_m_sample_mapping_EMP & DECISION_Example & give the condition:
$DECISION_Example.Condition = 0. Validate & click OK.
10. Double click link between Command task and DECISION_Example and give the condition: $DECISION_Example.Condition = 1. Validate and click OK.
11. Workflow Validate and Repository Save.
Run
workflow and see the result.
CONTROL TASK
We can use
the Control task to stop, abort, or fail the top-level workflow or the parent
workflow based on an input link condition.
A parent
workflow or worklet is the workflow or worklet that contains the Control task.
We give the
condition to the link connected to Control Task.
Control Option | Description
Fail Me | Fails the control task.
Fail Parent | Marks the status of the WF or worklet that contains the Control task as failed.
Stop Parent | Stops the WF or worklet that contains the Control task.
Abort Parent | Aborts the WF or worklet that contains the Control task.
Fail Top-Level WF | Fails the workflow that is running.
Stop Top-Level WF | Stops the workflow that is running.
Abort Top-Level WF | Aborts the workflow that is running.
Example: Drag
any 3 sessions and if anyone fails, then Abort the top level workflow.
Steps for
creating workflow:
1. Workflow -> Create -> Give
name wf_control_task_example -> Click ok.
2. Drag any 3 sessions to workspace and
link all of them to START task.
3. Click Tasks -> Create ->
Select CONTROL from list. Give name cntr_task.
4. Click Create and then done.
5. Link all sessions to the control
task cntr_task.
6. Double click link between cntr_task
and any session say s_m_filter_example and give the condition:
$S_M_FILTER_EXAMPLE.Status = SUCCEEDED.
7. Repeat above step for remaining 2
sessions also.
8. Right click cntr_task-> EDIT
-> GENERAL tab. Set ‘Treat Input Links As’ to OR. Default is AND.
9. Go to the PROPERTIES tab of cntr_task and select the value 'Fail Top-Level Workflow' for Control Option. Click Apply and OK.
10. Workflow Validate and Repository Save.
Run
workflow and see the result.
ASSIGNMENT TASK
The
Assignment task allows us to assign a value to a user-defined workflow
variable.
See
Workflow variable topic to add user defined variables.
· To use an Assignment task in the workflow, first create and add the Assignment task to the workflow. Then configure the Assignment task to assign values or expressions to user-defined variables.
· We cannot assign values to pre-defined workflow variables.
Steps to
create Assignment Task:
1. Open any workflow where we want to
use Assignment task.
2. Edit Workflow and add user defined
variables.
3. Choose Tasks-Create. Select
Assignment Task for the task type.
4. Enter a name for the Assignment
task. Click Create. Then click done.
5. Double-click the Assignment task to
open the Edit Task dialog box.
6. On the Expressions tab, click Add to
add an assignment.
7. Click the Open button in the User
Defined Variables field.
8. Select the variable for which you
want to assign a value. Click OK.
9. Click the Edit button in the
Expression field to open the Expression Editor.
10. Enter the value or expression you want to assign.
11. Repeat steps 7-10 to add more variable assignments as necessary.
12. Click OK.
Scheduler
We can schedule a workflow
to run continuously, repeat at a given time or interval, or we can manually
start a workflow. The Integration Service runs a scheduled workflow as
configured.
By default, the workflow
runs on demand. We can change the schedule settings by editing the scheduler. If
we change schedule settings, the Integration Service reschedules the workflow
according to the new settings.
- A scheduler is a repository object that contains a set of schedule settings.
- Scheduler can be non-reusable or reusable.
- The Workflow Manager marks a workflow invalid if we delete the scheduler associated with the workflow.
- If we choose a different Integration Service for the workflow or restart the Integration Service, it reschedules all workflows.
- If we delete a folder, the Integration Service removes workflows from the schedule.
- The Integration Service does not run the workflow if:
- The prior workflow run fails.
- We remove the workflow from the schedule
- The Integration Service is running in safe mode
Creating a Reusable
Scheduler
- For each folder, the Workflow Manager lets us create reusable schedulers so we can reuse the same set of scheduling settings for workflows in the folder.
- Use a reusable scheduler so we do not need to configure the same set of scheduling settings in each workflow.
- When we delete a reusable scheduler, all workflows that use the deleted scheduler become invalid. To make the workflows valid, we must edit them and replace the missing scheduler.
Steps:
- Open the folder where we want to create the scheduler.
- In the Workflow Designer, click Workflows > Schedulers.
- Click Add to add a new scheduler.
- In the General tab, enter a name for the scheduler.
- Configure the scheduler settings in the Scheduler tab.
- Click Apply and OK.
Configuring Scheduler
Settings
Configure the Schedule tab
of the scheduler to set run options, schedule options, start options, and end
options for the schedule.
There are 3 run options:
- Run on Demand
- Run Continuously
- Run on Server initialization
1. Run on Demand:
Integration Service runs
the workflow when we start the workflow manually.
2. Run Continuously:
Integration Service runs
the workflow as soon as the service initializes. The Integration Service then
starts the next run of the workflow as soon as it finishes the previous run.
3. Run on Server initialization
Integration Service runs
the workflow as soon as the service is initialized. The Integration Service
then starts the next run of the workflow according to settings in Schedule
Options.
Schedule options for Run on
Server initialization:
- Run Once: To run the workflow just once.
- Run every: Run the workflow at regular intervals, as configured.
- Customized Repeat: Integration Service runs the workflow on the dates and times specified in the Repeat dialog box.
Start options for Run on
Server initialization:
- Start Date
- Start Time
End options for Run on
Server initialization:
- End on: IS stops scheduling the workflow on the selected date.
- End After: IS stops scheduling the workflow after the set number of workflow runs.
- Forever: IS schedules the workflow as long as the workflow does not fail.
Creating a Non-Reusable
Scheduler
- In the Workflow Designer, open the workflow.
- Click Workflows > Edit.
- In the Scheduler tab, choose Non-reusable. Select Reusable if we want to select an existing reusable scheduler for the workflow.
- Note: If we do not have a reusable scheduler in the folder, we must create one before we choose Reusable.
- Click the right side of the Scheduler field to edit scheduling settings for the non-reusable scheduler.
- If we select Reusable, choose a reusable scheduler from the Scheduler Browser dialog box.
- Click OK.
Points to Ponder:
- To remove a workflow from its schedule, right-click the workflow in the Navigator window and choose Unscheduled Workflow.
- To reschedule a
workflow on its original schedule, right-click the workflow in the
Navigator window and choose Schedule Workflow.
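Side note: a workflow left on the default Run on Demand setting can also be started from the command line with the pmcmd utility. The sketch below uses placeholder values for the Integration Service, domain, credentials, and folder; verify the option names against your pmcmd version:
pmcmd startworkflow -sv INT_SVC_DEV -d Domain_Dev -u Administrator -p password -f MY_FOLDER wf_control_task_example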
Pushdown
Optimization Overview
- You can push transformation logic to the source or target database using pushdown optimization. When you run a session configured for pushdown optimization, the Integration Service translates the transformation logic into SQL queries and sends the SQL queries to the database. The source or target database executes the SQL queries to process the transformations.
- The amount of transformation logic you can push to the database depends on the database, transformation logic, and mapping and session configuration. The Integration Service processes all transformation logic that it cannot push to a database.
- Use the Pushdown Optimization Viewer to preview the SQL statements and mapping logic that the Integration Service can push to the source or target database. You can also use the Pushdown Optimization Viewer to view the messages related to pushdown optimization.
- The following figure shows a mapping containing transformation logic that can be pushed to the source database. This mapping contains a Filter transformation that filters out all items except those with an ID greater than 1005. The Integration Service can push the transformation logic to the database. It generates the following SQL statement to process the transformation logic:
INSERT INTO ITEMS(ITEM_ID, ITEM_NAME, ITEM_DESC, n_PRICE) SELECT ITEMS.ITEM_ID, ITEMS.ITEM_NAME, ITEMS.ITEM_DESC, CAST(ITEMS.PRICE AS INTEGER) FROM ITEMS WHERE (ITEMS.ITEM_ID > 1005)
- The Integration Service generates an INSERT SELECT statement to get the ID, NAME, and DESCRIPTION columns from the source table. It filters the data using a WHERE clause. The Integration Service does not extract data from the database at this time.
Pushdown
Optimization Types
You can configure the
following types of pushdown optimization:
- Source-side pushdown optimization. The Integration Service pushes as much transformation logic as possible to the source database.
- Target-side pushdown optimization. The Integration Service pushes as much transformation logic as possible to the target database.
- Full pushdown optimization. The Integration Service attempts to push all transformation logic to the target database. If the Integration Service cannot push all transformation logic to the database, it performs both source-side and target-side pushdown optimization.
Running
Source-Side Pushdown Optimization Sessions
When you run a
session configured for source-side pushdown optimization, the Integration
Service analyzes the mapping from the source to the target or until it reaches
a downstream transformation it cannot push to the database.
The Integration
Service generates and executes a SELECT statement based on the transformation
logic for each transformation it can push to the database. Then, it reads the
results of this SQL query and processes the remaining transformations.
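Continuing the ITEMS example from the overview above, with source-side pushdown optimization the Integration Service would push only the Source Qualifier and Filter logic as a SELECT, read the result set, and load the target itself. The generated SQL would look roughly like this (a sketch based on that example, not the literal statement the tool emits):
SELECT ITEMS.ITEM_ID, ITEMS.ITEM_NAME, ITEMS.ITEM_DESC, CAST(ITEMS.PRICE AS INTEGER) FROM ITEMS WHERE (ITEMS.ITEM_ID > 1005)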
Running
Target-Side Pushdown Optimization Sessions
When you run a
session configured for target-side pushdown optimization, the Integration
Service analyzes the mapping from the target to the source or until it reaches
an upstream transformation it cannot push to the database. It generates an INSERT,
DELETE, or UPDATE statement based on the transformation logic for each
transformation it can push to the database. The Integration Service processes
the transformation logic up to the point that it can push the transformation
logic to the target database. Then, it executes the generated SQL.
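For the same ITEMS mapping, with target-side pushdown optimization the Integration Service reads and filters the rows itself and pushes only the load logic to the target. One plausible shape of the generated statement is a parameterized INSERT, sketched below with ? standing for the values bound from the pipeline (an assumption for illustration, not the exact SQL the Integration Service generates):
INSERT INTO ITEMS(ITEM_ID, ITEM_NAME, ITEM_DESC, n_PRICE) VALUES (?, ?, ?, CAST(? AS INTEGER))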
Running
Full Pushdown Optimization Sessions
To use full pushdown
optimization, the source and target databases must be in the same relational
database management system. When you run a session configured for full pushdown
optimization, the Integration Service analyzes the mapping from the source to
the target or until it reaches a downstream transformation it cannot push to
the target database. It generates and executes SQL statements against the
source or target based on the transformation logic it can push to the database.
When you run a
session with large quantities of data and full pushdown optimization, the
database server must run a long transaction. Consider the following database
performance issues when you generate a long transaction:
- A long transaction uses more database resources.
- A long transaction locks the database for longer periods of time. This reduces database concurrency and increases the likelihood of deadlock.
- A long transaction increases the likelihood of an unexpected event.
To minimize database
performance issues for long transactions, consider using source-side or
target-side pushdown optimization.
Integration
Service Behavior with Full Optimization
When you configure a
session for full optimization, the Integration Service analyzes the mapping
from the source to the target or until it reaches a downstream transformation
it cannot push to the target database. If the Integration Service cannot push all
transformation logic to the target database, it tries to push all
transformation logic to the source database. If it cannot push all
transformation logic to the source or target, the Integration Service pushes as much transformation logic as possible to the source database, processes intermediate
transformations that it cannot push to any database, and then pushes the
remaining transformation logic to the target database. The Integration Service
generates and executes an INSERT SELECT, DELETE, or UPDATE statement for each
database to which it pushes transformation logic.
For example, a mapping contains the following transformations, in order: Source Qualifier, Aggregator, Rank, Expression, and then the target.
The Rank
transformation cannot be pushed to the source or target database. If you
configure the session for full pushdown optimization, the Integration Service
pushes the Source Qualifier transformation and the Aggregator transformation to
the source, processes the Rank transformation, and pushes the Expression
transformation and target to the target database. The Integration Service does
not fail the session if it can push only part of the transformation logic to
the database.
Active
and Idle Databases
During pushdown
optimization, the Integration Service pushes the transformation logic to one
database, which is called the active database. A database that does not process
transformation logic is called an idle database. For example, a mapping
contains two sources that are joined by a Joiner transformation. If the session
is configured for source-side pushdown optimization, the Integration Service
pushes the Joiner transformation logic to the source in the detail pipeline,
which is the active database. The source in the master pipeline is the idle
database because it does not process transformation logic.
The Integration
Service uses the following criteria to determine which database is active or
idle:
1. When using full pushdown optimization, the target database is active and the source database is idle.
2. In sessions that contain a Lookup transformation, the source or target database is active, and the lookup database is idle.
3. In sessions that contain a Joiner transformation, the source in the detail pipeline is active, and the source in the master pipeline is idle.
4. In sessions that contain a Union transformation, the source in the first input group is active. The sources in other input groups are idle.
To push
transformation logic to an active database, the database user account of the
active database must be able to read from the idle databases.
Working
with Databases
You can configure
pushdown optimization for the following databases:
- IBM DB2
- Microsoft SQL Server
- Netezza
- Oracle
- Sybase ASE
- Teradata
- Databases that use ODBC drivers
When you push
transformation logic to a database, the database may produce different output
than the Integration Service. In addition, the Integration Service can usually
push more transformation logic to a database if you use a native driver,
instead of an ODBC driver.
Comparing
the Output of the Integration Service and Databases
The Integration
Service and databases can produce different results when processing the same
transformation logic. The Integration Service sometimes converts data to a
different format when it reads data. The Integration Service and database may
also handle null values, case sensitivity, and sort order differently.
The database and
Integration Service produce different output when the following settings and
conversions are different:
- Nulls treated as the highest or lowest value. The Integration Service and a database can treat null values differently. For example, you want to push a Sorter transformation to an Oracle database. In the session, you configure nulls as the lowest value in the sort order. Oracle treats null values as the highest value in the sort order.
- Sort order. The Integration Service and a database can use different sort orders. For example, you want to push the transformations in a session to a Microsoft SQL Server database, which is configured to use a sort order that is not case sensitive. You configure the session properties to use the binary sort order, which is case sensitive. The results differ based on whether the Integration Service or the Microsoft SQL Server database processes the transformation logic.
- Case sensitivity. The Integration Service and a database can treat case sensitivity differently. For example, the Integration Service uses case-sensitive queries and the database does not. A Filter transformation uses the following filter condition: IIF(col_varchar2 = 'CA', TRUE, FALSE). You need the database to return rows that match 'CA'. However, if you push this transformation logic to a Microsoft SQL Server database that is not case sensitive, it returns rows that match the values 'Ca', 'ca', 'cA', and 'CA'.
- Numeric values converted to character values. The Integration Service and a database can convert the same numeric value to a character value in different formats. The database can convert numeric values to an unacceptable character format. For example, a table contains the number 1234567890. When the Integration Service converts the number to a character value, it inserts the characters '1234567890'. However, a database might convert the number to '1.2E9'. The two sets of characters represent the same value. However, if you require the characters in the format '1234567890', you can disable pushdown optimization.
- Precision. The Integration Service and a database can have different precision for particular datatypes. Transformation datatypes use a default numeric precision that can vary from the native datatypes. For example, a transformation Decimal datatype has a precision of 1-28. The corresponding Teradata Decimal datatype has a precision of 1-18. The results can vary if the database uses a different precision than the Integration Service.
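To picture the null-ordering difference above, suppose the Sorter transformation sorts the ITEMS rows by PRICE in ascending order. Oracle's default places nulls last (highest), while a session configured with nulls as the lowest value expects them first, so the two statements below return rows in a different order (an illustrative sketch only):
SELECT ITEM_ID, PRICE FROM ITEMS ORDER BY PRICE
SELECT ITEM_ID, PRICE FROM ITEMS ORDER BY PRICE NULLS FIRST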
Using
ODBC Drivers
When you use native
drivers for all databases, except Netezza, the Integration Service generates
SQL statements using native database SQL. When you use ODBC drivers, the
Integration Service usually cannot detect the database type. As a result, it
generates SQL statements using ANSI SQL. The Integration Service can push more functions to the database when it generates SQL statements using the native database language than when it uses ANSI SQL.
Note:
Although the
Integration Service uses an ODBC driver for the Netezza database, the
Integration Service detects that the database is Netezza and generates native
database SQL when pushing the transformation logic to the Netezza database.
In some cases, ANSI
SQL is not compatible with the database syntax. The following sections describe
problems that you can encounter when you use ODBC drivers. When possible, use
native drivers to prevent these problems.
Working with Dates
The Integration
Service and database can process dates differently. When you configure the
session to push date conversion to the database, you can receive unexpected
results or the session can fail.
The database can
produce different output than the Integration Service when the following date
settings and conversions are different:
- Date values converted to character values. The Integration Service converts the transformation Date/Time datatype to the native datatype that supports subsecond precision in the database. The session fails if you configure the datetime format in the session to a format that the database does not support. For example, when the Integration Service performs the ROUND function on a date, it stores the date value in a character column, using the format MM/DD/YYYY HH:MI:SS.US. When the database performs this function, it stores the date in the default date format for the database. If the database is Oracle, it stores the date as the default DD-MON-YY. If you require the date to be in the format MM/DD/YYYY HH:MI:SS.US, you can disable pushdown optimization.
- Date formats for TO_CHAR and TO_DATE functions. The Integration Service uses the date format in the TO_CHAR or TO_DATE function when the Integration Service pushes the function to the database. The database converts each date string to a datetime value supported by the database.
For example, the
Integration Service pushes the following expression to the database:
TO_DATE( DATE_PROMISED,
'MM/DD/YY' )
The database
interprets the date string in the DATE_PROMISED port based on the specified
date format string MM/DD/YY. The database converts each date string, such as
01/22/98, to the supported date value, such as Jan 22 1998 00:00:00.
If the Integration
Service pushes a date format to an IBM DB2, a Microsoft SQL Server, or a Sybase
database that the database does not support, the Integration Service stops
pushdown optimization and processes the transformation.
The Integration Service converts all
dates before pushing transformations to an Oracle or Teradata database. If the
database does not support the date format after the date conversion, the
session fails.
- HH24 date format. You cannot use the HH24 format in the date format string for Teradata. When the Integration Service generates SQL for a Teradata database, it uses the HH format string instead.
- Blank spaces in date format strings. You cannot use blank spaces in the date format string in Teradata. When the Integration Service generates SQL for a Teradata database, it substitutes the space with 'B'.
- Handling subsecond precision for a Lookup transformation. If you enable subsecond precision for a Lookup transformation, the database and Integration Service perform the lookup comparison using the subsecond precision, but return different results. Unlike the Integration Service, the database does not truncate the lookup results based on subsecond precision. For example, you configure the Lookup transformation to show subsecond precision to the millisecond. If the lookup result is 8:20:35.123456, a database returns 8:20:35.123456, but the Integration Service returns 8:20:35.123.
- SYSDATE built-in variable. When you use the SYSDATE built-in variable, the Integration Service returns the current date and time for the node running the service process. However, when you push the transformation logic to the database, the SYSDATE variable returns the current date and time for the machine hosting the database. If the time zone of the machine hosting the database is not the same as the time zone of the machine running the Integration Service process, the results can vary.
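As an illustration of the two Teradata restrictions above, a date format that uses HH24 and a blank space would be rewritten before pushdown roughly as follows (a sketch of the substitution only, not the full SQL the Integration Service generates):
TO_CHAR(DATE_PROMISED, 'MM/DD/YYYY HH24:MI:SS')   -- expression in the mapping
TO_CHAR(DATE_PROMISED, 'MM/DD/YYYYBHH:MI:SS')     -- format string sent to Teradata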