Aggregator Transformation

Aggregator Transformation

The Aggregator transformation performs aggregate calculations, such as averages and sums. The Integration Service performs aggregate calculations as it reads and stores data group and row data in an aggregate cache. The Aggregator transformation is unlike the Expression transformation, in that you use the Aggregator transformation to perform calculations on groups. 
After you create a session that includes an Aggregator transformation, you can enable the session option, Incremental Aggregation. When the Integration Service performs incremental aggregation, it passes source data through the mapping and uses historical cache data to perform aggregation calculations incrementally. 
Transformation type:
Active
Connected 

Port:
Input,output,variable,group by.





Aggregator is an active transformation that changes the number of rows in the pipeline. The Aggregator transformation has the following components and options:
Aggregate cache. The Integration Service stores data in the aggregate cache until it completes aggregate calculations. The Integration Service stores group values in an index cache and it stores row data in the data cache. 
Aggregate expression. Enter an expression in an output port. The expression can include nonaggregate expressions and conditional clauses.  
Group by port. Indicate how to create groups. You can configure an input, input/output, output, or variable port for the group. When grouping data, the Aggregator transformation outputs the last row of each group unless otherwise specified.
Sorted input. Select this option to improve session performance. To use sorted input, you must pass data to the Aggregator transformation sorted by group by port, in ascending or descending order. 


Aggregator Transformation Properties

Aggregator Setting
                   Description

Cache Directory : Local directory where the Integration Service creates the index and data cache files. By default, the Integration Service uses the directory entered in the Workflow Manager for the process variable $PMCacheDir.If you have enabled incremental aggregation, the Integration Service creates a backup of the files each time you run the session.
Tracing Level : Amount of detail displayed in the session log for this transformation.
Sorted Input : Indicates input data is presorted by groups. Select this option only if the mapping passes sorted data to the Aggregator transformation.
Aggregator Data Cache SizeData cache size for the transformation.Default cache size is 2,000,000 bytes. If the total configured session cache size is 2 GB (2,147,483,648 bytes) or greater, you must run the session on a 64-bit Integration Service.
Aggregator Index Cache Size : Index cache size for the transformation. Default cache size is 1,000,000 bytes. If the total configured session cache size is 2 GB (2,147,483,648 bytes) or greater, you must run the session on a 64-bit Integration Service.
Transformation Scope : Specifies how the Integration Service applies the transformation logic to incoming data:
Transaction. Applies the transformation logic to all rows in a transaction. Choose Transaction when a row of data depends on all rows in the same transaction, but does not depend on rows in other transactions.
All Input. Applies the transformation logic on all incoming data. When you choose All Input, the PowerCenter drops incoming transaction boundaries. Choose All Input when a row of data depends on all rows in the source.

Aggregate Functions

AVG
COUNT
FIRST
LAST
MAX
MEDIAN
MIN
PERCENTILE
STDDEV
SUM
VARIANCE 
Nested Aggregate Functions
When you include single-level and nested functions in the same Aggregator transformation, the Designer marks the mapping or mapplet invalid.
NOTE : 
Use sorted input to decrease the use of aggregate caches.
Sorted input reduces the amount of data cached during the session and improves session performance. Use this option with the Sorter transformation to pass sorted data to the Aggregator transformation.


Comments