IBM Information Server 8.X (DataStage): Parallel Transformer Stage Properties

DataStage: What is the Transformer Stage?

DataStage provides several stages for loading data into a data warehouse or data marts. The stages are classified as General, Database, Development and Debug, File, Processing, Real Time, and so on; the Transformer stage is a processing stage. In this post we explore the options available in its stage properties, such as execution mode and preserve partitioning.

Advanced tab of stage properties:

On the Advanced tab, the following options are available to configure.

Execution mode. The stage can be run in parallel or sequential mode. In parallel mode, the data is processed by the available nodes as specified in the Configuration file, and by any node restrictions specified in the Advanced tab. In sequential mode, the data is processed by the conductor node.
Combinability mode. This is Auto by default, which allows WebSphere DataStage to combine the operators that underlie parallel stages so that they run in the same process, if that is sensible for this type of stage.
Preserve partitioning. This is set to Propagate by default, which sets or clears the partitioning flag according to what was set in the previous stage. You can also select Set or Clear explicitly. If you select Set, the stage requests that the next stage in the job preserve the partitioning as is.
Node pool and resource constraints. Select this option to restrict parallel execution to the node pool or pools, or the resource pool or pools, specified in the grid. The grid lets you choose from drop-down lists that are populated from the configuration file.
Node map constraint. Select this option to restrict parallel execution to the nodes in a defined node map. You can define a node map by typing the node numbers in the text box, or by clicking the Browse button to open the Available Nodes dialog and selecting nodes from there. You are effectively defining a new node pool for this stage (in addition to any node pools defined in the configuration file).
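
As a rough mental model of how the execution mode and the two node constraints narrow down where a stage runs, the following Python sketch may help. It is not DataStage code or an IBM API: the node names, pool names, and the `resolve_nodes` function are all hypothetical, and the real engine takes this information from the configuration file.

```python
# Conceptual model only: node names, pool names, and resolve_nodes are hypothetical.
CONFIG_NODES = {
    "node1": {"pools": {"", "etl"}},
    "node2": {"pools": {"", "etl"}},
    "node3": {"pools": {""}},
    "node4": {"pools": {"", "db"}},
}


def resolve_nodes(mode, node_pools=None, node_map=None, conductor="node1"):
    """Return the node names on which a stage would run in this simplified model."""
    if mode == "sequential":
        # Sequential mode: a single node (here, the conductor) processes all data.
        return [conductor]
    nodes = sorted(CONFIG_NODES)
    if node_pools:
        # Node pool constraint: keep nodes that belong to at least one named pool.
        wanted = set(node_pools)
        nodes = [n for n in nodes if CONFIG_NODES[n]["pools"] & wanted]
    if node_map:
        # Node map constraint: keep only the explicitly listed nodes.
        nodes = [n for n in nodes if n in node_map]
    return nodes


print(resolve_nodes("parallel"))                                 # all four nodes
print(resolve_nodes("parallel", node_pools=["etl"]))             # node1 and node2
print(resolve_nodes("parallel", node_map=["node2", "node4"]))    # node2 and node4
print(resolve_nodes("sequential"))                               # conductor only
```

In this model a node map behaves like an ad hoc pool defined only for the stage, which matches the description above.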

Surrogate Key tab of stage properties:

Set the Source type field to Flat File or DBSequence.

Transformer Stage – Input Page

Partitioning tab:

The Partitioning tab allows you to specify how the incoming data is partitioned or collected when it enters the Transformer stage. It also allows you to specify that the data should be sorted on input.

By default, the Transformer stage tries to preserve the partitioning of the incoming data, or uses its own partitioning method, depending on what the previous stage of the job dictates.

The Partitioning tab also allows you to specify that the data arriving on the input link should be sorted before it is processed. Sorting always takes place within data partitions. If the stage is partitioning incoming data, the sort occurs after partitioning. If the stage is collecting data, the sort occurs before collection. The availability of sorting depends on the partitioning method chosen.
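
To illustrate what "sorting within data partitions" means, here is a minimal Python sketch. It assumes a simple modulus partitioner on an integer key and two partitions; the function name and the sample rows are made up for illustration and do not come from DataStage.

```python
# Conceptual sketch: partition first, then sort inside each partition.
def modulus_partition(rows, key, num_partitions):
    """Assign each row to a partition by key value modulo the partition count."""
    partitions = [[] for _ in range(num_partitions)]
    for row in rows:
        partitions[row[key] % num_partitions].append(row)
    return partitions


rows = [{"id": i} for i in (5, 2, 4, 1, 3)]

# Step 1: the stage partitions the incoming data.
partitions = modulus_partition(rows, "id", 2)

# Step 2: the sort runs after partitioning, separately in each partition.
sorted_partitions = [sorted(p, key=lambda r: r["id"]) for p in partitions]

for number, partition in enumerate(sorted_partitions):
    print(number, [r["id"] for r in partition])
# 0 [2, 4]
# 1 [1, 3, 5]
```

Each partition is ordered internally, but reading partition 0 and then partition 1 gives 2, 4, 1, 3, 5, so the data as a whole is not globally sorted.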

Perform sort. Select this option to specify that the data coming in on the link should be sorted. Select the column or columns to sort on from the Available list.
Stable. Select this option if you want to preserve previously sorted data sets. This is the default.
Unique. Select this option to specify that if multiple records have identical sort key values, only one record is retained. If stable sort is also set, the first record is retained.
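
The interaction between the Stable and Unique options can be sketched in Python as follows. This is only an illustration of the semantics described above, not how the engine implements them; `sort_partition` and the sample rows are hypothetical.

```python
# Conceptual sketch of Stable + Unique on the rows of a single partition.
def sort_partition(rows, key, unique=False):
    # Python's sorted() is stable: rows with equal keys keep their input order.
    ordered = sorted(rows, key=lambda r: r[key])
    if not unique:
        return ordered
    result = []
    for row in ordered:
        # Keep only the first row seen for each sort key value.
        if not result or result[-1][key] != row[key]:
            result.append(row)
    return result


rows = [
    {"cust": "B", "seq": 1},
    {"cust": "A", "seq": 2},
    {"cust": "B", "seq": 3},
    {"cust": "A", "seq": 4},
]

print([(r["cust"], r["seq"]) for r in sort_partition(rows, "cust")])
# [('A', 2), ('A', 4), ('B', 1), ('B', 3)]
print([(r["cust"], r["seq"]) for r in sort_partition(rows, "cust", unique=True)])
# [('A', 2), ('B', 1)]
```

Because the sort is stable, the record kept for each key is the one that arrived first in the input.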

Preserve sort order:

Select this option if you know that the rows entering the Transformer stage are already sorted and you want to preserve that sort order.

The Transformer stage is a very important stage, and we use it very often in DataStage job design. The options described above, together with the ability to define stage variables, call routines, and write transformations and derivations, make this stage unique among the stages available in DataStage.

About Me

Dashy
