How to run multiple Transformations from one Job in Pentaho


In this article I explain how to run multiple transformations from one job in Pentaho. I assume you are already familiar with Pentaho, transformations, and jobs. The article covers two ways of doing this. The first is simple and easy but not very efficient. The second is more efficient and works better in practice.

First, let's talk about the easy method.

This is the most common method you will find on the internet. It is also a very easy way to run multiple transformations from one job. Let me share an example of this approach and discuss how to create the job and what its drawbacks are.

Fig 01: Easy method for running multiple transformations at once

As you can see in the image above, four transformations are connected to the Start entry, and when you run the job they all execute sequentially. If you need to run them in parallel, right-click the entry that precedes them (here, Start) and select the Run Next Entries in Parallel option.
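The difference between the two execution modes can be illustrated outside Pentaho with a minimal Python sketch. This is not how Pentaho runs entries internally; the function run_transformation and the four names are hypothetical stand-ins for the transformations in Fig 01.

```python
from concurrent.futures import ThreadPoolExecutor

def run_transformation(name):
    # Hypothetical stand-in for executing one .ktr file
    return f"{name} finished"

# Hypothetical names for the four transformations in Fig 01
names = ["transformation_1", "transformation_2", "transformation_3", "transformation_4"]

# Default behaviour: one entry after another
sequential = [run_transformation(n) for n in names]

# "Run Next Entries in Parallel": all entries are started together
with ThreadPoolExecutor(max_workers=len(names)) as pool:
    parallel = list(pool.map(run_transformation, names))
```

Both modes produce the same results; only the scheduling differs, which matters when the transformations depend on each other.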

This method works when there are only two or three transformations. But what if there are 100 of them? Adding 100 transformation entries to a job by hand is a real hassle. That is the main drawback of this method. We need a more convenient way to run 100 transformations from one job.

The second method addresses that problem.

Let's talk about the plan: execute multiple transformations by running one job.

To achieve this, you need:

  1. A way to read the names of the ktr files in a given location
  2. A way to pass each name on as a parameter
  3. Finally, a way to run all the ktr files from one job

I have found a method that covers all of the points above. It consists of a main job (kjb) that contains one transformation, which collects all the ktr file names, and a second job (kjb), which executes those ktr files. The second job uses one more transformation of its own, so in total there are two ktr files and two kjb files.
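Before building this in Pentaho, it may help to see the same three-step plan as a minimal Python sketch. It only mirrors the control flow; the function names and the demo file names are hypothetical, and nothing here calls Pentaho itself.

```python
import tempfile
from pathlib import Path

def find_transformations(folder):
    """Step 1: return the full path of every .ktr file in the folder."""
    return sorted(str(p) for p in Path(folder).glob("*.ktr"))

def run_all(folder, execute):
    """Steps 2 and 3: pass each path on as FILENAME and execute it."""
    for path in find_transformations(folder):
        variables = {"FILENAME": path}   # role played by Set_Variable.ktr
        execute(variables["FILENAME"])   # role played by the Execute ${FILENAME} entry

# Demo with empty placeholder files; `executed` records what would be run.
executed = []
with tempfile.TemporaryDirectory() as folder:
    for name in ("load_orders.ktr", "load_customers.ktr", "readme.txt"):
        Path(folder, name).touch()
    run_all(folder, lambda path: executed.append(Path(path).name))
```

The point of the design is that the list of transformations comes from the folder contents, so adding a new ktr file requires no change to the job.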

The main job (kjb) can be created as follows.

Fig 02: Main Job

To get all the ktr names, use Get_Files_Get_all_transformations.ktr. In the Transformation entry of the main job (Fig 02), set the path to Get_Files_Get_all_transformations.ktr.

01) Get_Files_Get_all_transformations.ktr can be created as follows.

Fig 03: Get_Files_Get_all_transformations.ktr

This transformation (Get_Files_Get_all_transformations.ktr) collects all the ktr files in a given path. Set that path inside the Get file names step, as shown in Fig 04. Remember to set the wildcard correctly for your file type; in this case it is .ktr.


Fig 04: Set the wildcard in the Get file names step
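The wildcard field of the Get file names step is a regular expression, so it can be worth checking a pattern against a few file names before using it. The Python sketch below does that under two assumptions: the pattern .*\.ktr is only an illustrative choice, not necessarily the one used in the sample files, and the whole file name is assumed to have to match. The file names are hypothetical.

```python
import re

# Assumed pattern for illustration: any name that ends in ".ktr"
WILDCARD = r".*\.ktr"

def matches_wildcard(file_name, wildcard=WILDCARD):
    """Return True when the whole file name matches the regular expression."""
    return re.fullmatch(wildcard, file_name) is not None

# Hypothetical folder contents
names = ["load_customers.ktr", "load_orders.ktr", "Main.kjb", "notes.ktr.bak"]
selected = [n for n in names if matches_wildcard(n)]
```

With this pattern only the two names ending in .ktr are selected; the job file and the backup file are skipped.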

Up to this point we have built everything required to execute Get_Files_Get_all_transformations.ktr, which sits inside the main job (Fig 02). Now let's move on to the second step of the main job (Fig 02).

02) Launch transformations

This is the inner job, and it executes all the ktr files that we pass to it. To do that, create a new job and set its path inside the Launch transformations entry in Fig 02. Launch_transformations.kjb can be created as shown in Fig 05. This job uses a parameter called FILENAME, populated from the results of the previous transformation, and executes the transformation whose file name is ${FILENAME}.

Fig 05: Launch_Transformations.kjb

To populate the ${FILENAME} parameter with the file name and path, you need another transformation. Set_Variable.ktr does this, and you need to set the path of Set_Variable.ktr inside Launch_transformations.kjb (Fig 05).

Fig 06: Set_Variable.ktr

As the final step of Launch_transformations.kjb, you need to execute each file passed in the FILENAME parameter. This can be done as shown in Fig 07.

 

Fig 07: Execute ${FILENAME}
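To make the role of ${FILENAME} concrete, here is a minimal Python sketch of the substitution idea: a ${NAME} placeholder is replaced by the current value of that variable before the entry runs. This is only an illustration, not Pentaho's implementation; the substitute function and the example path are hypothetical, and leaving unknown names untouched is a choice made for this sketch.

```python
import re

def substitute(text, variables):
    """Replace each ${NAME} with its value; unknown names are left as they are."""
    return re.sub(
        r"\$\{(\w+)\}",
        lambda match: variables.get(match.group(1), match.group(0)),
        text,
    )

# Hypothetical value, as it might be set for one input row
resolved = substitute("${FILENAME}", {"FILENAME": "/etl/load_orders.ktr"})
unresolved = substitute("${OTHER}", {"FILENAME": "/etl/load_orders.ktr"})
```

Because the value changes for every file found in the first step, the same Execute entry ends up running a different transformation each time.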

I hope this helps. If you need the sample files I created, you can get them from GitHub through this link.
