Imagine a new developer joining your project who has to rework a job created by another team member, or that one job that was built a long time ago. It is a scenario any ETL developer will recognize. The job has been feeding a certain target system for a long time, but now we want to include a new transformation rule or hook it up to a new or updated source system.

Once effort has been put into creating a solid job, we do not want a developer to go through all the steps again to incorporate a minor change. But without a proper description of the expected behavior, it is hard to verify the impact of the change. The developer would have to dig deep into the job to understand its functionality and the results it produces. Then, to test the job, the developer would manually evaluate the source data and the target output, checking both values and data types against the transformation rules. This becomes a repetitive and time-consuming task, no matter how experienced the developer is. Moreover, these manual checks will probably be postponed until the end. And when tests are performed manually or results are inspected visually, the chance of human error in creating tests and validating outcomes increases.

Test-Driven Development

Let’s turn this around and start with Test-Driven Development (TDD). To do so, we need a meaningful regression test that can easily be understood and adapted to validate the correct functioning of our system under different scenarios. Following this approach, the developer looks at the job and at the test script describing its expected behavior side by side. If a change to the job requires a modification of the script, the developer updates the test script so that it matches the job again. The script thus serves both to understand what the job is expected to do and to validate the job’s functionality after the change.
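To make the idea of an executable description of expected behavior concrete, here is a minimal sketch in Python. It is not taken from our test suite: the transformation rule `normalize_customer`, the column names and the helper `find_mismatches` are all hypothetical and only illustrate how values and data types could be checked automatically instead of by eye.

```python
def normalize_customer(row):
    """Hypothetical transformation rule: trim and title-case the name,
    and cast the amount from text to a float."""
    return {"name": row["name"].strip().title(), "amount": float(row["amount"])}


def find_mismatches(source_rows, target_rows, rule):
    """Apply the rule to every source row and compare the result with the
    corresponding target row, checking both values and data types."""
    mismatches = []
    if len(source_rows) != len(target_rows):
        mismatches.append(("row_count", None, len(source_rows), len(target_rows)))
    for index, (source, target) in enumerate(zip(source_rows, target_rows)):
        expected = rule(source)
        for column, expected_value in expected.items():
            actual_value = target.get(column)
            same_value = actual_value == expected_value
            same_type = type(actual_value) is type(expected_value)
            if not (same_value and same_type):
                mismatches.append((index, column, expected_value, actual_value))
    return mismatches


# Illustrative data: the second target row still holds the amount as text.
source = [{"name": " ada lovelace ", "amount": "10.5"},
          {"name": "alan turing", "amount": "7"}]
target = [{"name": "Ada Lovelace", "amount": 10.5},
          {"name": "Alan Turing", "amount": "7"}]

mismatches = find_mismatches(source, target, normalize_customer)
```

With this sample data, `mismatches` holds a single entry for the `amount` column of the second row. An empty list would mean the target output matches the rule for every row.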

We use Robot Framework to practice TDD in Talend. It offers a flexible, keyword-driven approach to describing and implementing scripts for test automation. A Robot Framework test suite is made up of one or more test scripts containing keywords and their corresponding arguments. The actual tests stay readable because their complexity is kept within the test libraries or resource files they use. This makes the scripts easy to transfer from one developer to another, and they serve as documentation of the expected behavior. An initial effort is needed to set up a generic test suite, but once that is done, the building blocks, keywords and even script templates are fully reusable for defining new tests.

Using Robot Framework for a generic & automated Test-Platform

Robot Framework allows us to define tests that check the functioning of the various components in our ETL system, not just the internals of a job. As an example, let’s consider a keyword ‘Execute Talend Job Successfully’ that takes the job name as an argument. This keyword is composed of other keywords defined in the referenced test libraries. Together they make sure that the job, located on some external storage, is accessed and executed, and that its successful completion is verified based on its return code. Another set of keywords then verifies whether the correct entries were created in our job management/audit framework, validating job behavior for both happy and unhappy flows.
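As a rough illustration of how such a keyword could be backed by a test library, the sketch below shows a small Python class. Robot Framework exposes the public methods of a library class as keywords, so `execute_talend_job_successfully` would be usable as ‘Execute Talend Job Successfully’. Everything else is an assumption made for this example and not our actual library: the class name, the `command_for_job` callable that turns a job name into a launch command, and the shape of the audit rows.

```python
import subprocess
import sys


class TalendJobKeywords:
    """Illustrative keyword library (hypothetical names throughout)."""

    def __init__(self, command_for_job):
        # command_for_job: assumed callable that maps a job name to the
        # command line (a list of strings) launching that job.
        self._command_for_job = command_for_job

    def run_talend_job(self, job_name):
        """Launch the job as a subprocess and return its return code."""
        result = subprocess.run(
            self._command_for_job(job_name), capture_output=True, text=True
        )
        return result.returncode

    def execute_talend_job_successfully(self, job_name):
        """Fail the test unless the job finishes with return code 0."""
        return_code = self.run_talend_job(job_name)
        if return_code != 0:
            raise AssertionError(
                f"Job '{job_name}' finished with return code {return_code}"
            )

    def audit_log_should_contain(self, audit_rows, job_name, status):
        """Fail unless the (hypothetical) audit rows hold a matching entry."""
        for row in audit_rows:
            if row["job"] == job_name and row["status"] == status:
                return
        raise AssertionError(f"No audit entry '{status}' found for '{job_name}'")


# Usage with a stand-in command: the Python interpreter exiting with code 0
# takes the place of a real job launcher.
keywords = TalendJobKeywords(
    lambda job_name: [sys.executable, "-c", "raise SystemExit(0)"]
)
keywords.execute_talend_job_successfully("load_customers")
keywords.audit_log_should_contain(
    [{"job": "load_customers", "status": "OK"}], "load_customers", "OK"
)
```

Keeping the process handling inside the library is what lets the test script itself stay a short, readable list of keywords.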

Our test suites, test cases and test data are built as a robot-maven test project that can be managed in Eclipse, so there is nothing unfamiliar here for a Talend developer. The tests can then be triggered either via the interface or via a Maven command, which is the crucial step for integrating with, and obtaining real value from, your Continuous Integration or Continuous Deployment pipeline.

Increase Productivity & Test Reliability

Using these automated regression tests, in addition to the other testing functionality available in Talend, increases a developer’s productivity and confidence in the data. Once a test setup has been defined, it relieves the developer of manually carrying out the basic and repetitive, but essential, testing tasks. More importantly, it provides an instant and detailed feedback report, allowing the developer to quickly fix a defect while the rework is still fresh in mind. Apart from being triggered manually, these tests can also be triggered on commit by a CI server like Jenkins, making sure that only jobs that have passed the tests are deployed.

In one of our next newsletters we will focus on the various components of a CI/CD pipeline for data integration and see how test automation is incorporated. Interested in more information? Please contact us or have a look at our upcoming events to meet us in person!