The Passionned Group ETL Tools & Data Integration Survey has existed since 2003 and is a 100% supplier-independent market comparison and analysis report. Hundreds of organizations worldwide use the report to quickly choose the best ETL tool or data integration solution. The December 2014 edition was recently published. “In fact, it’s not merely an update; all the parts have been completely revised,” said Passionned Group editor Rick van der Linden.
We had talks with almost all the major ETL suppliers
“Normally, the survey is updated incrementally. But this time we talked to virtually all of the major suppliers of ETL tools in a very short time. This is almost unique,” said Van der Linden. He also noticed that major suppliers are now perfectly capable of articulating their vision of ETL and data integration, and that in many cases they have translated this vision into concrete solutions.
From technical tool to a total vision of data integration
“Not so long ago, many vendors thought of ETL utilities as technical tools for linking data and databases. This was also evident from the often user-unfriendly software that was created for specialists. Now, they have a complete vision of data integration and enterprise data management,” said Van der Linden, looking back. Although data quality has always been an issue, suppliers now realize they have a big role to play in facilitating data quality management and improvement.
“It’s important that we are supported by the vendors,” he said.
“Specialist companies like Informatica and SAS have been working on this much longer, but now you also hear vendors like IBM and SAP talking quite enthusiastically about it. I think it is also important on the BI side that we are supported by vendors in the area of data management, as they are able to put forward good solutions for data quality and get it on the agenda that way. I think it’s going to help us as BI professionals.”
He also sees a gradual consolidation of the various products that suppliers have in their portfolios. For example, IBM previously had three ETL products, including Cognos Data Manager, which are now being merged into one IBM ETL platform. The same applies to Oracle, which reduced its ETL offering from two products to one. SAP is now taking the former Business Objects ETL tool to market without the additional BO signature, and is integrating former Sybase products into it too. It’s now called ‘SAP Data Services’.
Relational no longer rules
Previously, the database world was a lot clearer: data was stored in relational databases. That is changing, observed Van der Linden. “Within the context of Big Data, Hadoop is particularly popular for storing data from new data sources, for example in combination with a Hive data warehouse infrastructure. More and more people are wondering how to manage this. Everyone wants to be able to connect to new sources. ETL in itself is the right methodology for this, but the difficulty of Big Data is that it’s too big to move and load.”
What do we do with the L in ETL when we’re talking about Big Data?
“SAS and Pentaho have a clear vision about this: they simply push the execution of queries into Hadoop. They ensure that there is code in Hadoop that fires the query for us; they send instructions to Hadoop, which does the calculations and returns a dataset. You can choose to save the resulting dataset, but that isn’t necessary. With volatile data, I can imagine you’d want to save something for historical purposes. But the principle of Big Data is, of course, that it’s too big. And vendors are all thinking about a response to that.”
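The pushdown idea Van der Linden describes can be illustrated with a minimal sketch. It does not show how SAS or Pentaho actually implement it. As an assumption made purely so the example runs anywhere, Python's built-in sqlite3 module stands in for a Hadoop/Hive cluster, and the table name `events`, its columns, and the function names are all hypothetical. The sketch compares moving every row to the client before aggregating with sending the query to the engine and receiving only the result set.

```python
import sqlite3


def build_source(n_rows=10_000):
    """Create an in-memory table standing in for a large Hadoop/Hive data set.

    Assumption: sqlite3 is only a stand-in engine for illustration.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (region TEXT, amount INTEGER)")
    regions = ("north", "south", "east")
    rows = ((regions[i % 3], i % 100) for i in range(n_rows))
    conn.executemany("INSERT INTO events VALUES (?, ?)", rows)
    conn.commit()
    return conn


def extract_then_aggregate(conn):
    """Classic approach: move every row to the client, then compute there."""
    totals = {}
    rows_moved = 0
    for region, amount in conn.execute("SELECT region, amount FROM events"):
        totals[region] = totals.get(region, 0) + amount
        rows_moved += 1
    return totals, rows_moved


def push_down(conn):
    """Pushdown: the engine does the calculation and returns only the result."""
    result = conn.execute(
        "SELECT region, SUM(amount) FROM events GROUP BY region"
    ).fetchall()
    return dict(result), len(result)


if __name__ == "__main__":
    conn = build_source()
    local_totals, moved_local = extract_then_aggregate(conn)
    pushed_totals, moved_pushed = push_down(conn)
    print("same answer:", local_totals == pushed_totals)
    print("rows moved when extracting first:", moved_local)
    print("rows moved with pushdown:", moved_pushed)
```

Both functions produce the same totals, but the pushdown variant transfers one row per region instead of the whole table, which is the point when the source is too big to move. Whether to store the returned dataset afterwards remains a separate choice, as the quote notes.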