Significantly reduce the time spent acquiring data for analytics projects
Analytics has undergone something of a revolution in recent years. Companies like Google, Facebook and Amazon have shown its power simply by using the data they collect in the course of everyday operations to improve how they do business and the customer experience they offer.
Advances in technology have also delivered cost-effective ways to process massive amounts of data, or ‘Big Data’, for analytics purposes, and there are now myriad ways to illustrate what the data means, leading to well-informed decisions. These advances have primarily been built around the new data generated by the Internet and social media platforms. However, organizations established long before the Internet, such as banks, insurance companies and government bodies, have been collecting data for the last 50 years of operation. Imagine what can be learned from that data.
Many now suggest that “data is the new oil”; if so, the data these organizations have been collecting for 50 years is the premium grade. It represents hard facts, as it generally comes from the systems of record on which these organizations run. While some analytics has been done on it in the past, it generally consisted of overnight batch reports delivered to someone’s desk for review. The world has moved on, and this data must become an integral part of today’s analytics landscape.
Wikipedia defines analytics as follows:
“Analytics is the discovery and communication of meaningful patterns in data. Especially valuable in areas rich with recorded information, analytics relies on the simultaneous application of statistics, computer programming and operations research to quantify performance. Analytics often favors data visualization to communicate insight.”
Today there are literally hundreds of open source and proprietary technologies available to do most of the above and present the results in many different ways. As with all technologies, some are more suitable than others for particular purposes. Data visualization is often a matter of personal preference, so being able to use multiple technologies against the same data lets individuals view it with the tools and visual styles they prefer.
The problem is that the first part of this definition is the difficult part. There is general consensus that a significant majority of the time and cost of analytics projects is spent simply discovering and gaining access to the data. This can represent up to 80% of a typical project’s time and budget, leaving only 20% for the real business objective of the project: the resulting analytics.
It is possible to reduce this 80% significantly, enabling analytics projects to be delivered far more quickly, or allowing more time for a thorough analysis of the data for the business.
The Googles and Facebooks of this world were ‘born in the cloud’ and have made big data analytics look easy because all of the data they collect is held in an open, standard fashion. It can be acquired from different sources using standard techniques and processed easily because it is, by definition, already normalized.
Take the simplest case of an organization with a reasonably uniform software environment running only Microsoft Windows; even here there are usually multiple databases in use. While there are ways to get at these databases, they differ from one another and can require a separate client-side solution for every client that needs the data. In addition, for security reasons, few organizations are willing to open ODBC/JDBC connections to their data in this way:
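As a rough sketch of the client-side fragmentation alone, the Python snippet below (assuming the pyodbc package is installed; the driver names, server names, table names and credentials are hypothetical placeholders) shows how even two common databases on a Windows estate each need their own driver and connection string on every client:

# Minimal sketch: even on a Windows-only estate, each database engine
# needs its own ODBC driver and connection string on every client.
# Driver names, servers, tables and credentials below are hypothetical.
import pyodbc

# Microsoft SQL Server requires one ODBC driver and DSN syntax...
sql_server_conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=finance-db;DATABASE=ledger;UID=report_user;PWD=secret"
)

# ...while Oracle needs a different driver and a different syntax.
oracle_conn = pyodbc.connect(
    "DRIVER={Oracle in OraClient19Home1};"
    "DBQ=hr-db:1521/HRPROD;UID=report_user;PWD=secret"
)

# Each additional engine multiplies the drivers to install, the
# credentials to manage and the connections to secure per client.
for conn, query in (
    (sql_server_conn, "SELECT TOP 10 * FROM payments"),
    (oracle_conn, "SELECT * FROM employees WHERE ROWNUM <= 10"),
):
    cursor = conn.cursor()
    for row in cursor.execute(query):
        print(row)
    conn.close()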
In our traditional organizations, the data is likely to be spread across a multitude of different platforms, code pages and databases. Getting access to the data required for analytics therefore varies with the platform, code page and technologies in use:
More often than not, there is a requirement to pull information from multiple locations across firewalls, and no organization will allow open JDBC/ODBC connections through a firewall.
So while it is possible to get at this data, it takes time (up to 80% of a project) for the following reasons:
The Internet has changed the world since its introduction, and the body that has overseen this change, the World Wide Web Consortium (W3C), together with others, has delivered further related standards that are key:
Using these new standards offers the ability to access data more quickly and cleanly than ever before.
The key to unlocking this data for analytics is to make it available in the same way the Googles of this world do: using standard data representations and standard access mechanisms. In general, this means normalizing the data into an XML or JSON representation and making it available over REST or SOAP messaging protocols.
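As a minimal sketch of what such a standard access mechanism might look like (using only Python’s standard library; the record data and the /accounts endpoint are hypothetical stand-ins for a real system of record), a normalized JSON feed can be exposed over plain HTTP/REST:

# Minimal sketch: serving a normalized JSON feed over REST.
# The records and the endpoint path are hypothetical placeholders
# for whatever the real system of record would supply.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

RECORDS = [
    {"account": "1001", "balance": 2500.75, "currency": "EUR"},
    {"account": "1002", "balance": 912.40, "currency": "EUR"},
]

class FeedHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/accounts":
            body = json.dumps(RECORDS).encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

if __name__ == "__main__":
    # Any REST-capable analytics tool can now pull the feed over HTTP,
    # including across a firewall, without database-specific drivers.
    HTTPServer(("localhost", 8080), FeedHandler).serve_forever()

Because the feed travels over standard HTTP, it passes through firewalls under normal web security controls rather than requiring open database ports.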
Normalization in this sense means creating XML feeds from the various data sources that are directly consumable by open technologies, regardless of the host data code page (e.g. ASCII or EBCDIC), platform (e.g. Windows, z/OS or Solaris) or database (e.g. MS SQL Server, DB2, VSAM or Oracle).
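For example, a fixed-length EBCDIC record might be decoded and normalized into XML along the lines sketched below; the CP037 code page and the simple name-plus-date record layout are assumptions for illustration, not a real copybook:

# Minimal sketch: normalizing a fixed-length EBCDIC record into XML.
# The CP037 code page and the 10+8 byte layout are illustrative
# assumptions; a real mainframe copybook would define the real layout.
import xml.etree.ElementTree as ET

raw_record = "JOHN SMITH20240131".encode("cp037")  # simulate an EBCDIC record

# Decode from EBCDIC (code page 037) into a Python string.
text = raw_record.decode("cp037")
name, last_active = text[:10].strip(), text[10:]

# Build an XML representation that any open analytics tool can consume.
customer = ET.Element("customer")
ET.SubElement(customer, "name").text = name
ET.SubElement(customer, "lastActive").text = last_active

print(ET.tostring(customer, encoding="unicode"))
# <customer><name>JOHN SMITH</name><lastActive>20240131</lastActive></customer>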
The representations are not restricted to XML and JSON (used here for illustration); the data can also be delivered as CSV or RDF files, or in other formats as required. In these representations the feeds are usable by all the common analytics technologies:
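For instance, the short sketch below (assuming the pandas and requests packages, and reusing the hypothetical JSON feed URL from the earlier sketch) loads such a feed straight into a dataframe for analysis or visualization:

# Minimal sketch: consuming a normalized JSON feed in a standard
# analytics stack. The feed URL is a hypothetical placeholder.
import pandas as pd
import requests

response = requests.get("http://localhost:8080/accounts", timeout=30)
response.raise_for_status()

# The same feed could equally be delivered as CSV or RDF; here JSON
# drops straight into a dataframe for aggregation or charting.
df = pd.DataFrame(response.json())
print(df.groupby("currency")["balance"].sum())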
While this model could be built by hand, Ostia’s Portus platform provides a simple configuration-based approach to enabling these feeds and making them available to all of those technologies:
This solution offers the following benefits:
Ostia have many years of experience with core IT systems and understand their strengths and weaknesses. We also have extensive experience with traditional integration stacks and the models that have built up around them, and we can now deliver adaptive integration solutions within the timeframes required by agile projects. We can do this because of the Portus data integration technology we have developed over the last 10 years. It has taken those 10 years to get to the front of today’s technology; we can help you stay there.