Joining Hadoop and Relational Data using Cirro

Organizations that embark on big-data projects can now leverage enterprise-level implementations of Hadoop from vendors like Cloudera, Hortonworks, and MapR. These vendors offer robust MapReduce processing of large distributed data sets. Moreover, Amazon Elastic MapReduce eliminates the need for large upfront hardware investment, reducing the risk of such projects.

While this is all well and good, two challenges emerge:

  1. How to join together data from MapReduce and traditional relational data sources for analysis.
  2. How to do this efficiently enough to allow for ad-hoc queries.

To answer both questions, consider the combination of Explore Analytics and Cirro.

Cirro provides customers with enterprise-class join capabilities for big data and traditional data sources. With Cirro it is easy to access, join, share, interact with, and iteratively analyze any data using SQL. You can think of Cirro as a data hub. It receives SQL queries from BI tools, decomposes them to run against multiple data sources, and assembles the results in real time. A great advantage of Cirro is its ability to push the processing of filter conditions and aggregation down to each data source, thereby reducing the amount of data that needs to be moved and assembled.
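To make the idea of pushing conditions and aggregation down to each source concrete, here is a minimal sketch in Python. It is not Cirro's implementation or API. Two in-memory SQLite databases stand in for a Hadoop-side store and a relational store, and the table names, sample data, and the `federated_clicks_by_region` function are all hypothetical.

```python
import sqlite3


def make_clicks_source():
    """Stand-in for the big-data source (hypothetical click log)."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE clicks (customer_id INTEGER, page TEXT)")
    db.executemany(
        "INSERT INTO clicks VALUES (?, ?)",
        [(1, "home"), (1, "pricing"), (2, "home"),
         (3, "home"), (3, "docs"), (3, "pricing")],
    )
    return db


def make_customers_source():
    """Stand-in for the relational source (hypothetical customer table)."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE customers (id INTEGER, name TEXT, region TEXT)")
    db.executemany(
        "INSERT INTO customers VALUES (?, ?, ?)",
        [(1, "Acme", "West"), (2, "Globex", "East"), (3, "Initech", "West")],
    )
    return db


def federated_clicks_by_region(clicks_db, customers_db, region):
    """Answer 'clicks per customer in a region' across both sources."""
    # Aggregation is pushed down: the click source returns one row per
    # customer instead of one row per click.
    click_counts = clicks_db.execute(
        "SELECT customer_id, COUNT(*) FROM clicks GROUP BY customer_id"
    ).fetchall()

    # The filter is pushed down: only customers in the region are returned.
    customers = customers_db.execute(
        "SELECT id, name FROM customers WHERE region = ?", (region,)
    ).fetchall()

    # Only the small partial results are joined at the hub.
    counts = dict(click_counts)
    result = sorted((name, counts.get(cid, 0)) for cid, name in customers)
    rows_moved = len(click_counts) + len(customers)
    return result, rows_moved
```

With this sample data, a query for the "West" region moves five rows to the hub (three aggregated click counts and two filtered customers) rather than all nine raw rows. The saving grows with the size of the click log, since the number of rows returned depends on the number of customers, not the number of clicks.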

Explore Analytics is a sophisticated tool for data analysis and visualization. It’s a SaaS solution that lets users access their data from anywhere, using any device, including desktops and tablets. Non-technical users can build sophisticated queries that join, filter, and aggregate data. They can slice and dice data and create interactive visualizations.

Runaway queries that pull large amounts of data have always been a concern when implementing ad-hoc query capabilities. Such runaway queries are not possible with Explore Analytics, because it pushes filtering, joins, and aggregation to the data source and always performs limited queries. This is where the integration of Explore Analytics with Cirro shines: through Cirro, Explore Analytics can join data from heterogeneous sources, including Hadoop and relational databases.
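One way to picture "always perform limited queries" is a wrapper that caps the number of rows any query can return. The sketch below is an illustration only, using Python's built-in `sqlite3` module. The `run_limited` function and the `MAX_ROWS` value are assumptions, not how Explore Analytics actually works internally.

```python
import sqlite3

MAX_ROWS = 100  # hypothetical cap on rows returned to the client


def run_limited(conn, sql, params=(), max_rows=MAX_ROWS):
    """Run a query so that it can never return more than max_rows rows.

    The caller's SQL is wrapped as a subquery, so the cap applies no
    matter what the inner query does. One extra row is requested so the
    caller can tell whether the result was cut off.
    """
    limited_sql = f"SELECT * FROM ({sql}) LIMIT ?"
    rows = conn.execute(limited_sql, (*params, max_rows + 1)).fetchall()
    truncated = len(rows) > max_rows
    return rows[:max_rows], truncated
```

A query that filters and aggregates at the source usually comes back well under the cap, while a broad query that would otherwise pull an entire table comes back truncated, with a flag the tool can use to prompt the user to narrow the request.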

Explore Analytics hands Cirro a detailed query. Cirro decomposes it into sub-queries for the different data sources and then assembles the partial results to deliver exactly what the user asked for.

Conclusion

Together, Explore Analytics and Cirro enable non-technical users to access big data and combine it with traditional data sources. Users can focus on their real goals: obtaining actionable information, understanding business drivers, and predicting future trends. Isn’t that what Business Intelligence is all about?

This entry was posted in How-To by Gadi Yedwab.

About Gadi Yedwab

Gadi Yedwab is the founder and CEO of Explore Analytics. Prior to founding Explore Analytics, Gadi served as VP of Product Development at ServiceNow, a leading provider of cloud-based services that automate enterprise IT operations. Prior to ServiceNow, Gadi held executive positions at Quest Software and Brio Technology (which was acquired by Hyperion and then by Oracle). You can reach Gadi on Twitter at @GYedwab or using the Feedback Form.