Effective use of a columnar database will have a significant impact on the business's top line and IT's bottom line.
If you aren't using columnar databases appropriately, you aren't getting the best price/performance out of your data -- not even close, in most cases. Columnar database technology lets you go deeper in big data analytics, speeds up enterprise reporting and cuts down on the number of administrators needed for storage.
A columnar database offers significant savings in storage for data warehousing if you use a vendor with good columnar-related compression. It also allows greater user support -- much better performance -- when handling big data, and can work in harmony with today's relational databases. My own total cost of ownership studies suggest that the price for adding columnar to relational -- additional administrative complexity, because you are dealing with two types of databases instead of one -- is typically more than compensated for by the fact that columnar databases are usually less complex to administer and have a higher degree of data compression.
Understanding the database constructs
Columnar databases store the data, as the name suggests, as columns (all data for a given field) rather than rows (all data for a given customer record). Columnar is almost exclusively used in business intelligence and analytics applications. Columnar architecture may have significant advantages in online transactional processing and mixed applications such as enterprise apps, if vendors ever focus on that part of the technology.
IT has employed columnar database architectures even longer than so-called row-oriented relational databases. Even before IBM introduced the first row-oriented relational database, Computer Corp. of America was selling the non-relational columnar Model 204 database. In the late 1990s, Sybase brought columnar to relational with its Sybase IQ product and bit-mapped indexing.
Row-oriented database technology excels in random reads, single inserts and updates, where a single record with more than two or three fields is accessed. Columnar excels for queries across a large number of records whose columns (or fields) contain some commonality that makes for effective compression. As of now, columnar databases do not involve much updating of individual records, although they handle mass uploads to data warehouses just fine.
It doesn't make sense to replace row-oriented relational databases entirely, nor should SAP on DB/2 or Peoplesoft on Oracle Database be targets for replacement. Columnar database technologies should increasingly take over data warehousing and analytics and complement existing operational databases. IT managers can buy specific vendor solutions on a project basis, such as to support a big data initiative, but also can slide more and more core data-warehouse work over to columnar databases.
It remains to be seen whether columnar can also serve as a partner to Hadoop, just as MySQL and other row-oriented relational technologies can. While the semi-structured and non-structured data typical of social media sources can benefit from some columnar compression, there are relatively few use cases.
Once your IT shop decides to adopt columnar database processing, you'll need to specify the right tool for the job. The four main vendors are Oracle, HP, SAP and IBM, and each has its own approach to the columnar database market.
About the author:
Wayne Kernochan is president of Infostructure Associates, an affiliate of Valley View Ventures. This document is the result of Infostructure Associates-sponsored research. Infostructure Associates believes that its findings are objective and represent the best analysis available at the time of publication.