Why Facebook and the NSA love graph databases

Graph databases play six degrees of separation to find real connections. See how IT teams can put the approach to work for the business.

This article can also be found in the Premium Editorial Download "Modern Infrastructure: Application performance management sets new goals."

Is there a benefit to understanding how your users, suppliers or employees relate to and influence one another? It's hard to imagine a business that couldn't benefit from deeper insight into its significant relationships, more detailed analysis of them, or even predictions about them.

If you have ever drawn dots on a whiteboard and connected them, you can appreciate that thinking in terms of nodes and links naturally mirrors many real-world scenarios. Many of today's hottest data analysis opportunities, from optimization to fraud detection, are naturally represented as a linked web.

The study of sets of nodes and the relationships between them is known as graph theory.

Specialized graph databases are a small but fast-growing part of the so-called NoSQL (not only structured query language) database movement. They are designed to model and explore a web, or graph, of relationships more naturally and productively than the traditional relational database approach allows.

In a graph database, for example, a common query might be to find all the related objects, based on a certain pattern of relationship, between two and six links away. If the same problem were force-fit into normalized tables in an SQL relational model, the translated query would become quite complex and could require tens or even hundreds of nested full table joins.
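To make the "two to six links away" query concrete, here is a minimal sketch in Python, assuming a small in-memory adjacency list. The function name `within_hops`, the sample people and the `KNOWS`/`WORKS_WITH` relationship labels are all hypothetical and not taken from any particular graph database product. It runs a breadth-first traversal from a starting node and keeps the nodes whose shortest distance falls inside the requested range, optionally following only one relationship type as the "pattern."

```python
from collections import deque

# Hypothetical toy graph: node -> list of (relationship type, neighbor).
GRAPH = {
    "alice": [("KNOWS", "bob")],
    "bob": [("KNOWS", "carol"), ("WORKS_WITH", "dave")],
    "carol": [("KNOWS", "erin")],
    "dave": [("KNOWS", "frank")],
    "erin": [],
    "frank": [],
}

def within_hops(graph, start, min_hops=2, max_hops=6, rel_type=None):
    """Return {node: distance} for nodes whose shortest path from `start`
    is between min_hops and max_hops links long, optionally following
    only links of a single relationship type."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distance[node] == max_hops:
            continue  # do not expand past the hop limit
        for rel, neighbor in graph.get(node, []):
            if rel_type is not None and rel != rel_type:
                continue  # skip links that don't match the pattern
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    return {n: d for n, d in distance.items() if min_hops <= d <= max_hops}

print(within_hops(GRAPH, "alice"))
# {'carol': 2, 'dave': 2, 'erin': 3, 'frank': 3}
print(within_hops(GRAPH, "alice", rel_type="KNOWS"))
# {'carol': 2, 'erin': 3}
```

The traversal simply follows links outward from the start, one hop at a time, which is the kind of work a graph database is built to do natively.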

In a relational database query, every required join costs performance. For graph problems of any real size, an SQL approach will be demonstrably slower, more complex and more prone to error, and it will certainly not scale as well.
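The cost difference can be illustrated with a toy model in Python. It is not a benchmark. Assuming a single hypothetical edge table of (source, target) rows, the relational-style function treats each extra hop as a join that scans the whole table, while the graph-style function follows links from a prebuilt adjacency index and touches only the frontier's own edges. Both return the same nodes; the counter reports how many edge rows each approach examines. A real relational engine would use an index on the join column, so this sketch exaggerates the gap; it only shows why per-hop joins add up.

```python
from collections import defaultdict

# Hypothetical edge table: one row per link, as it might sit in an SQL table.
EDGES = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e"),
         ("x", "y"), ("y", "z"), ("p", "q"), ("q", "r")]

def hops_via_joins(edges, start, hops):
    """Relational style: each hop joins the frontier against the full edge table."""
    frontier, rows_scanned = {start}, 0
    for _ in range(hops):
        next_frontier = set()
        for src, dst in edges:  # full scan of the table for every join
            rows_scanned += 1
            if src in frontier:
                next_frontier.add(dst)
        frontier = next_frontier
    return frontier, rows_scanned

def hops_via_adjacency(edges, start, hops):
    """Graph style: build an adjacency index once (not counted), then follow links."""
    adjacency = defaultdict(list)
    for src, dst in edges:
        adjacency[src].append(dst)
    frontier, rows_scanned = {start}, 0
    for _ in range(hops):
        next_frontier = set()
        for node in frontier:
            for dst in adjacency[node]:  # only this node's own links
                rows_scanned += 1
                next_frontier.add(dst)
        frontier = next_frontier
    return frontier, rows_scanned

print(hops_via_joins(EDGES, "a", 3))      # ({'d'}, 24)
print(hops_via_adjacency(EDGES, "a", 3))  # ({'d'}, 3)
```

As the edge table grows, the join-style count grows with the size of the whole table times the number of hops, while the adjacency-style count grows only with the links actually followed.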

The graph database isn't square

Graph databases don't require a predefined schema; nodes and links can have attributes edited or assigned to them at any time. If a new relationship type is discovered, it can be added to the database dynamically, extending what's modeled in the database.
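A minimal Python sketch of that schema-free behavior, assuming a simple in-memory "property graph" in which node and link attributes are plain dictionaries. The `PropertyGraph` class, the company names and the `SUPPLIES`/`REFERRED` relationship types and `risk_score` attribute are hypothetical illustrations, not any vendor's API. Note that adding a new attribute or a brand-new relationship type requires no schema change.

```python
from dataclasses import dataclass, field

@dataclass
class PropertyGraph:
    """Hypothetical schema-free property graph; attributes are plain dicts."""
    nodes: dict = field(default_factory=dict)  # node id -> attribute dict
    edges: list = field(default_factory=list)  # (source, rel type, target, attribute dict)

    def add_node(self, node_id, **attrs):
        # Create the node if needed, then merge in any new attributes.
        self.nodes.setdefault(node_id, {}).update(attrs)

    def add_edge(self, src, rel_type, dst, **attrs):
        # Any relationship type string is accepted; nothing is predeclared.
        self.edges.append((src, rel_type, dst, dict(attrs)))

    def neighbors(self, node_id, rel_type):
        return [dst for src, rel, dst, _ in self.edges
                if src == node_id and rel == rel_type]

g = PropertyGraph()
g.add_node("acme", kind="supplier")
g.add_node("globex", kind="customer")
g.add_edge("acme", "SUPPLIES", "globex", since=2012)

# Later: a new attribute and a new relationship type, added on the fly.
g.add_node("acme", risk_score=0.7)
g.add_edge("globex", "REFERRED", "acme")

print(g.nodes["acme"])                    # {'kind': 'supplier', 'risk_score': 0.7}
print(g.neighbors("globex", "REFERRED"))  # ['acme']
```

In a normalized relational model, the same late additions would typically mean altering a table or creating a new link table.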

At one level, graph databases have no special infrastructure requirements. But in production, IT should be aware of differences in how graph databases scale, how they use memory and how they ingest (and index) data loads.

If critical data is targeted for graph database hosting, early adopters may experience some pain around data protection. Backup, restore, replication and other data management capabilities (e.g., security and access/audit) aren't nearly as mature as they are in the SQL world.

Graph your work

The most popular developer-friendly open source graph database today is likely Neo4j. It can run locally, embedded in an application as a Java library, or it can be set up as a standalone server.

Other graph databases target different kinds of data structures or workloads, for example AllegroGraph for RDF/XML, or Hadoop-related efforts such as Titan and Giraph for big data challenges. At the commercial end of the spectrum, Oracle, Sparsity Technologies (DEX) and Objectivity (InfiniteGraph) offer scalable products.

Business people and application developers know that many classes of problems are best tackled with graph-based approaches, in which queries that traverse relationships are more central to the application than queries that select sets of nodes by their inherent attributes or static hierarchies. Graph databases are becoming an increasingly popular service for IT to implement and deliver, and thus an important new workload in the data center.

I haven't mentioned likely super-scale graph database users such as the NSA, CIA, FBI or any other information-powered agency, because, well, I couldn't tell you even if I knew. But understanding the web of relationships among events, people, transactions, locations and sensor readings might lead to superior intelligence insight. Your business no doubt has similar opportunities.

About the author
Mike Matchett is a senior analyst and consultant at Taneja Group.

This was first published in June 2014