What does “big data” mean to you and your company? At its essence, big data is a large amount of unstructured data that is mined to find patterns to aid your business. But when we asked our Advisory Board members
In my mind, big data refers to the mountains of unstructured data every company collects. The data may be in the form of systems logs, e-mails and instant messages. In the age of social media, companies might spend a lot of time sifting through Twitter, Facebook, blogs and message boards.
Of course, it takes a lot of disk space to hold big data, which means corporations will have to weigh the advantages of data mining against the cost of buying and maintaining the storage hardware and the software it takes to manage the extra data.
That said, I think big data is here to stay. As companies strive for efficiency and grasp for every advantage, business intelligence (BI) may be the tool that makes the difference. BI will gain legitimacy as search engines get better and people learn how to exploit them. It's fantastic, and more than a little unsettling, to see how quickly some of these engines cut through gigs of data and render specifics about any person, place or thing.
Another aspect of the big data issue that doesn't get a lot of press is systems management. Someone with the experience and the right tools can search through system logs and messages to spot things that need to be fixed.
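As a rough illustration of that kind of log search, here is a minimal Python sketch that counts warnings and errors per component. The log line format, the sample lines and the function name count_problems are assumptions made for this example, not something taken from any particular system.

```python
import re
from collections import Counter

# Assumed (hypothetical) log format: "<timestamp> <LEVEL> <component>: <message>"
LINE_RE = re.compile(r"^(\S+)\s+(ERROR|WARN|INFO)\s+([\w.-]+):\s+(.*)$")

def count_problems(lines, levels=("ERROR", "WARN")):
    """Count log entries per (component, level) for the given severity levels."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        # Skip lines that do not fit the assumed format or are not severe enough.
        if match and match.group(2) in levels:
            counts[(match.group(3), match.group(2))] += 1
    return counts

# Made-up sample data for illustration only.
sample = [
    "2012-03-01T10:00:00 INFO sshd: session opened",
    "2012-03-01T10:00:05 ERROR disk0: write failure",
    "2012-03-01T10:00:09 ERROR disk0: write failure",
    "2012-03-01T10:01:00 WARN ntpd: clock drift",
    "garbage line",
]

problems = count_problems(sample)
for (component, level), n in problems.most_common():
    print(component, level, n)
```

A component that keeps showing up at the top of this list is a reasonable first candidate for something that needs to be fixed. Real logs would need a pattern matched to their actual format.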
I expect we will get deeper into big data as we discover things about our systems and customers. Details like life cycle management, governance and, most of all, privacy will follow long after the genie is out of the bottle.
From my perspective, big data is the idea that you have so much data in your environment that database tools have trouble reconciling or managing it. I have several friends in the biomedical industry who do have real big data problems, but my company doesn’t yet.
If big data shows up on the horizon, we’ll plan properly and hopefully avoid problems in our environment. We’ll have to have very strict controls of data, its use and its storage. I can see how it could become a huge problem very quickly. For example, if your company participates in some kind of acquisition, you might have significant data management issues, as both companies would be unprepared to deal with the data glut.
Data protection would come up as an issue as well. Regulations and laws require accountability for all data. We deal with credit data, for example. It would be totally unacceptable for us to obtain data of any kind without provisions to protect it, and protection is impossible unless the data is understood and managed.
My team has to be ready to adapt to the needs of big data. It would be unacceptable to obtain any additional data set without a mechanism for managing and securing the data. We don’t (and can’t) grow data without the infrastructure to manage and utilize it. This might not be the case universally, but it certainly is in our case.
Many of our customers have grappled with the concept of big data. For the most part, they come to the understanding that big data means large, and sometimes very awkward, data sets and databases.
As organizations grow, so do their database requirements. It all comes down to large data transactions that can become difficult to control or manage. Large companies with massive databases may deal with 1 million customer transactions per hour and have to import 2 petabytes of data or more. Others, like Facebook, manage billions of user photos and must control that data. Basically, big data means massive data sets in need of management and direction. Until there is a better way to segment and manage these large databases and data sets, big data is going to stick around.
Companies like IBM, Oracle and SAP have already spent billions of dollars creating management tools for these new challenges. As customer-centric organizations continue to grow and emerge, the amount of data they have to manage will grow proportionately. Data will continue to grow and evolve, requiring advanced management and analytics tools to oversee larger and larger databases. I don’t believe this is a “flash in the pan” trend. The increased reliance on the Internet and cloud computing equates to even larger amounts of data being transferred between organizations and individuals. This can only mean a rise in the amount of data.
Luckily, there are already some great tools to manage and grow sets of big data. However, with large environments come big investments in data management tools. Customized implementations of a data management and analytics tool-set can range from a few thousand to a few million dollars.
Then there is the consideration of management and future growth. The best way for a team to handle big data requirements is to address concerns early and control data before it becomes too large. Implement data control best practices and monitor the data's flow within the environment, because the more control you have over the data, the better you can manage and plan. Too often, data sets grow faster than an organization anticipates, leaving IT managers to play an expensive game of catch-up. Apply best practices from the start of any database or data-set deployment initiative so IT teams can plan their growth strategies around their growing data demands.
For CIOs, big data mostly means the added ability to apply analytics to very large quantities of Web data, which is more unstructured, particularly social-media data from Facebook. Secondarily, some organizations are accessing new “Web sensor” data, depending on their industry, such as GPS or cell-phone transmissions. Eventually, this will mean combining this data, which is dirtier and often significantly delayed, with data warehouses/marts, in near-real-time when possible.
This is being driven not just by IT, but by corporate-side “business analysts” who are moving very fast to generate in-house “applets” for incremental insights. The business analysts are very impatient with IT slowness, as they perceive it, in processing big data. The CIO has to respond to the demands of the analysts or risk losing clout.
We have to take measures to deal with big data adequately. That means access mechanisms on-site to Hadoop/MapReduce data interfaces on multiple public clouds, plus a way to combine those streams with each other, plus ways of filtering a fire hose of data before it reaches the organization's data stores, plus ways of syncing the dirty/delayed big data with near-real-time existing enterprise business intelligence (analytics and enterprise reporting). For medium-scale enterprises, much of this will take place within the public cloud. For large enterprises, vendors will supply attachments and the data; the new software will mostly reside inside the corporate firewall, in the data center(s).
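To make the "filtering a fire hose" step a little more concrete, here is a minimal Python sketch of a filter that sits in front of a data store. The record layout (dictionaries with "id" and "text" fields), the keyword list and the function name filter_firehose are all assumptions for illustration; a real feed from a public cloud would have its own format and volume concerns.

```python
def filter_firehose(records, keywords, seen_ids=None):
    """Yield only new records that mention at least one keyword.

    `records` is any iterable of dicts with hypothetical "id" and "text" keys.
    Records with no id, duplicate ids, or no keyword match are dropped
    before they would reach the organization's data stores.
    """
    seen = set() if seen_ids is None else seen_ids
    wanted = {k.lower() for k in keywords}
    for rec in records:
        rec_id = rec.get("id")
        text = (rec.get("text") or "").lower()
        if rec_id is None or rec_id in seen:
            continue  # dirty (no id) or duplicate record
        if not any(k in text for k in wanted):
            continue  # not relevant to the analysts' question
        seen.add(rec_id)
        yield rec

# Made-up stream for illustration only.
stream = [
    {"id": 1, "text": "Service outage again"},
    {"id": 2, "text": "Nice weather today"},
    {"id": 1, "text": "Service outage again"},  # duplicate delivery
    {"id": 3, "text": "Want a REFUND now"},
    {"text": "outage"},                          # missing id
]

kept = list(filter_firehose(stream, ["outage", "refund"]))
print([r["id"] for r in kept])
```

Because the function is a generator, it processes one record at a time and never holds the whole stream in memory, which is the property that matters when the input is far larger than the store behind it.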
This was first published in March 2012