Modern Infrastructure

Big data privacy concerns spur research, innovation

Big data privacy concerns such as what type of data is collected and how it's used have sent researchers looking for a computational solution.

Big data privacy concerns are becoming common. Many organizations already have the data they need -- but they’re afraid to use it.

Big data isn’t necessarily new data. Rather, it’s data that has been accumulating, often gathering dust, over the course of many years. Now, new tools and processes are turning that data into live, actionable and sometimes very revealing information.

A group of MIT professors, all panelists at the MIT Sloan CIO Symposium in May, offered some enlightening -- and chilling -- potential uses for gathered and analyzed data. For researchers, big data tools offer access to a treasure trove of behavioral information, statistics and other hard numbers that are waiting to be discovered.

For instance, at the moment, “new treatments for advanced cancers are based on the intuition of doctors,” said Dimitris Bertsimas, professor of operations research. “Drug combinations are used over and over again. But can there be new combinations?” The information needed to answer that question was easy to find: An existing database of treatments and drugs dates back nearly 40 years.

The Operations Research Center at MIT, which Bertsimas co-directs, created a database using natural language processing to pull information from academic oncology papers. The outcome was a graph charting survival against toxicity for treatments that had been used through the years. That allows doctors to personalize treatment, Bertsimas said, eliminating guesswork in favor of hard numbers.

Masses of mobile phone data already exist, said panelist Alex “Sandy” Pentland, director of MIT’s Human Dynamics Lab and Media Lab Entrepreneurship Program. “Companies gather it but are scared to use it,” he said. “It’s politically controversial. But we have to start to see that cell phone and public data is for the public good.”

European telecom company Orange started a data commons -- company data put into the public domain to encourage business around its use. In one case, commute times were reduced by suggesting rearranged bus routes based on mobile phone location data, Pentland said.

Other cell phone data projects helped reduce the spread of infectious diseases and create a real-time census map of the Ivory Coast -- a country where citizens hadn’t been counted in years. That data also let researchers see ethnic boundaries, which had long been unclear.

The mobile data gathered isn’t necessarily personal smartphone data, but data on when phone owners moved from one cell tower to another. “There are statistical relationships between the pattern of calls and the pattern of mobility,” Pentland said. “It’s the heart of what that data is about.”
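As a rough illustration of the kind of tower-to-tower data Pentland describes, the sketch below counts how often phones move between pairs of cell towers. The record layout (anonymized phone ID, timestamp, tower ID), the sample values and the function name are all assumptions made for this example, not details from the researchers' projects.

```python
from collections import Counter
from itertools import groupby
from operator import itemgetter

# Made-up records: (anonymized_phone_id, timestamp, tower_id).
records = [
    ("a", 1, "T1"), ("a", 2, "T1"), ("a", 3, "T2"),
    ("b", 1, "T1"), ("b", 4, "T2"), ("b", 6, "T3"),
]


def tower_transitions(records):
    """Count moves between towers, summed over all phones."""
    counts = Counter()
    # Sort by phone, then by time, so each phone's records are in order.
    ordered = sorted(records, key=itemgetter(0, 1))
    for _, group in groupby(ordered, key=itemgetter(0)):
        towers = [tower for _, _, tower in group]
        # Compare each sighting with the next one for the same phone.
        for src, dst in zip(towers, towers[1:]):
            if src != dst:  # ignore repeated sightings at the same tower
                counts[(src, dst)] += 1
    return counts


transitions = tower_transitions(records)
for (src, dst), n in sorted(transitions.items()):
    print(f"{src} -> {dst}: {n}")
```

Only the aggregated counts per tower pair leave the function; the individual phone IDs are used for grouping and then discarded.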

Walking a fine line on data privacy

Who can argue with curing cancer and encouraging world peace? But analyzing big data can quickly turn into snooping, researchers said. Predicting crimes before they might happen is one example. Police could deploy extra officers based on common crime location information that’s been cross-referenced against people of interest sending text messages to one another, Pentland said. The actual texts themselves wouldn’t be revealed -- simply the fact that the messages are being sent at all.

Ensuring privacy is possible when individual information is stripped from its associated data, said Andrew Lo, professor of finance at MIT. “To prevent big data from becoming Big Brother,” he said, “we have to protect the privacy of individuals, and we have the tools to do that.” Cryptographic methods could allow researchers to get aggregate statistics from encrypted personal information stored on a central server while still ensuring privacy, he said.
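Lo did not describe a specific scheme here. One simple way to picture the idea is pairwise masking, sketched below as an assumption rather than as the method he has in mind: each pair of participants agrees on a random number that one adds and the other subtracts, so the central server sees only scrambled values, yet the masks cancel when everything is summed. The modulus, function names and sample figures are hypothetical, and the whole exchange is simulated in one process.

```python
import secrets

# Hypothetical modulus; all arithmetic is done modulo this value.
MODULUS = 2**61 - 1


def mask_values(values):
    """Return masked copies of `values` whose sum mod MODULUS is unchanged."""
    n = len(values)
    masked = [v % MODULUS for v in values]
    for i in range(n):
        for j in range(i + 1, n):
            # Participants i and j share one random mask.
            # i adds it and j subtracts it, so it cancels in the total.
            r = secrets.randbelow(MODULUS)
            masked[i] = (masked[i] + r) % MODULUS
            masked[j] = (masked[j] - r) % MODULUS
    return masked


def aggregate(masked):
    """What the central server computes; it only ever sees masked values."""
    return sum(masked) % MODULUS


salaries = [52_000, 61_500, 48_250, 75_000]  # made-up data for illustration
masked = mask_values(salaries)
total = aggregate(masked)
print(total == sum(salaries))
```

In a real deployment each participant would apply its masks locally before sending anything, and the scheme would need to handle participants who drop out; this sketch leaves both out.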

The sky’s the limit for actionable intelligence based on big data in fields such as health care, consumer behavior and organizational management, according to the MIT researchers.

“You can track patterns of conversations [among employees] with name badges and automate an office layout and productivity, automate interactions to produce greater output,” Pentland said.

Until then, it’s cultural, not computational, concerns that reign. “I’m very optimistic that a lot of the things around privacy and data ownership will be taken care of,” Pentland said. He envisioned an opt-in framework, where subjects have the ability to audit what information they share.

“The real challenge here is making the data available,” Pentland said. “People have to be willing to share their data and make it available.”
