What should an IT systems analyst learn about big data?

I am an IT systems analyst working on a big data project. What should I know before I dive into the project?

The best way to learn big data principles is to start with how information systems work, particularly databases and infrastructure. It's also important to cover big data tools like Cloudera, Hadoop, Spark, Hive, Pig, Flume, Sqoop and Mesos before you start.

A systems analyst should understand how to organize, manage and protect data. Dozens of data management products are available to help with organizing and managing data. Your big data database may comprise structured and unstructured data from various sources -- data warehouses, Hadoop, NoSQL, in-memory, files and applications -- so you have to learn to organize that data so that the systems can process it efficiently. Make sure that you keep your master data consistent to avoid creating multiple versions of the truth -- that is, multiple, unsynchronized databases.
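
As a minimal sketch of what checking for multiple versions of the truth can look like, the Python below compares the same customer records held in two systems and reports the fields that disagree. The source names (`crm`, `billing`), the record IDs and the sample values are hypothetical and chosen only for illustration; a real project would more likely rely on a master data management product than on a hand-written script.

```python
def find_conflicts(sources):
    """Return {record_id: {field: {source_name: value}}} for fields that differ.

    `sources` maps a source name to a dict of records keyed by record ID.
    A field that is missing in one source but present in another is
    reported as a conflict (the missing side shows up as None).
    """
    conflicts = {}
    all_ids = set()
    for records in sources.values():
        all_ids.update(records)

    for record_id in sorted(all_ids):
        # Collect every field seen for this record in any source.
        fields = set()
        for records in sources.values():
            fields.update(records.get(record_id, {}))

        for field in sorted(fields):
            # Only compare sources that actually hold this record.
            seen = {
                name: records[record_id].get(field)
                for name, records in sources.items()
                if record_id in records
            }
            if len(set(seen.values())) > 1:
                conflicts.setdefault(record_id, {})[field] = seen
    return conflicts


# Hypothetical sample data: the same two customers in two systems.
crm = {
    "C001": {"name": "Acme Corp", "city": "Boston"},
    "C002": {"name": "Globex", "city": "Austin"},
}
billing = {
    "C001": {"name": "Acme Corp", "city": "Chicago"},
    "C002": {"name": "Globex", "city": "Austin"},
}

conflicts = find_conflicts({"crm": crm, "billing": billing})
print(conflicts)
```

Here only customer `C001` is reported, because its `city` differs between the two systems. Deciding which system holds the correct value is a governance question that the code cannot answer.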

Data protection is also important; familiarize yourself with your organization's data security, compliance and governance processes. Depending on how sensitive the data, consider protecting it with masking, redaction or encryption.
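
To make the difference between masking and redaction concrete, here is a small sketch in Python using only the standard library. The field names (`card_number`, `notes`, `region`), the sample record and the e-mail pattern are assumptions made for illustration, and the pattern is deliberately simple rather than a complete e-mail matcher. Encryption is left out on purpose: it should come from a vetted cryptography library and a proper key management process, not from hand-written code.

```python
import re


def mask_card(number, visible=4):
    """Masking: keep the last `visible` digits and hide the rest."""
    digits = re.sub(r"\D", "", number)
    if len(digits) <= visible:
        # Too short to reveal anything safely, so hide every digit.
        return "*" * len(digits)
    return "*" * (len(digits) - visible) + digits[-visible:]


def redact_emails(text):
    """Redaction: replace e-mail addresses in free text with a marker."""
    return re.sub(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+", "[REDACTED]", text)


def protect_record(record):
    """Protect only the sensitive fields; leave the others in the clear."""
    protected = dict(record)
    protected["card_number"] = mask_card(record["card_number"])
    protected["notes"] = redact_emails(record["notes"])
    return protected


# Hypothetical sample record.
record = {
    "customer_id": "C001",
    "card_number": "4111 1111 1111 1234",
    "notes": "Contact jane.doe@example.com for access.",
    "region": "EMEA",
}

protected = protect_record(record)
print(protected)
```

The non-sensitive `region` field is untouched, so it remains usable for analysis, while the card number and the e-mail address are no longer readable.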

The final step in learning about big data before this project is to investigate your customer's quality of service requirements. How much data do they want to analyze, and how quickly do they need responses? For example, for a large database that needs an almost real-time response, place as much of that data as possible in memory or on flash cache so that it can be read quickly. IBM's BLU Accelerator and SAP HANA are good examples of these fast-read, in-memory environments. Also, understand your customer's desired outcome -- the answers they are trying to get. If you know the result they need, you will be able to organize your data and systems to reach it more efficiently.
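
The two quality of service questions, how much data and how quickly, can be turned into a rough back-of-the-envelope estimate. The Python sketch below assumes a full scan of the data at a fixed read rate, which ignores indexes, compression, caching and parallelism. The throughput figures in `ASSUMED_TIERS` are illustrative placeholders, not measurements of any product; replace them with numbers measured on your own systems.

```python
def scan_seconds(data_gb, throughput_gb_per_s):
    """Estimate the time to read `data_gb` once at a fixed read rate."""
    return data_gb / throughput_gb_per_s


def tiers_meeting_target(data_gb, target_seconds, tiers):
    """Return the storage tiers whose estimated scan time meets the target."""
    return [
        name
        for name, rate in tiers.items()
        if scan_seconds(data_gb, rate) <= target_seconds
    ]


# Assumed read rates in GB per second (placeholders, not benchmarks).
ASSUMED_TIERS = {"disk": 0.2, "flash": 2.0, "memory": 20.0}

# Hypothetical requirement: scan 500 GB within 60 seconds.
fast_enough = tiers_meeting_target(500, 60, ASSUMED_TIERS)
print(fast_enough)
```

Under these assumed rates only the in-memory tier meets a 60-second target for 500 GB. Relaxing the target or shrinking the data set changes which tiers qualify, which is why it pays to pin down the customer's requirements first.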

About the author:
Joe Clabby is the president of Clabby Analytics and has more than 32 years of experience in the IT industry, with positions in marketing, research and analysis. Clabby is an expert in application reengineering services, systems and storage design, data center infrastructure and integrated service management. He has produced in-depth technical reports on various technologies, such as virtualization, provisioning, cloud computing and application design.

This was last published in June 2015

Join the conversation

I agree that “Depending on how sensitive the data, consider protecting it with masking, redaction or encryption.”

My view is that, to reach the goal of securing the data while preserving its value for analytics, the data itself must be protected at as fine-grained a level as possible. Securing individual fields allows for the greatest flexibility in protecting sensitive identifying fields while allowing nonidentifying information to remain in the clear.

By protecting data at a very fine-grained level -- fields or even parts of a field -- we can continue to reap the benefits of data monetization while putting up a significant barrier to data theft. I talk more about this issue here (http://www.sramanamitra.com/2014/10/03/thought-leaders-in-big-data-ulf-mattsson-cto-of-protegrity-part-1/).

Ulf Mattsson, CTO Protegrity