
What should an IT systems analyst learn about big data?

I am an IT systems analyst working on a big data project. What should I know before I dive into the project?

The best way to learn big data principles is to start with how information systems work, particularly databases and infrastructure. Before you start, it's also important to get familiar with big data platforms and tools such as Cloudera, Hadoop, Spark, Hive, Pig, Flume, Sqoop and Mesos.

A systems analyst should understand how to organize, manage and protect data. Dozens of data management products on the market can help you organize and manage it. Your big data environment may draw on structured and unstructured data from various sources, such as data warehouses, Hadoop, NoSQL, in-memory stores, files and applications, so you need to learn to organize that data in a way that lets your systems process it efficiently. Keep your master data consistent so you don't end up with multiple versions of the truth, that is, multiple unsynchronized databases.
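One way to spot "multiple versions of the truth" is to compare the same master records across sources and flag fields whose values disagree. The Python sketch below assumes each source can be exported as a dict keyed by a shared record ID; the function `find_conflicts` and the source names `crm` and `billing` are hypothetical, and a real master data management product would also have to match records that lack a common key.

```python
from collections import defaultdict


def find_conflicts(sources):
    """Return {(record_id, field): {source: value}} wherever sources disagree."""
    # Collect every value reported for each (record_id, field) pair, by source.
    seen = defaultdict(dict)
    for source_name, records in sources.items():
        for record_id, fields in records.items():
            for field, value in fields.items():
                seen[(record_id, field)][source_name] = value
    # Keep only the pairs where more than one distinct value exists.
    return {
        key: by_source
        for key, by_source in seen.items()
        if len(set(by_source.values())) > 1
    }


# Hypothetical customer records held in two unsynchronized systems.
crm = {"C001": {"email": "ann@example.com", "city": "Boston"}}
billing = {"C001": {"email": "ann@example.com", "city": "Cambridge"}}

conflicts = find_conflicts({"crm": crm, "billing": billing})
print(conflicts)
# {('C001', 'city'): {'crm': 'Boston', 'billing': 'Cambridge'}}
```

The matching `email` values are ignored; only the disputed `city` field is reported, which is the kind of discrepancy you would resolve before treating either system as the master copy.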

Data protection is also important. Familiarize yourself with your organization's data security, compliance and governance processes. Depending on how sensitive the data is, consider protecting it with masking, redaction or encryption.
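To show the difference between masking and redaction, here is a minimal Python sketch using only the standard library. The function names, the masking rule (keep the last four characters) and the SSN-shaped regular expression are illustrative assumptions, not a recommended policy. Encryption is deliberately left out: it should come from a vetted cryptography library or your platform's built-in encryption rather than hand-rolled code.

```python
import re


def mask(value, visible=4, mask_char="*"):
    """Hide all but the last `visible` characters, e.g. for account numbers."""
    if len(value) <= visible:
        # Too short to reveal anything safely; mask it entirely.
        return mask_char * len(value)
    return mask_char * (len(value) - visible) + value[-visible:]


# Hypothetical pattern for US Social Security numbers embedded in free text.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def redact(text):
    """Replace every SSN-shaped substring with a fixed marker."""
    return SSN_PATTERN.sub("[REDACTED]", text)


print(mask("4111111111111111"))
# ************1111
print(redact("Customer SSN 123-45-6789 on file."))
# Customer SSN [REDACTED] on file.
```

Masking keeps part of the value so it stays recognizable to support staff, while redaction removes it outright; which one fits depends on who needs to see the data downstream.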

Big data sources defined

Data warehouses

Hadoop

NoSQL

In-memory

File data

Application data

Finally, before the project begins, investigate your customer's quality of service (QoS) requirements. How much data do they want to analyze, and how quickly do they need responses? For example, if a large database needs near-real-time responses, place as much of that data as possible in memory or in flash cache so it can be read quickly. IBM's BLU Acceleration and SAP HANA are good examples of these fast-read, in-memory environments. Also, understand your customer's desired outcome: the answers they are trying to get. If you know the result they need, you can organize your data and systems to reach it more efficiently.
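To make the QoS trade-off concrete, here is a minimal Python sketch of a greedy placement heuristic: datasets with the tightest response-time requirement claim memory first, then flash, and whatever doesn't fit falls back to disk. The `Dataset` class, the tier names and all sizes and latencies are hypothetical, and this sketch does not model how products such as BLU Acceleration or SAP HANA manage memory internally.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    name: str
    size_gb: float
    max_latency_ms: float  # response time the customer requires


def assign_tiers(datasets, memory_gb, flash_gb):
    """Greedily place the most latency-sensitive datasets in the fastest tier with room."""
    placement = {}
    free = {"memory": memory_gb, "flash": flash_gb}
    # Tightest latency requirement first, so it gets first claim on memory.
    for ds in sorted(datasets, key=lambda d: d.max_latency_ms):
        for tier in ("memory", "flash"):
            if ds.size_gb <= free[tier]:
                free[tier] -= ds.size_gb
                placement[ds.name] = tier
                break
        else:
            # Fits in neither fast tier.
            placement[ds.name] = "disk"
    return placement


# Hypothetical workload: 256 GB of memory and 1,000 GB of flash available.
workload = [
    Dataset("orders", 200, 50),
    Dataset("clickstream", 800, 500),
    Dataset("archive", 5000, 60000),
]
print(assign_tiers(workload, memory_gb=256, flash_gb=1000))
# {'orders': 'memory', 'clickstream': 'flash', 'archive': 'disk'}
```

A real plan would also weigh access frequency and cost per gigabyte, but even this crude version shows why you need the customer's data volumes and response-time targets before you design the system.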

About the author:
Joe Clabby is the president of Clabby Analytics and has more than 32 years of experience in the IT industry, with positions in marketing, research and analysis. Clabby is an expert in application reengineering services, systems and storage design, data center infrastructure and integrated service management. He has produced in-depth technical reports on various technologies, such as virtualization, provisioning, cloud computing and application design.

