The best way to learn big data principles is to start with how information systems work, particularly databases and infrastructure. Before you start, it's also important to get familiar with the big data ecosystem: distributions such as Cloudera's, and tools such as Hadoop, Spark, Hive, Pig, Flume, Sqoop and Mesos.
A systems analyst should understand how to organize, manage and protect data. There are dozens upon dozens of data management products on the market to help organize and manage data. Your big data environment may combine structured and unstructured data from various sources, such as data warehouses, Hadoop, NoSQL and in-memory databases, files and applications. You have to learn to organize that data so the systems can process it efficiently. Keep your master data consistent to avoid creating multiple versions of the truth: multiple, unsynchronized databases that each give a different answer.
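One practical way to catch "multiple versions of the truth" is to compare each source's copy of the same master record and flag fields where they disagree. The sketch below is a minimal illustration only. It assumes every source can be exported as a Python dict keyed by a shared master-data key, such as a customer ID. The function name `find_conflicts` and the source names "warehouse" and "crm" are hypothetical. Real master data management products do far more, including matching, survivorship rules and stewardship workflows.

```python
from collections import defaultdict


def find_conflicts(sources):
    """Return {key: {field: {source: value}}} for fields whose values differ.

    `sources` (hypothetical layout) maps a source name to a dict of records
    keyed by a master-data key; each record is a dict of field -> value.
    Values must be hashable (strings, numbers) so they can go into a set.
    """
    # Collect every source's view of each record, field by field.
    views = defaultdict(lambda: defaultdict(dict))
    for source_name, records in sources.items():
        for key, record in records.items():
            for field, value in record.items():
                views[key][field][source_name] = value

    conflicts = {}
    for key, fields in views.items():
        for field, by_source in fields.items():
            # More than one distinct value means the sources disagree.
            # A field held by only one source is not reported.
            if len(set(by_source.values())) > 1:
                conflicts.setdefault(key, {})[field] = dict(by_source)
    return conflicts


if __name__ == "__main__":
    sources = {
        "warehouse": {"C001": {"name": "Acme Corp", "city": "Boston"}},
        "crm": {"C001": {"name": "Acme Corp", "city": "Cambridge"}},
    }
    print(find_conflicts(sources))
```

Here both systems agree on the name, so only the `city` field for `C001` is reported. That is the kind of discrepancy you would resolve before treating either copy as the master.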
Data protection is also important. Familiarize yourself with your organization's data security, compliance and governance processes. Depending on how sensitive the data is, consider protecting it with masking, redaction or encryption.
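To make the difference between masking and redaction concrete, here is a minimal sketch using only the Python standard library. Masking hides most of a value but keeps enough to be recognizable, such as the last four digits of a card number. Redaction removes the sensitive value entirely. The helper names, the `[REDACTED]` placeholder and the simplified email pattern are illustrative assumptions, not any product's API. Encryption is deliberately left out: it should come from a vetted cryptography library or your platform's built-in encryption, not hand-rolled code.

```python
import re

# Simplified email pattern for illustration; it will not cover every valid address.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def mask(value, visible=4, mask_char="*"):
    """Replace all but the last `visible` characters with `mask_char`."""
    if visible <= 0:
        # Guard: value[-0:] would return the whole string unmasked.
        return mask_char * len(value)
    hidden = max(len(value) - visible, 0)
    return mask_char * hidden + value[-visible:]


def redact_emails(text, placeholder="[REDACTED]"):
    """Remove email addresses from free text entirely."""
    return EMAIL_RE.sub(placeholder, text)


if __name__ == "__main__":
    print(mask("4111111111111111"))
    print(redact_emails("Contact jane.doe@example.com today"))
```

The first call prints twelve asterisks followed by `1111`. The second prints `Contact [REDACTED] today`.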
The final thing to learn before this project is your customer's Quality of Service (QoS) requirements. How much data do they want to analyze, and how quickly do they need responses? For example, if a large database needs near-real-time responses, place as much of that data as possible in memory or on flash cache so that it can be read quickly. IBM's BLU Accelerator and SAP HANA are good examples of these fast-read, in-memory environments. Also, understand your customer's desired outcome: the answers they are trying to get. If you know the result they need, you can organize your data and systems to reach it more efficiently.
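Once you know the QoS targets, you still have to decide which data earns a place in limited memory. The sketch below shows one simple heuristic. It ranks datasets by reads per hour per GB and fills a memory budget greedily, leaving the rest for flash or disk. This is an assumed rule of thumb for illustration, not how BLU Accelerator, SAP HANA or any other product places data. Greedy selection is also not guaranteed to be optimal. The dataset names, sizes and read rates are hypothetical, and sizes are assumed to be positive.

```python
def plan_memory_tier(datasets, memory_budget_gb):
    """Greedily choose datasets to keep in memory, hottest per GB first.

    `datasets` is a list of (name, size_gb, reads_per_hour) tuples with
    size_gb > 0. Returns (in_memory, flash_or_disk) lists of dataset names.
    """
    # Rank by read density: how much read traffic each GB of memory serves.
    ranked = sorted(datasets, key=lambda d: d[2] / d[1], reverse=True)
    in_memory, flash_or_disk = [], []
    remaining = memory_budget_gb
    for name, size_gb, _reads in ranked:
        if size_gb <= remaining:
            in_memory.append(name)
            remaining -= size_gb
        else:
            # Doesn't fit; keep scanning, since a smaller dataset might.
            flash_or_disk.append(name)
    return in_memory, flash_or_disk


if __name__ == "__main__":
    datasets = [
        ("orders", 200, 50_000),      # hypothetical hot transactional data
        ("clickstream", 800, 20_000),
        ("archive", 2_000, 100),
    ]
    print(plan_memory_tier(datasets, memory_budget_gb=512))
```

With a 512 GB budget, only `orders` fits in memory. `clickstream` and `archive` land on the slower tier, which tells you whether the near-real-time workload can actually be served from memory.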
About the author:
Joe Clabby is the president of Clabby Analytics and has more than 32 years of experience in the IT industry, with positions in marketing, research and analysis. Clabby is an expert in application reengineering services, systems and storage design, data center infrastructure and integrated service management. He has produced in-depth technical reports on various technologies, such as virtualization, provisioning, cloud computing and application design.