
What should an IT systems analyst learn about big data?
I am an IT systems analyst working on a big data project. What should I know before I dive into the project?
The best way to learn big data principles is to start with how information systems work, particularly databases and infrastructure. It's also important to become familiar with the core big data tools -- Hadoop, Spark, Hive, Pig, Flume, Sqoop and Mesos -- and with distributions such as Cloudera before you start.
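If Spark is new to you, a minimal PySpark sketch like the following conveys the flavor of the stack. The input path and the log schema are hypothetical, so treat this as an illustration rather than a recipe:

# A minimal PySpark job: count requests per HTTP status code in a web log.
# Assumes a working Spark installation; the path and fields are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("log-summary").getOrCreate()

# Read newline-delimited JSON log records (hypothetical fields: ts, status, url).
logs = spark.read.json("hdfs:///logs/web/*.json")

# Aggregate request counts per status code and show the busiest ones first.
summary = logs.groupBy("status").agg(F.count("*").alias("requests"))
summary.orderBy(F.desc("requests")).show()

spark.stop()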
A systems analyst should understand how to organize, manage and protect data. Dozens of data management products are available to help with the first two tasks. Your big data store may comprise structured and unstructured data from various sources -- data warehouses, Hadoop, NoSQL, in-memory systems, files and applications -- so you have to learn to organize that data so your systems process it efficiently. Keep your master data consistent to avoid creating multiple versions of the truth -- multiple, unsynchronized databases.
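As a sketch of that consolidation, the PySpark snippet below joins a structured warehouse extract with semi-structured application events on a shared master key, so downstream consumers see one version of the truth. The paths, schemas and the customer_id key are all hypothetical:

# Consolidate two sources around one master key (a sketch, assuming PySpark).
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("consolidate-customers").getOrCreate()

# Structured extract from the data warehouse (columnar Parquet).
warehouse = spark.read.parquet("hdfs:///warehouse/customers.parquet")

# Semi-structured events exported by an application.
events = spark.read.json("hdfs:///apps/crm/events/*.json")

# Join both sources on the shared master key so each customer
# resolves to a single, consistent record.
unified = warehouse.join(events, on="customer_id", how="left")

unified.write.mode("overwrite").parquet("hdfs:///curated/customers_unified.parquet")

spark.stop()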
Data protection is also important; familiarize yourself with your organization's data security, compliance and governance processes. Depending on how sensitive the data is, consider protecting it with masking, redaction or encryption.
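As a minimal sketch of what masking and redaction can look like at the field level, the Python below protects individual fields before a record leaves a trusted zone. The field names, rules and salt are hypothetical; real policy belongs in your governance tooling:

# Field-level protection: mask, redact or tokenize individual fields.
import hashlib

def mask_email(email: str) -> str:
    """Keep the domain for analytics; hide the local part."""
    local, _, domain = email.partition("@")
    return local[0] + "***@" + domain

def redact(_value: str) -> str:
    """Remove the value entirely."""
    return "[REDACTED]"

def tokenize(value: str, salt: str = "demo-salt") -> str:
    """Replace a value with a stable, non-reversible token."""
    return hashlib.sha256((salt + value).encode("utf-8")).hexdigest()[:16]

record = {"name": "Jane Doe", "email": "jane@example.com", "ssn": "123-45-6789"}
protected = {
    "name": tokenize(record["name"]),    # joinable, but not readable
    "email": mask_email(record["email"]),
    "ssn": redact(record["ssn"]),        # too sensitive to keep at all
}
print(protected)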
The final step before you dive into this project is to investigate your customer's quality-of-service requirements. How much data do they want to analyze, and how quickly do they need responses? For a large database that needs a near-real-time response, place as much of that data as possible in memory or on flash cache so it can be read quickly. IBM's BLU Acceleration and SAP HANA are good examples of these fast-read, in-memory environments. Also, understand your customer's desired outcome -- the answers they are trying to get. If you know the result they need, you can organize your data and systems to reach it more efficiently.
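As a sketch of the in-memory approach, again assuming PySpark, the snippet below pins a hypothetical orders table in executor memory so repeated interactive queries read from RAM rather than disk; the path, columns and filter are illustrative:

# Pin hot data in memory for near-real-time reads (a PySpark sketch).
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("hot-cache").getOrCreate()

orders = spark.read.parquet("hdfs:///warehouse/orders.parquet")

# cache() keeps the DataFrame in executor memory after the first action,
# so subsequent queries avoid rereading from disk.
orders.cache()
orders.count()  # materialize the cache

# Later interactive queries against the cached data return much faster.
recent = orders.filter(orders.order_date >= "2015-01-01")
print(recent.count())

spark.stop()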
About the author:
Joe Clabby is the president of Clabby Analytics and has more than 32 years of experience in the IT industry, with positions in marketing, research and analysis. Clabby is an expert in application reengineering services, systems and storage design, data center infrastructure and integrated service management. He has produced in-depth technical reports on various technologies, such as virtualization, provisioning, cloud computing and application design.
Reader comments:
My view is that to secure data while preserving its value for analytics, the data itself must be protected at as fine-grained a level as possible. Securing individual fields allows the greatest flexibility: sensitive, identifying fields are protected while nonidentifying information remains in the clear.
By protecting data at a very fine-grained level -- fields, or even parts of a field -- we can continue to reap the benefits of data monetization while putting up a significant barrier to data theft. I talk more about this issue here: http://www.sramanamitra.com/2014/10/03/thought-leaders-in-big-data-ulf-mattsson-cto-of-protegrity-part-1/
Ulf Mattsson, CTO, Protegrity