Big Data Management: Five New Common Senses

As more organizations deploy big data platforms, concern is growing that the lack of sound data management practices is hindering the development of big data applications. When you consider how big data management relates to big data platforms (such as Hadoop running on commodity hardware), it is clear that these new technologies require new tools and processes for data management.
In this article, I will introduce five new pieces of "common sense" about big data management that you should know to help ensure the consistency and reliability of analysis results.

1. Part of big data management can be done by business users themselves

One of the most-discussed features of big data is the broad accessibility of data. A big data environment assumes access to many large datasets stored in their original format.
Business users today are more tech-savvy than previous generations. Rather than being handed integrated, formatted data through operational data stores, data warehouses, and data marts, a growing number of users want to access the raw data and perform the necessary preparation themselves.
These business users want the flexibility to explore data sources, create reports, and perform analytics to suit their unique business needs.
Providing business users with a self-service big data environment means meeting two data management requirements.

To enable data exploration, you must give each user the means to examine the data on their own.
You must also provide a data preparation tool that each user can operate independently, so that they can gather the required information from many datasets and pass the results to analytics functions.
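A self-service preparation step of the kind described above can be sketched as a simple project-and-filter operation over raw records. This is a minimal illustration, not any particular tool's API; the field names ("region", "amount", "note") are hypothetical examples.

```python
# Minimal sketch of a self-service data preparation step, assuming raw
# records arrive as dictionaries (e.g. parsed from CSV or JSON files).
# Field names below are hypothetical examples.

def prepare(records, keep_fields, predicate):
    """Project each record onto keep_fields and keep only the rows
    that satisfy the user-supplied predicate."""
    return [
        {field: rec[field] for field in keep_fields}
        for rec in records
        if predicate(rec)
    ]

raw = [
    {"region": "east", "amount": "120", "note": "ok"},
    {"region": "west", "amount": "80",  "note": ""},
    {"region": "east", "amount": "40",  "note": "dup"},
]

# A business user keeps only east-region rows and the two fields they need.
subset = prepare(raw, ["region", "amount"], lambda r: r["region"] == "east")
print(subset)
# → [{'region': 'east', 'amount': '120'}, {'region': 'east', 'amount': '40'}]
```

The point of the sketch is that the user, not a central IT team, decides which fields and rows matter for their analysis.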

2. Big data management: The data model is different from 10 years ago

Traditional approaches to collecting and storing data for reporting and analysis focus on storing data in a predefined structure.
In the world of big data management, by contrast, the expectation is to capture and store both structured and unstructured datasets in their original (raw) format, that is, without imposing a predefined data model.
The advantage of doing this is that different users can shape each dataset in the way that best suits their needs.

However, to mitigate the risks of inconsistency and misinterpretation, both the datasets and their metadata must be managed properly.
Specifically, by establishing solid procedures for documenting business glossaries, mapping business terms to data elements, and running a collaborative environment, the interpretations and methods used when manipulating data for analysis can be shared by everyone.
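The glossary-to-data-element mapping described above can be sketched as a small shared lookup structure. This is only an illustration of the idea; the terms, definitions, and element names below are hypothetical examples, not a real catalog format.

```python
# Minimal sketch of a shared business glossary, assuming terms and
# data-element names are agreed on collaboratively. All entries below
# ("customer", "orders.cust_id", ...) are hypothetical examples.

glossary = {
    "customer": {
        "definition": "A party that has placed at least one order.",
        "data_elements": ["orders.cust_id", "crm.customer_key"],
    },
    "revenue": {
        "definition": "Gross order amount before tax and discounts.",
        "data_elements": ["orders.amount"],
    },
}

def elements_for(term):
    """Look up which raw data elements a business term maps to."""
    entry = glossary.get(term.lower())
    return entry["data_elements"] if entry else []

print(elements_for("Revenue"))  # → ['orders.amount']
```

Even a structure this small makes the shared interpretation explicit: when two analysts say "revenue," the glossary pins down which raw field they both mean.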

3. Big data management: Data quality is the user's responsibility

Traditional systems apply data standardization and cleansing before storing data in a predefined model. 
On the other hand, one of the characteristics of the big data utilization environment is that, due to the nature of storing and providing data in its original format, neither cleansing nor standardization is applied when the dataset is collected.

This gives users more freedom in how they use the data, but it also makes them responsible for applying the necessary data transformations themselves.
A single dataset can therefore easily serve different purposes, as long as the transformations applied by different users do not conflict with one another.
Accordingly, the big data management environment must include a way to track the transformations users apply and to verify that those transformations, and the resulting interpretations of the data, remain consistent.
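A user-applied cleansing transformation of this kind can be sketched as standardization performed at read time, since the stored data stays raw. The field name and country aliases below are hypothetical examples, assuming free-form country strings in the raw records.

```python
# Minimal sketch of user-applied cleansing at read time (schema-on-read),
# assuming raw records keep their original, uncleansed values. The field
# name and alias table are hypothetical examples.

def standardize_country(raw_value):
    """Map free-form country strings to a canonical code; values not in
    the alias table are simply trimmed and upper-cased."""
    aliases = {"usa": "US", "u.s.": "US", "united states": "US", "us": "US"}
    return aliases.get(raw_value.strip().lower(), raw_value.strip().upper())

raw_records = [{"country": " USA "}, {"country": "u.s."}, {"country": "DE"}]

cleaned = [{"country": standardize_country(r["country"])} for r in raw_records]
print(cleaned)  # → [{'country': 'US'}, {'country': 'US'}, {'country': 'DE'}]
```

If two users each write their own version of `standardize_country` with different alias tables, their results will silently diverge, which is exactly why the shared tracking of transformations described above matters.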

4. Big data management: Understanding the architecture improves performance

Big data platforms can use processing and storage nodes built from inexpensive commodity hardware to perform parallel computation over distributed storage.
However, if you do not fully understand the details of SQL-on-Hadoop query optimization and execution models, you may be disappointed by unexpectedly long response times.

For example, a complex JOIN operation may require chunks of a distributed dataset to be broadcast to every compute node, which becomes a serious performance problem because large amounts of data are sent over the network.
The solution is to understand how data is organized in the big data architecture and how queries are optimized in the database's execution model; with that understanding, you can write high-performance big data applications.
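One common way to avoid shuffling the large table, used by SQL-on-Hadoop engines under names like "map join" or "broadcast join," is to replicate only the small table to every node and join locally. The sketch below illustrates the idea in plain Python; the table contents are hypothetical examples, and a real engine chooses this plan automatically based on table sizes.

```python
# Minimal sketch of a map-side (broadcast) join, assuming the dimension
# table is small enough to replicate to every compute node while the
# large fact table stays partitioned. Table contents are hypothetical.

small_dim = {1: "east", 2: "west"}           # broadcast to every node

fact_partition = [                            # one partition of the big table
    {"region_id": 1, "amount": 120},
    {"region_id": 2, "amount": 80},
]

def mapside_join(partition, broadcast_dim):
    """Join each local partition against the broadcast copy; no network
    shuffle of the large table is needed."""
    return [
        {"region": broadcast_dim[row["region_id"]], "amount": row["amount"]}
        for row in partition
        if row["region_id"] in broadcast_dim
    ]

print(mapside_join(fact_partition, small_dim))
# → [{'region': 'east', 'amount': 120}, {'region': 'west', 'amount': 80}]
```

The design trade-off is exactly the one the text describes: broadcasting a small table costs little, but broadcasting a large one floods the network, so knowing which plan the engine will pick matters.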

5. Big data management: The essence of big data utilization lies in streaming

In previous data utilization environments, much of the data collected and used for analytical purposes was generated within the organization and stored in a static data repository. Today, streaming data continues to grow explosively. 
Some of this is human-generated content such as social media channels, blogs, and email, while others are machine-generated data coming from countless sensors, devices, meters, and other internet-connected machines. 
There is also streaming content that is automatically generated from the IT system, such as web event logs. 
All of these streaming sources generate large amounts of data, which is a valuable input for analysis.

And this is the core of the challenges surrounding the use of big data.
Big data management strategies must therefore always include technologies that support streaming processing, such as scanning, filtering, and sorting streams to extract meaningful information.

Without big data management, streaming data cannot be properly collected, stored, and subsequently controlled for access.
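The scan-and-filter step mentioned above can be sketched with generators, which process a stream lazily instead of loading it all into memory. The event shape ("level", "msg") and the severity table are hypothetical examples, not a specific stream-processing engine's API.

```python
# Minimal sketch of scanning and filtering a data stream with generators,
# assuming events arrive as an iterable of dictionaries. The event shape
# and severity levels below are hypothetical examples.

def filter_stream(events, min_level):
    """Lazily yield only events at or above min_level, without
    materializing the whole stream in memory."""
    levels = {"debug": 0, "info": 1, "warn": 2, "error": 3}
    for event in events:
        if levels.get(event["level"], 0) >= levels[min_level]:
            yield event

stream = iter([
    {"level": "info",  "msg": "started"},
    {"level": "error", "msg": "disk full"},
    {"level": "debug", "msg": "tick"},
])

print([e["msg"] for e in filter_stream(stream, "warn")])  # → ['disk full']
```

Because the generator consumes one event at a time, the same pattern works whether the source is a finite log file or an unbounded sensor feed.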

Big data management considerations

Proper big data management means effectively applying many traditional approaches to data modeling and architecture while also introducing new technology and process frameworks that enable access to, and manipulation of, a far wider range of data. Introducing such frameworks is essential.

A big data management strategy includes data exploration, data preparation, self-service data access, collaborative semantic metadata management, data standardization and cleansing, and the introduction of tools such as stream processing engines. If you fully understand the "new common sense" presented here, you can dramatically shorten the time from starting a big data initiative to achieving results.