Identification and investigation of data characteristics (BA, MA, WIP)

Internal advisor: Dr.-Eng. Matthias Volk

Since the beginning of this decade, big data has emerged as a promising approach to realizing data-intensive projects and applications that seemed impossible years ago. Using new technologies and techniques, massive amounts of differently structured data, taken from various sources, can be stored, processed, and visualized. Enterprises in particular are aware of its potential in several areas, such as improving knowledge generation, organizational agility, business processes, and competitive performance.

However, for an enterprise, a big data project is a huge investment that can lead to crucial changes on various levels, especially in the infrastructure. Thus, introducing a big data project is a strategic decision that needs to be based on a solid understanding.

The processed data forms the basis for implementing this kind of project. Because technologies depend heavily on the data they handle, technological decisions often have to be made according to the data characteristics and their severity. Thus, every application scenario needs a specific combination of technologies and tools. To support decisions during the selection process, an overview of the currently existing data characteristics is not enough; their severity has to be considered as well.

The main goal of this work is to provide both an overview of the currently existing big data characteristics and suitable metrics for their definition. For instance, volume, as one of the main V's, can be described by the size of the files, the number of files, or both. In the future, the results of this investigation are planned to be used to improve the selection of suitable technologies.
The scope may vary depending on the selected form of the work (BA, MA, WIP).
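
As an illustration of the volume example above, the following minimal Python sketch computes two candidate metrics for a data set stored as files in a directory: the number of files and their total size. The names `VolumeMetrics` and `measure_volume` are hypothetical and chosen only for this illustration; which metrics are actually suitable is the subject of the work itself.

```python
import os
from dataclasses import dataclass


@dataclass
class VolumeMetrics:
    """Hypothetical container for two simple volume metrics."""
    file_count: int   # number of files in the data set
    total_bytes: int  # combined size of all files in bytes

    @property
    def mean_file_bytes(self) -> float:
        # Average file size; 0.0 for an empty data set to avoid division by zero.
        return self.total_bytes / self.file_count if self.file_count else 0.0


def measure_volume(root: str) -> VolumeMetrics:
    """Walk the directory tree under `root` and sum up file count and file sizes."""
    count = 0
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip entries that are not regular files (e.g. broken symlinks).
            if os.path.isfile(path):
                total += os.path.getsize(path)
                count += 1
    return VolumeMetrics(file_count=count, total_bytes=total)
```

Reporting both values together distinguishes, for example, a few very large files from many small ones, even when the total size is the same.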