Data sources are a fundamental part of a big data system. They generate the data that is collected to provide insights and business value to companies.
Types of sources
The most common data sources can be split into these 4 categories:
- Social media
This category refers to payments, orders, deliveries, and storage records. In most situations, this data is generated by financial, trading, and eCommerce applications. It is useful for analyzing the market situation, predicting stock prices, and optimizing costs.
This category refers to the activity on social platforms. Examples of this activity are likes, tweets and retweets, comments, posts, video uploads, and general media that are uploaded and shared. It is useful for insights into consumer preferences, changing trends, and marketing analytics.
This category refers to data from industrial equipment, sensor measurements generated in real time, and the company’s server logs. It is a standard data source in IoT solutions. It is useful for analyzing system performance, tracking state in real time, sending notifications, predicting system behavior, and reacting to failure scenarios.
This category refers to users of the company’s web or mobile client applications. Data can be generated directly by a user, or indirectly by monitoring user behavior. It is useful for adapting ads and recommendation engines.
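As an illustration of how machine-generated data can drive notifications, here is a minimal Python sketch. The `Reading` record, the `find_alerts` helper, the sensor names, and the 80 °C limit are all hypothetical and chosen only for the example; a real pipeline would read from a stream or message queue rather than an in-memory list.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Reading:
    sensor_id: str      # hypothetical sensor identifier
    timestamp: float    # seconds since the Unix epoch
    temperature: float  # degrees Celsius


def find_alerts(readings: Iterable[Reading], limit: float = 80.0) -> List[str]:
    """Return one notification message per reading above the limit."""
    alerts = []
    for r in readings:
        if r.temperature > limit:
            alerts.append(
                f"{r.sensor_id}: {r.temperature:.1f} C exceeds {limit:.1f} C"
            )
    return alerts


# Two sample measurements; only the second one crosses the assumed limit.
readings = [
    Reading("pump-1", 1700000000.0, 71.5),
    Reading("pump-2", 1700000001.0, 84.2),
]
alerts = find_alerts(readings)
print(alerts)
```

The same pattern (parse a measurement, compare it against a rule, emit a message) applies to server logs, with the threshold check replaced by, for example, a match on an error level.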
Data from sources can come in multiple formats. Format refers to the structure and organization of the data. Formats can be split into 4 categories, ordered by decreasing occurrence in collected datasets:
Most of the generated data in the big data context is unstructured or semi-structured.
Format with predefined data types organized in a fixed structure. Examples: traditional
Format with a self-describing structure that is not fixed. Technically speaking, it has semantic tags or markers which are used to group types of data and enforce hierarchies within the data. Examples:
Format with inconsistent types and values that has an observable structure pattern which cannot be easily parsed. Examples: web pages, web click-streams, Google search results.
Format without data types that has no inherent, identifiable structure. It cannot be processed and analyzed using conventional methods. Examples: text documents, email messages,
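The practical difference between these formats shows up in how much work is needed to extract a value. The following Python sketch uses only the standard library. All sample data (orders, users, the click-stream line, the note) is invented for illustration, and CSV and XML are used here as assumed representatives of the fixed-structure and self-describing formats.

```python
import csv
import io
import re
import xml.etree.ElementTree as ET

# Fixed structure: every row has the same columns, so fields are read by name.
csv_text = "order_id,amount\n1001,25.50\n1002,13.00\n"
orders = list(csv.DictReader(io.StringIO(csv_text)))
total = sum(float(row["amount"]) for row in orders)

# Self-describing structure: tags name the fields, but records may differ.
xml_text = """
<users>
  <user id="1"><name>Ann</name><email>ann@example.com</email></user>
  <user id="2"><name>Bob</name></user>
</users>
"""
root = ET.fromstring(xml_text.strip())
users = [
    {
        "id": user.get("id"),
        "name": user.findtext("name"),
        "email": user.findtext("email"),  # None when the tag is absent
    }
    for user in root.findall("user")
]

# Observable pattern without a formal schema: fall back to pattern matching.
click = "2024-01-05 10:15:02 GET /products/42?ref=home 200"
match = re.search(r"GET (/\S*)", click)
path = match.group(1) if match else None

# No identifiable structure: only generic text operations apply directly.
note = "Customer called about a late delivery and asked for a refund."
word_count = len(note.split())

print(total, users, path, word_count)
```

Each step down the list trades a reliable schema for more custom parsing logic, which is why the less structured formats typically need dedicated processing tools.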
Sources in our solution
|Front-end service logs||Machine||
|Back-end service logs||Machine||
- IoT - Internet of Things
- RDBMS - Relational database management system
- CSV - Comma-separated values
- XML - Extensible markup language
- PDF - Portable document format