Big data is a concept that deals with data sets of extreme volumes. Questions on this topic tend to involve infrastructure, algorithms, statistics, and data structures.
Several features set it apart as a distinct concept:
Data
- The data is so large that it cannot be processed on a single computer
- Relationships between data elements are extremely complex
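Because no single computer can process the whole data set, records are usually split across machines. The sketch below shows one common way to do this, hash partitioning: each record's key is hashed and the hash picks a node, so the same key always lands on the same node. The functions `partition_for` and `partition`, the MD5-based hash, and the sample records are illustrative assumptions, not part of any specific tool.

```python
import hashlib
from collections import defaultdict


def partition_for(key: str, num_nodes: int) -> int:
    """Map a record key to a node index using a stable hash.

    Python's built-in hash() is randomized per process for strings,
    so a cryptographic digest is used to get the same answer everywhere.
    """
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes


def partition(records, num_nodes):
    """Group (key, value) records into per-node buckets."""
    buckets = defaultdict(list)
    for key, value in records:
        buckets[partition_for(key, num_nodes)].append((key, value))
    return dict(buckets)


# Hypothetical data set: ten user records spread over three nodes.
records = [(f"user{i}", i) for i in range(10)]
for node, items in sorted(partition(records, 3).items()):
    print(node, items)
```

Hashing keeps the split roughly even without a central lookup table; the trade-off is that changing `num_nodes` moves most keys, which is why real systems often use consistent hashing or range partitioning instead.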
Algorithms
- Single-machine (local) algorithms that take longer than O(N) are likely to take many years to finish
- Fast distributed algorithms are used instead
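A quick back-of-the-envelope estimate shows why super-linear algorithms become infeasible at this scale. The sketch assumes one trillion (10^12) records, a machine performing about 10^9 simple operations per second, and ignores constant factors; all three numbers are illustrative assumptions.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # about 3.156e7 seconds


def runtime_seconds(operations: float, ops_per_second: float = 1e9) -> float:
    """Estimated wall-clock time, assuming a fixed rate of simple operations."""
    return operations / ops_per_second


n = 10**12  # assumed data set size: one trillion records

linear = runtime_seconds(n)               # O(N) pass on one machine
quadratic = runtime_seconds(n ** 2)       # O(N^2) algorithm on one machine
distributed = runtime_seconds(n / 100)    # O(N) pass split evenly over 100 machines

print(f"O(N), 1 machine:     {linear:.0f} s")
print(f"O(N^2), 1 machine:   {quadratic / SECONDS_PER_YEAR:.3g} years")
print(f"O(N), 100 machines:  {distributed:.0f} s")
```

Under these assumptions a linear pass takes about 17 minutes on one machine and about 10 seconds on 100 machines, while a quadratic algorithm needs roughly 3.17e+07 (about 32 million) years, which even a thousand-fold speed-up from parallelism cannot rescue.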
Storage
- The underlying storage must be fault-tolerant and keep data in a consistent state regardless of device failures
- A single storage device cannot hold the entire data set
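A common way to meet both storage requirements is to split the data set into blocks, spread the blocks over many nodes, and keep several replicas of each block on different nodes. HDFS, for instance, defaults to three replicas per block. The sketch below covers only the fault-tolerance side (not consistency) and uses a deliberately simple ring-style placement; `place_replicas`, `readable_after_failure`, and the node names are hypothetical, and this is not HDFS's actual rack-aware placement policy.

```python
import hashlib


def place_replicas(block_id: str, nodes: list[str], replication: int = 3) -> list[str]:
    """Choose `replication` distinct nodes for a block.

    Start at a position derived from a stable hash of the block id and
    walk the node list, so consecutive replicas go to different nodes.
    """
    if replication > len(nodes):
        raise ValueError("replication factor exceeds number of nodes")
    start = int.from_bytes(hashlib.sha1(block_id.encode()).digest()[:8], "big") % len(nodes)
    return [nodes[(start + i) % len(nodes)] for i in range(replication)]


def readable_after_failure(placement: dict[str, list[str]], failed: set[str]) -> bool:
    """A block stays readable if at least one of its replicas is on a healthy node."""
    return all(any(n not in failed for n in replicas) for replicas in placement.values())


# Hypothetical cluster of five nodes storing eight blocks.
nodes = ["node-a", "node-b", "node-c", "node-d", "node-e"]
placement = {f"block-{i}": place_replicas(f"block-{i}", nodes) for i in range(8)}

print(readable_after_failure(placement, {"node-b"}))            # True
print(readable_after_failure(placement, {"node-a", "node-c"}))  # True
```

With three replicas on distinct nodes, any two simultaneous node failures still leave at least one copy of every block.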
Eco-system
- Big data is also synonymous with the set of tools used to process huge amounts of data, known as the big data eco-system. Popular tools include HDFS, Spark, and MapReduce
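To make the MapReduce model concrete, here is a word count written in plain Python that mimics its three stages: a map phase emitting (key, value) pairs, a shuffle that groups values by key, and a reduce phase that combines each group. It is a single-process sketch of the programming model only; it does not use the Hadoop or Spark APIs, and the function names are hypothetical.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document: str):
    """Emit (word, 1) for every word in one input split."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group intermediate values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values collected for one key."""
    return key, sum(values)


def word_count(documents):
    """Run map over every document, shuffle, then reduce each key."""
    intermediate = chain.from_iterable(map_phase(d) for d in documents)
    return dict(reduce_phase(k, v) for k, v in shuffle(intermediate).items())


print(word_count(["big data tools", "data tools like Spark"]))
# {'big': 1, 'data': 2, 'tools': 2, 'like': 1, 'spark': 1}
```

In a real cluster, each map call would run on the node holding that input split and each reduce key would be handled by some worker, which is what lets the same linear-time logic scale out.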