Big Data, Data Analysis and Data Mining

About ArtIStE and Big Data, Data Analysis and Data Mining

What is Big Data?

“Data whose scale, diversity and complexity require new architectures, techniques, algorithms and analytics to manage it and extract value and hidden knowledge from it”

Generation
1) Passive recording
▪Typically structured data
▪Bank trading transactions, shopping records, government sector archives
2) Active generation
▪Semistructured or unstructured data
▪User-generated content, e.g., social networks
3) Automatic production
▪Location-aware, context-dependent, highly mobile data
▪Sensor-based Internet-enabled devices

Acquisition
1) Collection
▪Pull-based, e.g., web crawler
▪Push-based, e.g., video surveillance, click stream
2) Transmission
▪Transfer to data center over high-capacity links
3) Preprocessing
▪Integration, cleaning, redundancy elimination
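
The three preprocessing steps named above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the record layout (`user`, `item`), the two sample sources and the `preprocess` function are hypothetical and not taken from any particular system.

```python
def preprocess(sources):
    """Integrate, clean, and deduplicate records from several sources."""
    # Integration: merge all sources into one list of records.
    merged = [record for source in sources for record in source]

    # Cleaning: drop records without a user and normalise string fields.
    cleaned = []
    for record in merged:
        if not record.get("user"):
            continue
        cleaned.append({
            "user": record["user"].strip().lower(),
            "item": (record.get("item") or "").strip().lower(),
        })

    # Redundancy elimination: keep the first occurrence of each (user, item) pair.
    seen = set()
    unique = []
    for record in cleaned:
        key = (record["user"], record["item"])
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


# Hypothetical sample data: two sources with overlapping and dirty records.
shop_log = [{"user": "Alice ", "item": "Book"}, {"user": "bob", "item": "pen"}]
web_log = [{"user": "alice", "item": "book"}, {"user": "", "item": "cup"}]
result = preprocess([shop_log, web_log])
print(result)
```

At real scale each step would typically run as a distributed job rather than over in-memory lists, but the order of the steps would stay the same.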

Storage
1) Storage infrastructure
▪Storage technology, e.g., HDD, SSD
▪Networking architecture, e.g., DAS, NAS, SAN
2) Data management
▪File systems (HDFS), key-value stores (Memcached), column-oriented databases (Cassandra), document databases (MongoDB)
3) Programming models
▪MapReduce, stream processing, graph processing
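
As an illustration of the MapReduce programming model, the classic word-count example can be simulated on a single machine. This is a minimal sketch, not the API of Hadoop or any other framework: the function names `map_phase`, `shuffle`, `reduce_phase` and `word_count` and the sample documents are assumptions made for this example.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values of one key into a single result."""
    return key, sum(values)


def word_count(documents):
    pairs = (pair for doc in documents for pair in map_phase(doc))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Hypothetical input documents.
counts = word_count([
    "big data needs new tools",
    "data mining finds hidden knowledge in data",
])
print(counts["data"])
```

In a real cluster the map and reduce calls would run in parallel on different nodes, and the framework would perform the shuffle over the network. The programmer supplies only the two functions.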

Analysis
1) Objectives
▪Descriptive analytics, predictive analytics, prescriptive analytics
2) Methods
▪Statistical analysis, data mining, text mining, network and graph data mining
▪Clustering, classification and regression, association analysis
3) Diverse domains call for customized techniques
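
To make one of the methods listed above concrete, clustering can be sketched with a small one-dimensional k-means. This is a simplified illustration under stated assumptions: the function `kmeans_1d`, the sample purchase amounts and the starting centroids are hypothetical, and a production analysis would normally rely on an established library.

```python
def kmeans_1d(values, centroids, iterations=10):
    """Cluster numbers around the given starting centroids (k-means, 1-D)."""
    centroids = list(centroids)
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: attach each value to its nearest centroid.
        clusters = [[] for _ in centroids]
        for value in values:
            nearest = min(
                range(len(centroids)),
                key=lambda i: abs(value - centroids[i]),
            )
            clusters[nearest].append(value)
        # Update step: move each centroid to the mean of its cluster;
        # an empty cluster keeps its previous centroid.
        centroids = [
            sum(cluster) / len(cluster) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters


# Hypothetical purchase amounts forming two obvious groups.
purchases = [5, 7, 6, 48, 52, 50]
centroids, clusters = kmeans_1d(purchases, centroids=[0, 100])
print(centroids, clusters)
```

The algorithm receives no labels and finds the two groups of purchases from the data alone, which is what distinguishes clustering from classification and regression.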

Big Data Challenges:

Technology and infrastructure
- New architectures, programming paradigms and techniques are needed


Data management and analysis
- New emphasis on “data”
- Data science