BIG DATA COMPUTING
Channel 1
DANIELE DE SENSI
Course program
Introduction
- The Big Data Phenomenon
- The Big Data Infrastructure
- Datacenters and their relevance to Big Data workloads
Datacenter Architecture: Compute
- Introduction to GPU, TPU, and other computer architectures
Datacenter Architecture: Network
- Limitations of TCP for high-performance and big data workloads
- RDMA
- Datacenter network topologies
- Congestion control and routing algorithms
- In-network compute: SmartNICs and programmable switches, and use cases involving big data processing
Datacenter Architecture: Storage
- Brief introduction
Big Data Frameworks
- Distributed File Systems (HDFS)
- MapReduce (Hadoop)
- Spark
- PySpark + Google Colaboratory
Unsupervised Learning: Clustering
- Similarity Measures
- Algorithms: K-means
- Example: Document Clustering
Dimensionality Reduction
- Feature Extraction
- Algorithms: Principal Component Analysis (PCA)
- Example: PCA + Handwritten Digit Recognition
Supervised Learning
- Basics of Machine Learning
- Regression/Classification
- Algorithms: Linear Regression/Logistic Regression/Random Forest
- Examples:
  - Linear Regression - House Price Prediction (i.e., predict the price at which a house will be sold)
  - Logistic Regression/Random Forest - Marketing Campaign Prediction (i.e., predict whether a customer will subscribe to a term deposit at a bank)
Recommender Systems
- Content-based vs. Collaborative filtering
- Algorithms: k-NN, Matrix Factorization (MF)
- Example: Movie Recommender System (MovieLens)
Graph Analysis
- Link Analysis
- Algorithms: PageRank
- Example: Ranking (a sample of) the Google Web Graph
Real-time Analytics
- Streaming Data Processing
- Example: Twitter Hate Speech Detector
Prerequisites
The course assumes that students are familiar with the basics of data analysis, machine learning, computer architecture, and computer networks. This background should rest on a solid knowledge of the foundations of calculus, linear algebra, and probability and statistics. In addition, students must have non-trivial programming skills (preferably in Python).
Books
- The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines
- RDMA Aware Networks Programming User Manual
- Mining of Massive Datasets [Leskovec, Rajaraman, Ullman] (available online)
- Big Data Analysis with Python [Marin, Shukla, VK]
- Large Scale Machine Learning with Python [Sjardin, Massaron, Boschetti]
- Spark: The Definitive Guide [Chambers, Zaharia]
- Learning Spark: Lightning-Fast Big Data Analysis [Karau, Konwinski, Wendell, Zaharia]
- Hadoop: The Definitive Guide [White]
- Python for Data Analysis [McKinney]
Frequency
Not mandatory.
Exam mode
The exam is an oral session that includes the presentation of a project and/or a scientific paper on topics covered in the course. The oral exam also includes questions on any subject presented during the lectures.
GABRIELE TOLOMEI
- Lesson code: 1041764
- Academic year: 2025/2026
- Course: Electrical Engineering
- Curriculum: Electrical Engineering for Digital Transition and Sustainable Power Systems
- Year: 1st year
- Semester: 1st semester
- SSD: INF/01
- CFU: 6