Course program
Distributed systems are a fundamental concept in modern computing, enabling the creation of large-scale, fault-tolerant systems that can provide robust, reliable services to users. This university-level course is designed to provide an in-depth understanding of distributed systems, starting with a brief history and overview of the challenges and motivations that led to their development.
Program:
The course is divided into seven modules, each of which covers a different aspect of distributed systems:
M1: Models, Abstractions, and Basic Concepts
In this module, students will learn about the core models and abstractions used in distributed systems, including processes, communication, traces and execution, and I/O Automata. Students will also study different types of failures, such as crash-stop and Byzantine failures, and learn about event-based programming and its abstraction, specification, and implementation. Finally, students will explore different types of links, including fair-lossy, stubborn, and point-to-point links.
M2: Time in Distributed Systems
This module focuses on time in distributed systems, covering synchronous, asynchronous, and eventually synchronous systems. Students will learn about different clock synchronization techniques, including Christian, Berkeley, and NTP, and study tree-based versus fully distributed algorithms. The module also covers logical clocks and vector clocks and the happened-before relationship. Students will also learn about encapsulating time in failure detectors, leader election, eventual leader election, and distributed system models, such as fail-stop, fail-noisy, and fail-silent systems.
M3: Basic Broadcast Primitives
In this module, students will explore different types of broadcast primitives, including best-effort broadcast, reliable broadcast, uniform reliable broadcast, and causal broadcast.
M4: Shared Memories
This module focuses on shared memories and explores different types of consistency, including regular, sequential consistency, and linearizability. Students will also learn about different types of registers, such as (1, N) register and atomic (1, 1), (1, N), and (N, N) registers. The module also covers the non-composability of sequential consistency and the relationship between consistency properties.
M5: Consensus
This module covers consensus and its specification. Students will explore hierarchical consensus, including uniform and non-uniform consensus. The module also covers FLP, including the proof that is not required, only the statement, and Paxos.
M6: Total Ordering and Replicated State Machine
In this module, students will learn about total order broadcast and replicated state machines, including Raft. The module also covers active replication versus primary backup.
M7: Byzantine Fault Tolerance (BFT)
This module covers the basics of BFT, including Byzantine failures and authenticated channels and crypto assumptions. Students will learn about Byzantine broadcast, including its consistency and reliability. The module also covers synchronous systems, including an impossibility result and the necessity of 3f+1 and King Algorithm.
Tentative course calendar:
The course is designed to be taught over 12 weeks, with five hours of teaching each week. The tentative course calendar is as follows:
Week 1:
Introduction to Distributed Systems
-Brief history;
-Challenges;
-Motivations.
Models, Abstractions, and Basic Concepts (M1)
-Processes, communication, traces and execution, I/O Automata;
-Failures: crash-stop, and Byzantine;
-Event-based programming: abstraction, specification, and implementations;
-Links: fair-lossy, stubborn, point-to-point.
Week 2/3:
Time in Distributed Systems (M2)
-Synchronous, asynchronous, and eventually synchronous systems;
-Clock Synchronization: Christian, Berkley, NTP;
-Clock sync. Tree-based vs Fully-Distributed Algorithms, Local Skew vs Global Skew, The limits of ordering with timestamps
-Logical clocks and Vector Clock, Happened-before relationship.
Week 3/4:
Time in Distributed Systems (M2) continued
-Encapsulating time in failure detectors: P, $\Diamond$-P;
-Leader election and Eventual Leader Election
-Distributed system models: fail-stop, fail-noisy, fail-silent."
Week 3/4:
Basic Broadcast Primitives (M3)
-Best effort broadcast;
-Reliable broadcast;
-Uniform reliable broadcast;
-Causal broadcast.
Week 5:
Shared Memories (M4)
-Consistency: regular, sequential consistency, and linearizability;
-Regular (1,N) register: Message Passing and (1,1)-Regular implementation
-Atomic (1,1), (1,N), and (N,N) registers;
-Non-composability of sequential consistency;
-Relationship between consistency properties;"
Week 7:
Consensus (M5)
-Consensus specification;
-Hierarchical consensus: uniform and non-uniform;
-FLP: proof not required only the statement
-Paxos.
Week 8:
Total Ordering and Replicated State Machine (M6)
-Total Order broadcast
-Replicated State Machine and Raft.
-Active Replication vs Primary Backup
Week 9:
BFT (M7)
-Intro: Byzantine failures.
-Authenticated Channel and crypto assumptions.
-Byzantine Broadcast, consistent and reliable.
Week 10:
BFT (M7) continued
-Consensus: synchronous systems, an impossibility result: the necessity of 3f+1.
-Consensus: synchronous systems: King Algorithm
Week 11:
Recap and Exercises
M1: Models, Abstractions, and Basic Concepts
-Processes, communication, traces and execution, I/O Automata;
-Failures: crash-stop, and byzantine;
-Event based programming: abstraction, specification and implementations;
-Links: fair-lossy, stubborn, point-to-point.
References: See reading material on https://sites.google.com/view/distributed-systems-2020/lectures
M2: Time in Distributed Systems
-Synchronous, asynchronous, and eventually synchronous systems;
-Clock Synchronization: Christian, Berkley, NTP;
- Clock sync. Tree based vs Fully-Distributed Algorithms, Local Skew vs Global Skew, The limits of ordering with timestamps
-Logical clocks and Vector Clock, Happened-before relationship.
-Encapusaliting time in failure detectors: P, $\Diamond$-P;
-Leader election and Eventual Leader Election
-Distributed system models: fail-stop, fail-noisy, fail-silent."
References: See reading material on https://sites.google.com/view/distributed-systems-2020/lectures
M3: Basic Broadcast Primitives
- Best effort broadcast;
- Reliable broadcast;
- Uniform reliable broadcast;
- Causal broadcast.
References: See reading material on https://sites.google.com/view/distributed-systems-2020/lectures
M4: Shared Memories
- Consistency: regular, sequential consistency and linearizability;
- Regular (1,N) register: Message Passing and (1,1)-Regular implementation
- Atomic (1,1), (1,N) and (N,N) registers;
- Non composability of sequential consistency;
-Relationship between consistency properties;"
M5: Consensus
-Consensus specification;
-Hiearchical consensus: uniform and non-uniform;
-FLP: proof not required only the statement
-Paxos.
References: See reading material on https://sites.google.com/view/distributed-systems-2020/lectures
M6: Total Ordering and Replicated State Machine
-Total Order broadcast
-Replicated State Machine and Raft.
-Active Replication vs Primary Backup
M7: BFT
Intro:
- Byzantine failures.
- Authenticated Channel and crypto assumptions.
- Byzantine Broadcast, consistent and reliable.
Consensus:
-synchronous systems: an impossibility result: the necessity of 3f+1 (Not Formal see slides).
-synchronous systems: King Algorithm
-eventually-synchronous system: PBFT"
References: See reading material on https://sites.google.com/view/distributed-systems-2020/lectures
Blockchains:
Permissioned Blockchains: Definition, Hyperledger fabric architecture and example of a smart contract, other blockchains: Corda and Sawtooth
Permssionless Blockchains: Definition, A deep study of the bitcoin network, Pitfails and attacks against permissionless blockchain based on Proof of Work
References: See reading material on https://sites.google.com/view/distributed-systems-2020/lectures
Prerequisites
Basic knowledge of algorithms and complexity: concept of algorithm, complexity measures.
Basic knowledge of computer science and networks: communication protocols, client/server systems.
Basic knowledge of logic: induction, implications, proofs by contradiction.
Books
Introduction to Reliable and Secure Distributed Programming - C. Chacin, R. Guerraoui, and L. Rodrigues. 2011
Bibliografia di riferimento
Distributed Algorithms - N. Lynch. 1998. Distributed Computing - H. Attiya, J.L. Welch. 2004
Distributed Systems 3rd edition - M. Van Steen and A.S. Tanenbaum. The authors give away free copies: https://www.distributed-systems.net/index.php/books/distributed-systems-3rd-edition-2017/
Design and Analysis of Distributed Algorithms- N. Santoro, 2008.
Teaching mode
Mixed mode on zoom and physically
Frequency
Not Mandatory
Exam mode
The evaluation is a written exam that tests the student’s knowledge and understanding of the fundamental concepts and principles of distributed systems, as well as their ability to apply them to solve simple problems. The exam consists of multiple-choice questions, short-answer questions, and problem-solving exercises.
The sufficient mark is 18/30 the highest possible mark is 30/30, the laude is awarded in special cases (complete mastery of the discipline).
The marks (18-21) are given to students that show a sufficient knowledge and understanding of fundamentals concepts and principle, but that have difficulties to apply these concepts in a novel way in problem solving exercises.
The marks (22-24) are given to students that show a good knowledge and understanding of fundamentals concepts and principle, but that can sufficiently apply these concepts in a novel way in problem solving exercises.
The marks (25-27) are given to students that show a good or complete knowledge and understanding of fundamentals concepts and principle, but that can apply these concepts in a novel way in problem solving exercises with minimal mistakes.
The marks (27-29) are given to students that show a complete knowledge and understanding of fundamentals concepts and principle, but that can apply these concepts in a novel way in problem solving exercises without mistakes.
The marks (30-30 Laude) are given to students that show a complete knowledge and understanding of fundamentals concepts and principle, but that can apply these concepts in a novel way in problem solving exercises without mistakes and with an elegant and efficient methodology. The Laude is awarded only to students that show a complete and total mastery of the discipline during an additional oral examination reserved to students that reach the mark of 30 in the written exam.
Bibliography
Distributed Algorithms - N. Lynch. 1998.
Distributed Computing - H. Attiya, J.L. Welch. 2004
Distributed Systems 3rd edition - M. Van Steen and A.S. Tanenbaum. The authors give away free copies: https://www.distributed-systems.net/index.php/books/distributed-systems-3rd-edition-2017/
Design and Analysis of Distributed Algorithms- N. Santoro, 2008.
Lesson mode
The teaching is divided as divided: 70% of lectures explaining algorithms and concepts. The explanation of each algorithm is done with an interactive approach in which the algorithm is iteratively refined until the correct form using the inputs of the students. 30% of the teaching is devoted to problem solving and exercises.