Reinforcement Learning

Course objectives

General Objectives. The Reinforcement Learning (RL) course aims to introduce students to fundamental and advanced techniques of RL, a significant area within artificial intelligence and machine learning. Students will gain the skills to design and implement algorithms that enable systems to learn and improve autonomously through experience, optimizing their decisions in real time.

Specific Objectives. Students will explore key concepts of RL such as decision policies, Markov Decision Processes, Q-learning, and deep reinforcement learning. They will learn to: model complex problems using the RL approach; develop and implement algorithms like Q-learning and Deep Q-Networks (DQN); apply RL techniques in real-world scenarios such as robotics and gaming.

Knowledge and Understanding. In-depth knowledge of basic and advanced RL algorithms; understanding of reward-based learning models and their practical applications; ability to interpret the results of RL algorithms and evaluate their effectiveness in various contexts.

Applying Knowledge and Understanding. Use software frameworks like TensorFlow or PyTorch to implement and test RL algorithms; analyze current research case studies and projects to understand real-world RL applications; develop functional prototypes using RL to solve specific problems.

Autonomy of Judgment. Students will develop the ability to critically assess RL algorithms, considering their applicability, efficiency, and potential biases. They will also be able to select the most appropriate algorithm for a given problem.

Communication Skills. Students will learn to effectively communicate RL concepts, algorithm design decisions, and outcomes to both technical and non-technical audiences using a variety of communication media.

Further Study Abilities. This course will prepare students to pursue advanced studies and research in RL, providing the foundation needed to tackle open problems and innovate in the field. Students will be encouraged to contribute actively to the scientific community through publications, conferences, and collaborations.

Channel 1
Roberto Capobianco


Course program
Overview and course logistics. Exploration in bandits with regret analysis; contextual bandits. Markov Decision Processes. Dynamic programming methods: value iteration and policy iteration. Linear–Quadratic control and trajectory optimization: LQR, iLQR, and MPC. Approximate dynamic programming with approximate policy iteration. Model-free control: Q-learning and SARSA; n-step bootstrapping. Value-function approximation with linear methods. Off-policy learning and Deep Q-Networks (DQN). Model-based reinforcement learning. Policy search methods: REINFORCE, baselines, and variance reduction. KL-regularized and trust-region policy optimization (e.g., TRPO). Imitation learning and inverse reinforcement learning. Multi-agent reinforcement learning. Open-ended reinforcement learning and selected advanced topics.
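To give a concrete feel for the model-free control part of the program, here is a minimal sketch of tabular Q-learning with epsilon-greedy exploration. The toy 5-state chain MDP, its step function, and all hyperparameters are illustrative assumptions, not course material.

```python
# Minimal sketch of tabular Q-learning on a toy 5-state chain MDP (assumed for illustration).
import numpy as np

n_states, n_actions = 5, 2            # chain of 5 states; actions: 0 = left, 1 = right
gamma, alpha, epsilon = 0.95, 0.1, 0.1
rng = np.random.default_rng(0)

def step(s, a):
    """Toy dynamics: moving right from the last state gives reward 1 and resets to state 0."""
    if a == 1:                         # right
        if s == n_states - 1:
            return 0, 1.0
        return s + 1, 0.0
    return max(s - 1, 0), 0.0          # left

Q = np.zeros((n_states, n_actions))
s = 0
for t in range(20000):
    # epsilon-greedy action selection
    a = int(rng.integers(n_actions)) if rng.random() < epsilon else int(np.argmax(Q[s]))
    s_next, r = step(s, a)
    # Q-learning update: bootstrap with the greedy value of the next state
    Q[s, a] += alpha * (r + gamma * np.max(Q[s_next]) - Q[s, a])
    s = s_next

print(np.round(Q, 2))                  # the greedy policy should prefer action 1 (right) in every state
```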
Prerequisites
This is a math-heavy course, with a focus on algorithm design and analysis. For this reason, we require students to be comfortable with the basics of calculus, probability, and linear algebra. A previous Machine Learning background is strongly recommended but not required. Since practicals and assignments consist of programming problems, we expect ALL students to be able to implement algorithmic ideas in code and to be proficient in Python programming. There is a tutorial here (http://cs231n.github.io/python-numpy-tutorial/) for those who are not familiar with Python. If you have a lot of programming experience but in a different language (e.g., C/C++/Matlab/Javascript), you will probably be fine.
Books
Theory:
* Reinforcement Learning: An Introduction, Sutton and Barto. PDF: http://incompleteideas.net/book/RLbook2020.pdf; online resources: http://incompleteideas.net/book/the-book.html
* Reinforcement Learning: Theory and Algorithms, Alekh Agarwal, Nan Jiang, Sham Kakade, Wen Sun. PDF: https://rltheorybook.github.io/rltheorybook_AJKS.pdf
Practical:
* Basic Python: http://cs231n.github.io/python-numpy-tutorial/
* OpenAI Gym Documentation: https://www.gymlibrary.dev/
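The Gym documentation listed above defines the standard agent-environment loop used in the practicals. Below is a minimal sketch of that loop with a random policy; it assumes a recent Gym release (0.26+), where reset() returns (observation, info) and step() returns a five-element tuple, as documented at gymlibrary.dev.

```python
# Minimal sketch of the Gym interaction loop with a random policy (assumes Gym >= 0.26).
import gym

env = gym.make("CartPole-v1")
obs, info = env.reset(seed=0)
total_reward = 0.0
done = False
while not done:
    action = env.action_space.sample()                    # placeholder for a learned policy
    obs, reward, terminated, truncated, info = env.step(action)
    total_reward += reward
    done = terminated or truncated
env.close()
print(f"episode return: {total_reward}")
```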
Frequency
In class, and remote when needed
Exam mode
Assessment structure (95% total):
* Either 3 assignments (15% each, 45% total) or a written exam (45%).
* Final project: 50%.
* Maximum grade follows the Italian system (30 cum laude / 33).
Bonus (5%): awarded for class interaction/engagement or a valid contribution to the course repo (pre-approved by the TAs).
Validity: assignments/written exam remain valid through the last exam session of the academic year (September 2026).
Engagement tracking: in-class Q&A/participation and approved repo contributions are tracked and can earn the bonus.
Lesson mode
Schedule:
* Tuesday 08:00 - 10:00 (practical), Room B2, Via Ariosto
* Friday 08:00 - 11:00 (theory), Room 201, Regina Elena
Office Hours: Wednesday, 10:30am-12:30pm
  • Lesson code: 10606827
  • Academic year: 2025/2026
  • Course: Artificial Intelligence and Robotics
  • Curriculum: Single curriculum
  • Year: 2nd year
  • Semester: 1st semester
  • SSD: ING-INF/05
  • CFU: 6