CPS824/CP8319: Reinforcement Learning
Course Management Form
Instructor: | Mikhail Soutchanski |
Email: | mes (at) cs (dot) torontomu (dot) ca (write RL in Subject of your email) |
Web page: | www.cs.torontomu.ca/~mes/courses/cps824 |
Office: | The Centre for Computing and Engineering, ENG275 |
Office Hours: | Wednesday, 11am-noon (by appointment only); Tuesday, 14-14:30 (every 2nd week starting from Jan 14th) |
TA: | NAME TBA (email: (at) torontomu.ca) |
Lectures: |
Section | Status | Day | Start Time | End Time | Room |
All | Available | Tuesday | 15:10 | 17:00 | EPH-142 |
All | Available | Wednesday | 10:10 | 11:00 | VIC-203 |
Course Description
-
This course will provide a comprehensive introduction to reinforcement learning, a powerful approach to
learning from interaction to achieve goals in stochastic and deterministic environments. Reinforcement
learning has adapted key ideas from machine learning, operations research, control theory, psychology,
and neuroscience to produce some strikingly successful engineering applications. The focus
is on algorithms that learn what actions to take, and when to take them, so as to optimize
long-term performance. This may involve sacrificing immediate reward to obtain greater reward
in the long term or just to obtain more information about the environment. The course
will cover Markov decision processes, dynamic programming, temporal-difference learning,
Monte Carlo reinforcement learning methods, function approximation methods, and
the integration of learning and planning. The course covers some of the key approaches
underlying the success of the modern computer programs that can defeat human professional
players in the game of Go and other classic games. A number of applications of
reinforcement learning will be discussed as well. The focus is mostly on cases characterized by discrete
finite probability distributions and, for this reason, the course requires only minimal background
in probability theory, which is briefly reviewed at the beginning of the course.
-
Prerequisites:
The course requires the ability to write computer programs in a modern programming
language such as C/C++, Java or Python, the basics of data structures (CPS305 or equivalent),
and basic probability theory (MTH380 or equivalent).
Do not enroll in this course if you cannot write computer programs.
-
Compulsory Text Book:
Richard S. Sutton and Andrew G. Barto
Reinforcement Learning: An Introduction.
Cambridge, MA: MIT Press, 2nd edition, 2018.
The students are expected to read sections and chapters from this textbook each week.
You might wish to browse the older 1st edition (1998),
which is available from Professor Richard Sutton's personal Web page.
The Second Edition is published by the MIT Press, Nov 2018, ISBN 9780262039246.
-
Extra References (not required)
-
Hector Geffner and Blai Bonet
"A Concise Introduction to Models and Methods for Automated Planning",
Chapter 6. Morgan and Claypool Publishers, 2013.
Synthesis Lectures on Artificial Intelligence and Machine Learning, Vol. 7, No. 2,
June 2013. Available online from the TMU Library.
-
Dimitri Bertsekas
"Reinforcement Learning and Optimal Control",
Athena Scientific, 1st edition, 2019.
-
Dimitri Bertsekas
"Dynamic Programming and Optimal Control",
Athena Scientific, 4th edition, 2012, Volume 2, Chapter 6
"Approximate Dynamic Programming" (a draft from November 11, 2011).
(This book is more advanced than what is required in this course.
It is optional reading for graduate students.)
-
Evaluation:
4 assignments (10% each): worth a total of 40% of the final grade.
Midterm: 20%. Final exam: 40%.
Graduate students may be asked to do additional work on assignments and tests.
In particular, graduate students will be asked to complete a small project
as part of their 4th assignment.
Undergraduate students can earn bonus marks for doing extra work.
To complete the 4th assignment, students may be asked to prepare slides for
a 20-30 minute talk on a topic related to the course and to present the talk in class.
-
Brief Description
This course focuses on topics related to reinforcement learning.
The course will cover the n-armed bandit problem,
making multiple-stage decisions under uncertainty,
Markov decision processes,
dynamic programming,
Monte Carlo reinforcement learning methods,
temporal-difference learning including Q-learning (off-policy control) and SARSA (on-policy control),
eligibility traces,
function approximation methods,
and the integration of learning and planning
including the Dyna architecture, prioritized sweeping, real-time dynamic programming and heuristic search.
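As a rough illustration of the on-policy/off-policy distinction mentioned above, the following minimal sketch shows the standard tabular SARSA and Q-learning update rules side by side. The function names, the dictionary-based Q-table, and the default step size `alpha` and discount `gamma` are assumptions chosen for this example only; they are not course code, and terminal-state handling is omitted.

```python
from collections import defaultdict

def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.9):
    """On-policy: bootstrap from the action a_next actually chosen in s_next."""
    target = r + gamma * Q[(s_next, a_next)]
    Q[(s, a)] += alpha * (target - Q[(s, a)])

def q_learning_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    """Off-policy: bootstrap from the greedy (maximum-value) action in s_next."""
    target = r + gamma * max(Q[(s_next, b)] for b in actions)
    Q[(s, a)] += alpha * (target - Q[(s, a)])

# Hypothetical two-action example: the same transition, two different targets.
actions = ["left", "right"]
Q_sarsa = defaultdict(float, {(1, "left"): 1.0, (1, "right"): 2.0})
Q_qlearn = defaultdict(float, {(1, "left"): 1.0, (1, "right"): 2.0})

sarsa_update(Q_sarsa, s=0, a="right", r=0.0, s_next=1, a_next="left")
q_learning_update(Q_qlearn, s=0, a="right", r=0.0, s_next=1, actions=actions)
```

In this toy transition the behaviour policy picks "left" in the next state, so SARSA bootstraps from the value 1.0 while Q-learning bootstraps from the larger value 2.0, regardless of the action actually taken.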
Course Policies
-
To pass the course, both of the following are required:
- At least 50% on the theoretical component
(the weighted total of the midterm test and final exam marks)
- At least 50% on the remaining practical component
(the weighted total of the homework assignments and in-class presentation)
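The two passing conditions can be checked with a few lines of arithmetic. The sketch below is only one reading of the rule: it assumes the weights from the Evaluation section (midterm 20%, final exam 40%, four assignments at 10% each), assumes that the in-class presentation is counted inside the 4th assignment, and ignores bonus marks. The function name and input format are hypothetical.

```python
# Weights in percent of the final grade, taken from the Evaluation section.
MIDTERM_WEIGHT = 20.0
FINAL_WEIGHT = 40.0

def passes_course(assignment_pcts, midterm_pct, final_pct):
    """Return True if both components reach 50%.

    assignment_pcts: four assignment marks, each in percent (0-100),
                     assumed to be equally weighted at 10% each.
    midterm_pct, final_pct: exam marks in percent (0-100).
    """
    # Theoretical component: weighted total of the midterm and final exam,
    # rescaled to a percentage of the 60 points those two items are worth.
    theory_points = (midterm_pct * MIDTERM_WEIGHT + final_pct * FINAL_WEIGHT) / 100.0
    theory_pct = 100.0 * theory_points / (MIDTERM_WEIGHT + FINAL_WEIGHT)

    # Practical component: equal weights, so the weighted total is the mean.
    practical_pct = sum(assignment_pcts) / len(assignment_pcts)

    return theory_pct >= 50.0 and practical_pct >= 50.0

# Hypothetical marks: midterm 40%, final 60% gives 8 + 24 = 32 of 60 points,
# which is about 53% on the theoretical component.
example = passes_course([80, 80, 80, 80], midterm_pct=40, final_pct=60)
```

Under this reading, a strong result on one component cannot compensate for a result below 50% on the other.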
-
The students are strongly encouraged to take notes in class, and study their notes
after class. Learning can be a gradual process that requires time and effort.
The students benefit from attending lectures since some important details
will be discussed only there. For this reason, attending lectures is mandatory.
Some of the announcements and clarifications mentioned in class will not be
communicated by any other means. If you miss a class, it is your responsibility
to find out what was announced.
-
All course materials posted on D2L and presented in class are copyrighted
and protected by law. You cannot share them with anyone.
You cannot repost them anywhere on the Web.
Please review the
parts of this policy online related to "Academic misconduct".
-
Policy for in-person content delivery: students are expected to pay attention
to the lecture and to volunteer to answer the instructor's questions during class time.
The students might be asked to participate in unannounced polls or quizzes.
Turn off your mobile phones and all other electronic devices in class.
You can keep your laptop or tablet open only if you use it to take notes in class.
-
Examinations:
The midterm test and the
final exam may include short essay and yes/no questions, as well
as problem solving (but not programming questions).
The durations of these examinations will be 1 hour 30 minutes
and 2 hours 30 minutes, respectively.
There will be no supplemental examinations.
The final exam will be cumulative and will include all the material covered throughout
the term. Grades are earned for the demonstration of knowledge.
-
If you miss a midterm test or a final exam for medical reasons, you have to read Policy 167
(Academic Consideration) and submit a copy of a completed official
Health Certificate to the designated contact person within 3 working days.
Once the submitted student’s health documentation has been verified,
the instructor will be notified of the verification.
Similarly, all documentation related to special accommodation or
academic consideration
should be submitted online to the designated contact person within the specified time limits.
-
Assignments should be submitted on or before the deadline
specified in the assignment
(you are encouraged to submit assignments earlier).
Your assignment is considered late if any part of the assignment is late
(even if it is just 1 minute late). The penalty for a late assignment is 10% off.
No assignments will be accepted if more than 24 hours late.
Start solving your assignment on the day it is posted. Do not procrastinate.
There are no make-up assignments. Late assignments: if a printout is required, hand in
your printout in person to a secretary at the CS reception desk and ask them
to stamp your assignment to confirm that you handed it in in time.
Then send an email to the TA who is responsible for marking this assignment
to inform them that a hard copy of your assignment is available from the front desk.
-
From time to time, I will hand out exercises.
The students are expected to solve the exercises, but
they will not be graded. However, working on exercises
will improve your understanding of this course
(and will help you to get better marks on tests).
-
Up to 4% extra credit may be assigned for active class participation
throughout the term, e.g., when a student attends classes, takes notes of the lectures,
participates actively by asking and answering questions, and solves exercises in class.
Class participation marks are earned for active course participation and
given at the discretion of the course instructor; they cannot be requested by the students.
Unexplained lack of attendance can negatively affect one's grade.
-
Handouts and assignments will be made available on the Web only.
You are responsible for visiting
the course Web pages regularly and reading the assignment- and test-related information
that is provided or linked from these Web pages. In particular, Frequently Asked
Questions (FAQs) related to homework can be linked from there.
These FAQs are considered to be an integral part of the assignment.
Before sending your questions by e-mail to the instructor, check these Web pages
to see whether similar questions have already been answered.
- Email communication: you can send email from TMU
email addresses only: use either your departmental account
(preferred) or your university account. Email sent from Google, Bell, Rogers
and any other external email domains can be filtered out as spam and might not
reach the instructor. Email messages will normally be answered within 24 hours.
However, messages sent on the weekend (starting from Friday afternoon) will
usually be answered on Monday.
-
Grades for assignments and tests will normally be
posted on the D2L Web site
no later than two weeks after the due date (exam date).
Marking guides, the assignments and
some other course-related documents will be posted on
D2L only.
Feedback will usually be provided to students within two weeks.
Students who have questions about marking can contact the TA who was responsible
for the marking, or attend the office hour.
Policy on collaboration in homework assignments
Collaboration in discussing general approaches to problems
is allowed only with students in your team. No collaboration is allowed
between teams. You may discuss assignments only with other people
currently taking the course.
However, you should never put your name on anything you do not understand.
If challenged,
you must be able to reproduce and explain all solutions by yourself,
or solve similar exercises. If you cannot explain a solution that
you handed in, or if you cannot solve an exercise similar to questions
in your homework or in your quiz, this will negatively affect your grade. In
particular, you might be asked to solve extra exercises during office
hours, during one of the labs, or in class (as a quiz). These unscheduled
tests or evaluations can be given at any time without prior notice.
Remember that if you work with partners,
you are still expected to know the solutions to all exercises from the
homework. Grades are earned for the demonstration of knowledge.
In cases when a student fails to demonstrate knowledge about a
homework assignment, the grade for that homework can be decreased to 0.
The first page of your homework should include the names of all
students with whom you discussed any homework problems (even briefly).
Otherwise, it is assumed that you did not discuss the homework with anyone except the
instructor. Copied work (both the original and the copies) will be graded as 0.
Involvement in plagiarism will be penalized in accordance with Academic
Policy 60. An additional penalty for copied work may be assigned as a deterrent
against plagiarism. More specifically, the additional penalty for a copied
assignment (in part or in whole) can be a deduction of up to 5% of the final course grade.
Contract Cheating Statement
In regard to any and all assessments in this course, the use of Chegg,
or any other similar help site/service/tool will be pursued as "contract cheating".
The use of ChatGPT, CoPilot, Gemini and similar generative Large Language Models (LLMs) for the purpose
of solving homework problems will be pursued as "a breach of Policy 60: Academic Integrity"
if the student accessed them before submitting course work and the assessment
is presented as if it is one’s own original work without appropriate referencing.
Generative LLM tools may only be used for comparison with your own course work that you have already
submitted, but not for the creation of submitted work.
In regard to any and all assessments in this course, the use of any third party
(e.g., family member, freelancer, room-mate, friend, tutor) to complete work
on your behalf will be pursued as "contract cheating"
under Policy 60 "Academic Integrity".
Policy 60 Penalty Guidelines for contract cheating (e.g., viewing a solution
on Chegg or Discord) that only impacts you: F in course.
Policy 60 Penalty Guidelines for contract cheating that facilitates cheating
for others (e.g., posting a question to Chegg): Disciplinary Suspension.
ACADEMIC MISCONDUCT
Committing academic misconduct, such as plagiarism and cheating,
will trigger academic penalties including failing grades,
suspension and possibly expulsion from the University.
As a TMU student, you are responsible for familiarizing yourself
with the
Student Code of Academic Conduct.
ACADEMIC CONDUCT
Students are expected to pay attention to the lecture and to volunteer to answer
the instructor's questions during class time. In the case of in-person classes,
in order to create an environment conducive to learning and respectful of
others' rights, phones and pagers must be silenced during lectures and evaluations.
Students should refrain from disrupting the lectures
by arriving late and/or leaving before the lecture is finished.
Policy on Non-Academic Conduct
No disruption of instructional activities is allowed. Among many other infractions,
the Code specifically refers to the following as a violation:
"Disruption of Learning and Teaching - Students shall not behave in disruptive ways
that obstruct the learning and teaching environment." In particular, students can
use laptops (and similar electronic devices) in class only for taking notes.
In difficult cases, penalties can be imposed by the Student Conduct Officer.
You can read the TMU
Senate Policy 61 for details.
Remarking Policy
- Grades are earned for the demonstration of knowledge.
-
Read carefully the marking guide for the assignment or test you'd like to be remarked.
Your grade may go up, down, or remain the same.
-
Fill in this
remarking form (available online).
-
Email the form and your assignment/test to the TA who marked your homework.
-
If you are not satisfied with the TA's remarking, you can appeal
to the instructor.
-
You may not submit a remarking request later than ONE WEEK from the
date on which the assignments/tests were returned in class.
It's your responsibility to pick up your work ASAP.
-
Your mark can decrease if the TA finds something that was incorrectly
awarded too high a mark.
Tentative Course Calendar
(all changes of dates will be announced)
Course Work | Due Date | Grade Value (%) |
Assignment 1 | January 29 | 10 |
Assignment 2 | February 20 | 10 |
Midterm | Tuesday, February 25, in-class | 20 |
Assignment 3 | March 19 | 10 |
Assignment 4 | March 25 | 10 |
Final Exam | TBA | 40 |
Total | | 100 |