CPS824/CP8319: Reinforcement Learning
Course Management Form
Instructor: | Mikhail Soutchanski
Email: | mes (at) cs (dot) torontomu (dot) ca (write RL in Subject of your email)
Web page: | www.cs.torontomu.ca/~mes/courses/cps824
Office: | The Centre for Computing and Engineering, ENG275
Office Hours: | Monday, 3-4pm (by appointment only); Tuesday, 11-11:30am (every 2nd week starting from Jan 24th)
TA: | Raiyan Rahman (email: raiyan.rahman (at) torontomu.ca)
Lectures:
Section | Status | Day | Start Time | End Time | Room
All | Available | Tuesday | 15:10 | 17:00 | EPH-142
All | Available | Wednesday | 10:10 | 11:00 | VIC-203
Course Description
-
This course will provide a comprehensive introduction to reinforcement learning, a powerful approach to
learning from interaction to achieve goals in stochastic and deterministic environments. Reinforcement
learning has adapted key ideas from machine learning, operations research, control theory, psychology,
and neuroscience to produce some strikingly successful engineering applications. The focus
is on algorithms that learn what actions to take, and when to take them, so as to optimize
long-term performance. This may involve sacrificing immediate reward to obtain greater reward
in the long term or just to obtain more information about the environment. The course
will cover Markov decision processes, dynamic programming, temporal-difference learning,
Monte Carlo reinforcement learning methods, function approximation methods, and
the integration of learning and planning. The course covers some of the key approaches
underlying the success of the modern computer programs that can defeat human professional
players in the game of Go and other classic games. A number of applications of
reinforcement learning will be discussed as well. The focus is mostly on cases characterized by discrete
finite probability distributions; for this reason, the course requires only minimal background
in probability theory, which is briefly reviewed at the beginning of the course.
-
Prerequisites:
The course requires the ability to write computer programs in one of the modern programming
languages such as C/C++, Java or Python, the basics of data structures (CPS305 or equivalent),
as well as basic probability theory (MTH380 or equivalent).
Do not enroll in this course if you cannot write computer programs.
-
Compulsory Text Book:
Richard S. Sutton and Andrew G. Barto
Reinforcement Learning: An Introduction.
Cambridge, MA: MIT Press, 2nd edition, 2018.
The students are expected to read sections and chapters from this textbook each week.
You might wish to browse the older 1st edition (1998).
(Clicking on the link will take you to Professor Richard Sutton's personal Web page.)
The Second Edition is also published by the MIT Press, Nov 2018, ISBN 9780262039246.
-
Extra References (not required)
-
Hector Geffner, Blai Bonet
"A Concise Introduction to Models and Methods for Automated Planning",
Chapter 6. Morgan and Claypool Publishers, 2013.
Synthesis Lectures on Artificial Intelligence and Machine Learning, Vol. 7, No. 2.
June 2013. Available online from the TMU Library.
-
Dimitri Bertsekas
"Reinforcement Learning and Optimal Control",
Athena Scientific, 1st edition, 2019.
-
Dimitri Bertsekas
"Dynamic Programming and Optimal Control",
Athena Scientific, 4th edition, 2012, Volume 2, Chapter 6,
"Approximate Dynamic Programming", a draft from November 11, 2011.
(This book is more advanced than what is required in this course.
It is optional reading for graduate students).
-
Evaluation:
4 assignments (10% each): worth a total of 40% of the final grade.
Midterm: 20%. Final exam: 40%.
Graduate students may be asked to do additional work on assignments and tests.
In particular, graduate students will be asked to complete a small project
as part of their 4th assignment.
Undergraduate students can earn bonus marks for doing extra work.
To complete the 4th assignment, the students may be asked to prepare slides for
a 20-30 minute talk on a topic related to the course and to present their talk in class.
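The weighting above can be checked with a short calculation. The following is only an illustrative sketch: the weights come from this section, while the function name final_grade and the sample scores are hypothetical, and bonus marks, participation credit, and late penalties are deliberately left out.

```python
# Weights (in percent of the final grade) as stated in the Evaluation section.
ASSIGNMENT_WEIGHT = 10   # each of the 4 assignments
MIDTERM_WEIGHT = 20
FINAL_EXAM_WEIGHT = 40


def final_grade(assignments, midterm, final_exam):
    """Return the weighted course grade in percent.

    All scores are given in percent (0-100). This hypothetical helper
    ignores bonus marks, participation credit, and late penalties.
    """
    if len(assignments) != 4:
        raise ValueError("expected exactly 4 assignment scores")
    weighted = ASSIGNMENT_WEIGHT * sum(assignments)
    weighted += MIDTERM_WEIGHT * midterm
    weighted += FINAL_EXAM_WEIGHT * final_exam
    return weighted / 100


# Made-up scores: assignments 80, 90, 70, 100; midterm 75; final exam 85.
print(final_grade([80, 90, 70, 100], 75, 85))  # 83.0
```

With these made-up scores, the assignments contribute 34 points, the midterm 15, and the final exam 34.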
-
Brief Description
This course focuses on topics related to reinforcement learning.
The course will cover the n-armed bandit problem,
making multiple-stage decisions under uncertainty,
Markov decision processes,
dynamic programming,
Monte Carlo reinforcement learning methods,
temporal-difference learning including Q-learning (off-policy control) and SARSA (on-policy control),
eligibility traces,
function approximation methods,
and the integration of learning and planning
including the Dyna architecture, prioritized sweeping, real-time dynamic programming, and heuristic search.
Course Policies
-
The students are strongly encouraged to take notes in class, and study their notes
after class. Learning can be a gradual process that requires time and effort.
The students benefit from attending lectures since some important details
will be discussed only there. For this reason, attending lectures is mandatory.
Some of the announcements and clarifications mentioned in class will not be
communicated by any other means. If you miss a class, it is your responsibility
to find out what was announced in it.
-
Electronic devices: turn off your mobile phones and all other electronic devices in class.
You can keep your laptop or tablet open only if you use it to take notes in class.
-
Examinations:
The midterm test and the final exam may include short essay and yes/no questions, as well
as problem solving (but not programming questions).
The duration of these examinations will be 1 hour 30 minutes
and 2 hours 30 minutes, respectively.
There will be no supplemental examinations.
The final exam will be cumulative and will include all the material covered throughout
the term. Grades are earned for the demonstration of knowledge.
If you miss the midterm test or the final exam for medical reasons, you have to submit an
academic consideration request and hand in a hard copy of a completed official
Health Certificate to the Department of Computer Science within 3 working days.
You have to bring your documents yourself to the CS front reception desk.
Once the Program Department has verified the student’s health documentation,
the instructor will be notified of the verification. Similarly,
all documentation related to special accommodation or academic consideration
should be submitted to the CS program office within the specified time limits.
-
Dates are subject to change; all changes will be announced in class and
on the Web course shell.
-
Assignments should be submitted on or before the deadline
specified in the assignment
(you are encouraged to submit assignments earlier).
Your assignment is considered late if any part of the assignment is late
(even if it is just 1 minute late). The penalty for a late assignment is 10% off.
No assignments will be accepted if more than 24 hours late.
Start solving your assignment on the day it is posted. Do not procrastinate.
There are no make-up assignments. Late assignments: if a printout is required, hand in
your printout in person to a secretary at the CS reception desk and ask them
to stamp your assignment to confirm that you handed it in in time.
Then send an email to the TA who is responsible for marking the assignment
to inform them that a hard copy of your assignment is available from the front desk.
-
From time to time, I will hand out exercises.
The students are expected to solve the exercises, but
they will not be graded. However, working on exercises
will improve your understanding of this course
(and will help you to get better marks on tests).
-
Up to 5% extra credit may be assigned for active class participation
throughout the term, e.g., when a student attends classes, takes notes of the lectures,
participates actively by asking and answering questions, and solves exercises in class.
Class participation marks are earned for active course participation and
given at the discretion of the course instructor; they cannot be requested by the students.
Unexplained lack of attendance can negatively affect one's grade.
-
Handouts and assignments will be made available on the Web only.
You are responsible for visiting
the course Web pages regularly and reading the assignment- and test-related information
that is provided on or linked from these Web pages. In particular, Frequently Asked
Questions (FAQs) related to homework can be linked from there.
These FAQs are considered to be an integral part of the assignment.
Before sending your questions by e-mail to the instructor, check these Web pages
to see whether similar questions have already been answered.
-
Grades for tests and assignments will be posted on the my.torontomu.ca Web site
no later than two weeks after the due date (test date).
Marking guides, the assignments, and some other course-related documents will also be posted on
my.torontomu.ca only. Graded work will usually be returned to students within two weeks.
If an electronic copy of the assignment was marked by a TA using a script, hard copies
will not normally be returned. The lead partner who submitted an assignment on behalf of a team will receive
an email message from the TA who marked the assignment. This email message will include the mark
for the assignment and brief explanations of when and why penalties for errors were applied.
All other team members have to contact their team leader to get feedback about their assignment.
Policy on collaboration in homework assignments
Limited collaboration in discussing general approaches to problems
is allowed (only with one other student); no collaboration is allowed
between teams. You may discuss assignments with only one other student
currently taking the course.
However, you should never put your name on anything you do not understand.
If challenged, you must be able to reproduce and explain all solutions by yourself.
If you cannot explain a solution that you handed in, or if you cannot solve
an exercise similar to questions in your homework, this will negatively affect
your grade. In particular, you might be asked to solve exercises during the office hours,
or in class (as a quiz). These unscheduled tests or evaluations
can be given at any time without prior notice. Remember that if you work with partners,
you are still expected to know the solutions of all exercises from the homework. Grades are
earned for the demonstration of knowledge. If a student fails to demonstrate
knowledge about a homework, the grade for that homework can be decreased to 0.
The first page of your homework should include the names of all
students with whom you discussed any homework problems (even briefly).
Otherwise, it is assumed that you did not discuss the homework with anyone except the instructor.
Copied work (both original and copies) will be graded as 0.
Involvement with plagiarism will be penalized in accordance with Academic Policy 60.
An additional penalty for copied work may be assigned as a deterrent against plagiarism.
More specifically, the additional penalty for a copied assignment (in part or in whole)
can be up to 10% of the final course grade.
Contract Cheating Statement
In regard to any and all assessments in this course, the use of Chegg or any other
similar help site/service will be pursued as "contract cheating".
In regard to any and all assessments in this course, the use of any third party
(e.g., family member, freelancer, roommate, friend, tutor) to complete work
on your behalf will be pursued as "contract cheating"
under Policy 60 "Academic Integrity".
Policy 60 Penalty Guidelines for contract cheating (e.g., viewing a solution
on Chegg or Discord) that only impacts you: F in course.
Policy 60 Penalty Guidelines for contract cheating that facilitates cheating
for others (e.g., posting a question to Chegg): Disciplinary Suspension.
ACADEMIC MISCONDUCT
Committing academic misconduct, such as plagiarism and cheating,
will trigger academic penalties including failing grades,
suspension and possibly expulsion from the University.
As a TMU student, you are responsible for familiarizing yourself
with the Student Code of Academic Conduct.
ACADEMIC CONDUCT
The students are expected to pay attention during lectures and to volunteer answers to
the instructor's questions during class time. In the case of in-person classes,
in order to create an environment conducive to learning and respectful of
others' rights, phones and pagers must be silenced during lectures and evaluations.
Students should refrain from disrupting the lectures
by arriving late and/or leaving before the lecture is finished.
Policy on Non-Academic Conduct
No disruption of instructional activities is allowed. Among many other infractions,
the Code specifically refers to the following as a violation:
"Disruption of Learning and Teaching - Students shall not behave in disruptive ways
that obstruct the learning and teaching environment." In particular, the students can
use laptops (and similar electronic devices) in class only for taking notes.
In difficult cases, penalties can be imposed by the Student Conduct Officer.
You can read the TMU Senate Policy 61 for details.
Remarking Policy
- Grades are earned for the demonstration of knowledge.
-
Carefully read the marking guide for the assignment or test you would like to have remarked.
Your grade may go up, down, or remain the same.
-
Fill in this remarking form (available online).
-
Email the form and your assignment/test to the TA who marked your homework.
-
If you are not satisfied with the TA's remarking, you can appeal
to the instructor.
-
You may not submit a remarking request later than ONE WEEK from the
date on which the assignments/tests were returned in class.
It's your responsibility to pick up your work ASAP.
-
Your mark can decrease if the TA sees something that was incorrectly
awarded too high a mark.
Tentative Course Calendar
(all changes of dates will be announced)
Course Work | Due Date | Grade Value (%)
Assignment 1 | January 30 | 10
Assignment 2 | February 21 | 10
Midterm | Monday, February 27, in-class | 20
Assignment 3 | March 20 | 10
Assignment 4 | March 27, 10am | 10
Final Exam | Thursday, April 20, 2023, 15:00 EDT, in ENG206 | 40
Total | | 100