2026/27 Taught Postgraduate Module Catalogue

OCOM5252M Reinforcement Learning and Modern Learning Paradigms

15 Credits
Class Size: 100

Module manager: Dr Abdulrahman Altahhan
Email: A.Altahhan@leeds.ac.uk

Taught: 1 Jan to 28 Feb, 1 Jan to 28 Feb (adv year), 1 Jul to 31 Aug

Year running 2026/27

Pre-requisite qualifications

None

Module replaces

N/A

This module is not approved as an Elective

Module summary

This module introduces the principles and methods of reinforcement learning, focusing on how intelligent agents learn to make sequences of decisions through interaction with their environment. It examines how experience, feedback, and exploration guide learning and adaptation over time. Students develop an understanding of how reinforcement learning supports autonomous behaviour and gain practical experience in designing and training agents capable of acting, improving, and generalising across a range of dynamic tasks.

Objectives

This module aims to develop both conceptual understanding and practical skills in reinforcement learning as a framework for sequential decision making. It explores how agents learn from experience, balance exploration and exploitation, and adapt their behaviour based on feedback from the environment. Students examine how reinforcement learning drives advances in areas such as game-playing artificial intelligence, robotics, and the alignment of large language models, where feedback-based learning shapes intelligent behaviour. The module equips students with the understanding and practical ability to design, train, and analyse adaptive systems that learn from interaction, while offering a broader perspective on how reinforcement learning provides insight into the mechanisms underlying intelligent behaviour. Learning activities combine explanatory material, visual demonstrations, guided exercises, and hands-on experimentation using reinforcement learning frameworks to build intuition and technical fluency.

Learning outcomes

On successful completion of the module students will have demonstrated the following learning outcomes relevant to the subject:

1. Apply the principles of reinforcement learning and explain how agents learn through interaction and feedback.
2. Apply reinforcement learning algorithms to sequential decision-making problems across simulated or real-world environments.
3. Design and implement learning agents that balance exploration and exploitation to improve performance over time.
4. Assess how reward structures, feedback signals, and environmental dynamics influence learning outcomes and agent behaviour.
5. Discuss how reinforcement learning contributes to a broader understanding of adaptive and intelligent systems in both artificial and natural contexts.

Skills outcomes

On successful completion of the module students will have demonstrated the following skills learning outcomes:

1. Apply critical thinking and structured problem-solving to design, implement, and evaluate adaptive learning algorithms in dynamic environments.
2. Demonstrate adaptability and self-directed learning by exploring and integrating new methodologies, tools, or paradigms independently.
3. Communicate complex technical concepts and experimental results effectively to both technical and non-technical audiences using clear documentation and visualisation.
4. Apply integrated problem-solving and systems thinking to design and evaluate adaptive learning systems.
5. Exercise reflective practice and iterative improvement, evaluating approaches, interpreting outcomes, and refining strategies over time.

Syllabus

Indicative content for this module includes:

- Foundations of reinforcement learning and sequential decision-making
- Core theoretical constructs: agents, environments, states, actions, rewards, and returns
- Markov decision processes, value functions, and the Bellman equation
- Model-free prediction and control through Monte Carlo and temporal-difference methods
- Multi-step learning and eligibility traces for improving sample efficiency
- Function approximation and generalisation using parametric and neural models
- Deep reinforcement learning methods for continuous and high-dimensional tasks
- Exploration-exploitation trade-offs, stability, and performance evaluation in learning agents
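
As an illustration of two of the topics listed above (temporal-difference control and the exploration-exploitation trade-off), the following is a minimal sketch of tabular Q-learning with an epsilon-greedy policy. It is not taken from the module materials: the corridor environment, the function names `q_learning` and `greedy_policy`, and all parameter values are assumptions chosen only to keep the example small.

```python
import random


def q_learning(n_states=5, episodes=500, alpha=0.1, gamma=0.9,
               epsilon=0.1, seed=0):
    """Tabular Q-learning on a hypothetical 1-D corridor.

    States are 0..n_states-1; actions are 0 (left) and 1 (right).
    Entering the last state gives reward 1 and ends the episode.
    """
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_states)]
    goal = n_states - 1

    for _ in range(episodes):
        state = 0
        while state != goal:
            # Epsilon-greedy: explore with probability epsilon,
            # otherwise exploit (ties are broken at random).
            if rng.random() < epsilon:
                action = rng.randrange(2)
            else:
                best = max(q[state])
                action = rng.choice(
                    [a for a in (0, 1) if q[state][a] == best])

            # Assumed dynamics: moving left from state 0 stays in state 0.
            next_state = max(0, state - 1) if action == 0 else state + 1
            reward = 1.0 if next_state == goal else 0.0

            # TD target; no bootstrapping from the terminal state.
            bootstrap = 0.0 if next_state == goal else gamma * max(q[next_state])
            q[state][action] += alpha * (reward + bootstrap - q[state][action])
            state = next_state
    return q


def greedy_policy(q):
    """Greedy action for every non-terminal state."""
    return [max((0, 1), key=lambda a: row[a]) for row in q[:-1]]


if __name__ == "__main__":
    print(greedy_policy(q_learning()))
```

Under these assumptions the greedy policy should settle on moving right in every non-terminal state. Raising `epsilon` trades slower exploitation for more exploration of the left action.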

Teaching Methods

Delivery type	Number	Length (hours)	Student hours
Discussion forum	6	1	6
Webinar	6	1	6
Independent online learning hours			42
Private study hours			96
Total Contact hours			12
Total hours (100hr per 10 credits)			150

Opportunities for Formative Feedback

1. Webinar-Based Discussion and Q&A
2. Weekly Practical Exercises

Methods of Assessment

Coursework
Assessment type	Notes	% of formal assessment
Online Assessment	Approximately 20 questions about different scenarios	20
Coursework	Coursework Project - Technical Report	80
Total percentage (Assessment Coursework)		100

This module will be reassessed through a 100% individual assessment in the same format as Assessment 2 (coursework project). The reassessment will involve a practical project that requires students to apply and integrate the knowledge and skills developed across all learning outcomes.

Reading List

Check the module area in Minerva for your reading list

Last updated: 30/04/2026
