2026/27 Taught Postgraduate Module Catalogue

OCOM5250M Deep Learning for Computer Vision

15 Credits
Class Size: 50

Module manager: Dr Arash Rabbani
Email: A.Rabbani@leeds.ac.uk

Taught: 1 Mar to 30 Apr, 1 Mar to 30 Apr (2mth) (adv yr), 1 Sep to 31 Oct, 1 Sep to 31 Oct (adv yr)

Year running 2026/27

Pre-requisite qualifications

None

Module replaces

N/A

This module is not approved as an Elective

Module summary

This module explores deep learning techniques for computer vision, focusing on how neural networks interpret, represent, and generate visual information. It examines architectures and algorithms that enable tasks such as image classification, object detection, segmentation, and visual synthesis. Students learn how advances in convolutional and transformer-based models underpin modern vision systems and gain practical experience in developing, training, and evaluating models that extract meaningful structure and semantics from visual data.

Objectives

This module aims to develop a deep understanding of how modern deep learning methods enable machines to perceive and reason about visual information. It builds on core principles of representation learning to explore architectures and techniques that power computer vision systems, including convolutional, transformer-based, generative, and multimodal models that integrate visual data with other information sources. Learning activities combine conceptual explanation, visual demonstrations, guided experiments, and practical implementation exercises to bridge theory and practice. Through progressive exploration of image classification, detection, segmentation, and synthesis, students gain the knowledge and skills to design, train, and critically evaluate vision models for diverse applications, while developing an appreciation of how architectural choices influence visual representation and performance.

Learning outcomes

On successful completion of the module students will have demonstrated the following learning outcomes relevant to the subject:

1. Explain the principles of deep learning architectures used in computer vision, including convolutional, transformer-based, and generative models.
2. Apply deep learning methods to core computer vision tasks such as image classification, object detection, segmentation, and visual synthesis.
3. Design and implement vision models, selecting suitable architectures, training strategies, and evaluation metrics for different visual applications.
4. Assess how data characteristics, inductive biases, and architectural choices influence the performance and generalisation of vision models.
5. Experiment with multimodal approaches that combine visual data with other modalities to enhance representation and understanding in complex tasks.

Skills outcomes

On successful completion of the module students will have demonstrated the following skills learning outcomes:

1. Apply analytical and structured problem-solving skills to design, implement, and evaluate computer vision solutions for complex datasets.
2. Demonstrate adaptability and self-directed learning by integrating new tools, techniques, or frameworks to address evolving challenges in computer vision.
3. Communicate technical concepts, workflows, and results effectively to both technical and non-technical audiences using clear documentation and visualisation.
4. Apply integrated problem-solving and systems thinking to design and optimise computer vision solutions.
5. Exercise reflective practice and critical evaluation to assess methods, optimise processes, and continuously improve project outcomes.

Syllabus

Indicative content for this module includes:

- Fundamentals of deep learning for computer vision and visual representation learning
- Convolutional neural networks and architectural variants for image classification and feature extraction
- Object detection, localisation, and semantic/instance segmentation methods
- Transformer-based models and attention mechanisms for visual understanding
- Generative and reconstruction-based approaches, including autoencoders, GANs, and diffusion models for image synthesis
- Multimodal vision-language models such as CLIP and related architectures for cross-modal representation learning

Teaching Methods

Delivery type	Number	Length hours	Student hours
Discussion forum	6	1	6
Webinar	6	1	6
Independent online learning hours			42
Private study hours			96
Total contact hours			12
Total hours (100 hours per 10 credits)			150

Opportunities for Formative Feedback

1. Webinar-Based Discussion and Q&A
2. Weekly Practical Exercises

Methods of Assessment

Coursework
Assessment type	Notes	% of formal assessment
Online Assessment	Approximately 20 scenario-based questions	20
Coursework	Coursework Project - Technical Report	80
Total percentage (Assessment Coursework)		100

This module will be reassessed through a single individual assessment worth 100%, in the same format as the second assessment listed above (the coursework project technical report). The reassessment is a practical project that requires students to apply and integrate the knowledge and skills developed across all the learning outcomes.

Reading List

Check the module area in Minerva for your reading list

Last updated: 30/04/2026

Errors, omissions, failed links, etc. should be notified to the Catalogue Team.