2024/25 Undergraduate Module Catalogue

NATS3200 Machine Learning Approaches to Scientific Data Analysis

Credits: 10    Class size: 65

Module manager: Dr Stefan Auer
Email: s.auer@leeds.ac.uk

Taught: Semester 2 (Jan to Jun)

Year running 2024/25

Pre-requisite qualifications

NATS2100 or an equivalent module in scientific programming with Python. Year 1 Mathematics modules in Natural Sciences, NATS2380, or equivalent mathematics.

Module replaces

None

This module is not approved as a discovery module

Module summary

Statistical machine learning is at the core of the modern world. Online advertising, automated vehicles, stock market trading and transport planning all use statistical models to learn from past data and make decisions about the future. Statistical machine learning is a way to rigorously identify patterns in data and make quantitative predictions: it is how we translate data into knowledge. This module introduces the fundamental concepts of statistical machine learning, and students will learn to use several key statistical models that are widely employed in science and industry.

Objectives

To introduce basic techniques from statistical machine learning for classification and regression using Python.

Learning outcomes

1. Be able to explain the classification and regression problems;
2. Be able to assess the error of a fitted model and explain the fitting algorithm;
3. Understand the statistical foundations of different classification and regression methods;
4. Understand the importance of uncertainty and evaluate the uncertainty in simple model predictions;
5. Be able to perform classification and regression tasks using existing software packages;
6. Be able to carry out and justify a simple statistical model analysis of real-world data.

Syllabus

- Introduction to classification and regression;
- Statistical decision theory, loss functions;
- Optimisation, gradient descent, local & global optima;
- Linear regression;
- Logistic regression;
- Tree models;
- Ensemble methods, e.g. boosting and random forests.
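The optimisation and linear regression topics above can be illustrated with a short sketch. It assumes a single input feature, a model y ≈ w·x + b and the mean squared error as the loss; the function names `fit_line_gd` and `mse`, the learning rate and the toy data are hypothetical choices for illustration, not material taken from the module.

```python
def mse(xs, ys, w, b):
    """Mean squared error of the line y = w*x + b on the data (xs, ys)."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def fit_line_gd(xs, ys, lr=0.05, n_steps=2000):
    """Fit y = w*x + b by plain gradient descent on the mean squared error.

    lr (learning rate) and n_steps are illustrative values, not tuned defaults.
    """
    n = len(xs)
    w, b = 0.0, 0.0  # start from an arbitrary initial guess
    for _ in range(n_steps):
        # Residuals of the current model on every data point.
        res = [w * x + b - y for x, y in zip(xs, ys)]
        # Partial derivatives of (1/n) * sum(r^2) with respect to w and b.
        grad_w = 2.0 / n * sum(r * x for r, x in zip(res, xs))
        grad_b = 2.0 / n * sum(res)
        # Step downhill, against the gradient.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Hypothetical toy data lying exactly on y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
w, b = fit_line_gd(xs, ys)
print(f"w = {w:.3f}, b = {b:.3f}")  # prints w = 2.000, b = 1.000
```

Because the mean squared error of a linear model is a convex function of w and b, any local minimum that gradient descent reaches is also the global minimum; if the learning rate is chosen too large, however, the iterates diverge instead of converging.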

Teaching Methods

Delivery type                          Number   Length (hours)   Student hours
Workshop                               11       2                22
Lecture                                11       1                11
Private study hours                                              67
Total contact hours                                              33
Total hours (100 hr per 10 credits)                              100

Private study

Learn course material, perform tasks, create and solve computational problems.

Students required to resit the module will be given a further attempt to complete the tasks over the summer. Because the problems in the tasksheets have no single "standard" solution, it is not a problem for resitting students to work on the same problems again.

Opportunities for Formative Feedback

The lectures introduce the course material, and students can ask questions during and after each lecture.

The workshop sessions take place in computer clusters, where students work towards solutions to the tasks with guidance from a member of staff, who gives feedback on the approach being taken and on any technical issues.

Methods of Assessment

Coursework
Assessment type                            Notes                  % of formal assessment
In-course Assessment                       Assessed coursework    100
Total percentage (Assessment Coursework)                          100

Typically there are 6 tasksheets. The first three do not contribute to the module mark, and model solutions will be provided for them: Tasksheet 1 will be handed out in week 1, Tasksheet 2 in week 2 and Tasksheet 3 in week 3. The remaining three tasksheets each count for one third of the module mark: Tasksheet 4 will be handed out in week 4, Tasksheet 5 in week 6 and Tasksheet 6 in week 8. Because the assessed tasksheets are weighted equally, the module mark is the simple average of their three marks. In general, each tasksheet covers the material taught in the corresponding weeks.
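The weighting rule above reduces to simple arithmetic. The sketch below assumes that only the marks for the three assessed tasksheets (4, 5 and 6) are passed in; the function name `module_mark` and the example marks are hypothetical.

```python
def module_mark(assessed_marks):
    """Module mark as the simple average of the three assessed tasksheets.

    Formative tasksheets 1 to 3 are not passed in, since they do not count.
    """
    if len(assessed_marks) != 3:
        raise ValueError("expected marks for exactly three assessed tasksheets")
    # Equal weights of 1/3 each are the same as a plain mean.
    return sum(assessed_marks) / 3


# Hypothetical marks for Tasksheets 4, 5 and 6.
print(module_mark([60, 72, 54]))  # prints 62.0
```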

Reading List

There is no reading list for this module

Last updated: 4/29/2024

Errors, omissions, failed links, etc. should be reported to the Catalogue Team.