2024/25 Taught Postgraduate Module Catalogue

OMAT5102M Exploratory Data Analysis

15 Credits Class Size: 150

Module manager: Dr George Mbaeyi
Email: G.C.Mbaeyi@leeds.ac.uk

Taught: Semester 1 Jan to 28 Feb, 1 Jan to 28 Feb (adv year), 1 Jul to 31 Aug

Year running 2024/25

Pre-requisite qualifications

Students are required to meet the programme entry requirements prior to studying the module.

Module replaces

N/A

This module is not approved as an Elective

Module summary

This module introduces students to basic techniques for carrying out a preliminary investigation of data sets. Exploring data involves visualising the variables and the relationships between them to help detect outliers, identify trends, suggest suitable statistical models and inform future data gathering.

Objectives

This module gives students knowledge of how to explore and analyse data sets using methods appropriate to differing data types. The module will provide students with opportunities to develop skills in visualising and summarising data sets, and to practise applying statistical analysis. As well as being introduced to archetypal methods, students will explore more novel approaches, such as kernel density estimation and principal component analysis.
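As an illustration of one of the approaches named above, kernel density estimation can be sketched in a few lines. This is a minimal sketch only: it assumes a Gaussian kernel and a hand-picked bandwidth, uses Python rather than the R mentioned in the assessment notes, and the function name `gaussian_kde` and the toy data are hypothetical rather than taken from the module materials.

```python
import math

def gaussian_kde(sample, bandwidth):
    """Return a function that estimates the density of `sample` at a point x."""
    n = len(sample)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        # One Gaussian bump per observation, centred on that observation.
        return norm * sum(
            math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in sample
        )

    return density

# Hypothetical toy data (assumption, not module data).
sample = [1.0, 1.2, 1.9, 2.1, 2.3, 4.8, 5.0]
kde = gaussian_kde(sample, bandwidth=0.5)

# Evaluate on a grid from -5 to 10 and approximate the area under the curve,
# which should be close to 1 for a valid density estimate.
step = 0.01
grid = [-5.0 + i * step for i in range(1501)]
area = sum(kde(x) for x in grid) * step
print(f"Approximate area under the estimate: {area:.3f}")
```

A smaller bandwidth gives a bumpier estimate and a larger one a smoother estimate, so the choice of bandwidth plays a role similar to the bin width of a histogram.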

Learning outcomes

On completion of this module students will be able to:

1. Use software to visualise data in different ways.
2. Calculate numerical data summaries.
3. Identify probability models for data.
4. Understand clustering in data and measures of distance.
5. Effectively visualise high dimensional data.
6. Understand through examples how visualisation of data can inform statistical model selection.

Skills outcomes

Skills developed in this module:

- Effective communication through visualisation of data.
- Interpreting data and making modelling decisions based on that interpretation.

Syllabus

1. Data types: Categorical, discrete, continuous. Data cleaning.
2. Graphical summary: Boxplots, histograms, KDE.
3. Numerical summary: Location, variability, quantiles. Data manipulation.
4. Discrete distributions: Binomial, geometric, Poisson.
5. Continuous distributions: Normal, exponential, uniform.
6. Bivariate data: Scatterplots, correlation. Linear regression.
7. Logistic regression and classification. PCA and dimension reduction.
8. Use statistical software to import data and perform simple visualisation, exploration and summary.
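The numerical summaries in the syllabus (location, variability, quantiles) can be illustrated with a short sketch. It is offered under stated assumptions only: it uses Python's standard `statistics` module rather than the R mentioned in the assessment notes, the helper names `summarise` and `boxplot_outliers` are hypothetical, the data are invented, and the 1.5 × IQR rule is the common boxplot convention rather than a rule confirmed by the module.

```python
import statistics

def summarise(values):
    """Return location, spread and quartile summaries for a numeric sample."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
        "q1": q1,
        "q3": q3,
        "iqr": q3 - q1,
    }

def boxplot_outliers(values):
    """Flag points lying more than 1.5 * IQR beyond the quartiles."""
    s = summarise(values)
    low = s["q1"] - 1.5 * s["iqr"]
    high = s["q3"] + 1.5 * s["iqr"]
    return [v for v in values if v < low or v > high]

# Hypothetical toy sample (assumption, not module data).
sample = [2, 3, 3, 4, 5, 5, 6, 7, 30]
summary = summarise(sample)
outliers = boxplot_outliers(sample)
print(summary)
print("Flagged as possible outliers:", outliers)
```

In this sample the mean is pulled well above the median by a single large value, which is the kind of feature a boxplot or histogram makes visible at a glance.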

Teaching Methods

Delivery type                         Number   Length hours   Student hours
On-line Learning                      1        1.5            1.5
On-line Learning                      5        1              5
Discussion forum                      6        2              12
Independent online learning hours                             42
Private study hours                                           89.5
Total Contact hours                                           18.5
Total hours (100hr per 10 credits)                            150

Opportunities for Formative Feedback

Students will have weekly formative assignments (e.g. quizzes, problem sheets or practical tasks) for each taught unit of the module and will be given model solutions with comments.

Methods of Assessment

Coursework
Assessment type | Notes | % of formal assessment
In-course Assessment | Students will be tested predominantly using e-assessment methods or MCQs. | 20
Assignment | The assignment will require students to complete a written report which may feature components of R code, R outputs, calculations and critical analysis of results. It is expected that the assignment will be completed in one week. | 80
Total percentage (Assessment Coursework) | | 100

Resit assessment will be available via the Assignment when the module next runs. The Assignment covers all learning outcomes for the module.

Reading List

There is no reading list for this module.

Last updated: 3/11/2025
