2026/27 Taught Postgraduate Module Catalogue

MODL5700M Corpora, Data Science and AI

15 Credits

Class Size: 30

Module manager: Serge Sharoff
Email: s.sharoff@leeds.ac.uk

Taught: Semester 2 (Jan to Jun)

Year running 2026/27

Module replaces

MODL5007M

This module is not approved as an Elective

Module summary

This module studies how language is used from the perspective of data science. The study is based on corpora, i.e. large databanks of texts in natural languages. Corpora are also commonly used to train Generative AI and Machine Translation systems. The module aims to familiarise students with statistical methods for corpus exploration and to equip them with AI literacy skills. Please note that this is an optional module and runs subject to enrolments. If only a small number of students choose this module, it may not run and you may be asked to choose another module.

Objectives

The overall goal of the module is to introduce data science for language, with a specific focus on multilinguality. The more specific aims are:
- to introduce the basic concepts and methods of data science and how they can be applied to language and translation;
- to provide practical skills and tools for querying corpora and interpreting the results statistically;
- to equip students with the background required for interacting with Generative AI and Machine Translation tools.

Learning outcomes


On successful completion of the module students will be able to:
LO1 Describe basic types of corpora.
LO2 Appraise and apply principles of corpus querying.
LO3 Apply relevant statistical methods.
LO4 Design Python scripts to collect and process their own specialised corpora.

On successful completion of the module students will have demonstrated the following skills learning outcomes:
SO1. Develop reflection and critical thinking to understand the nature of language use from a statistical point of view.
SO2. Develop skills in identifying and solving potential challenges by searching for information.

Teaching Methods

Delivery type                         Number   Length (hours)   Student hours
Supervision                           4        1                4
Lecture                               8        1                8
Seminar                               8        1                8
Private study hours                                             130
Total contact hours                                             20
Total hours (100 hr per 10 credits)                             150

Opportunities for Formative Feedback

Regular weekly feedback on progress with the case study.

Methods of Assessment

Coursework
Assessment type   Notes        % of formal assessment
Coursework        Case Study   100
Total percentage (Assessment Coursework): 100

Resits will normally be assessed by the same methodology as the first attempt, unless otherwise stated.

Reading List

Check the module area in Minerva for your reading list.

Last updated: 30/04/2026
