2026/27 Undergraduate Module Catalogue

LING2065 Data Science for Linguists

20 Credits
Class Size: 18

Module manager: Cecile De Cat
Email: c.decat@leeds.ac.uk

Taught: Semester 1 (Sep to Jan)

Year running 2026/27

Pre-requisites

MODL1060 Language: Structure and Sound

This module is approved as a discovery module

Module summary

Linguistic science relies on real-world data: patterns in everyday conversations, results from experiments testing how we process or learn language, recordings of speech, survey responses, and much more. This module will teach you how to handle, visualise, and analyse such data. It will help you develop the skills you need to carry out a quantitative study and to better understand the scientific literature. It will also equip you with transferable skills that are essential in the age of Big Data.

The lectures will introduce you to descriptive statistics through clear and intuitive visualisations. We will explore relationships within datasets and identify patterns in the data. We will consider different sources of bias and ways to mitigate them. You will also learn about hypothesis testing and how to interpret the results of basic statistical analyses.

In the practicals, you will learn to clean and transform data in order to produce plots and carry out some basic analyses. To do this, you will learn to code in R (a free and powerful tool used by data scientists worldwide), aided by demonstrations and examples.

Please note that this is an optional module and runs subject to enrolments. If too few students choose this module, it may not run and you may be asked to choose another module.

Objectives

In this module students will learn:

- to work with electronic data, using R (a widely used tool, supported by a large community of data scientists across the world).
- to understand the different types of data one can encounter in linguistic research. This will include understanding the different types of variables that are used in statistics.
- to prepare data to make it suitable for analysis.
- to describe a dataset so it is properly documented.
- to visualise data and understand how it is distributed.
- to visualise and interpret different types of relationship between variables.
- to estimate whether a difference observed visually is statistically significant.

In classes, you will be guided through demonstrations and practical exercises based on linguistic datasets (e.g., experimental data, corpus data, questionnaire data). The demonstrations and exercises will be done in R.

Learning outcomes

On successful completion of the module students will be able to:

LO.1. describe linguistic datasets and visualise patterns in the data;
LO.2. identify how the available data can be used to answer a specific research question, and provide visualisations to inform a preliminary analysis;
LO.3. explain and interpret the basics of regression analysis. This includes explaining in simple terms how it works and when it is useful, and interpreting the results of a given model in relation to a research question in linguistics.

Skills Learning Outcomes

On successful completion of the module students will be able to:

SO1. use R to prepare and visualise data.
SO2. interpret the relationship between variables in a dataset.
SO3. interpret the results of a simple statistical analysis.

Teaching Methods

Delivery type                         Number   Length (hours)   Student hours
Lecture                               10       1                10
Practical                             10       1                10
Independent online learning hours                               20
Private study hours                                             160
Total contact hours                                             20
Total hours (100hr per 10 credits)                              200

Opportunities for Formative Feedback

As this module aims to develop practical skills, it will be heavily based on demonstrations and hands-on activities. Students will learn by doing, and will be expected to be proactive in monitoring their own understanding. Independent self-study will be essential, as knowledge and skills will be acquired incrementally. In that spirit, judicious use of AI will be encouraged. The lectures and practicals will be interactive, and will aim to provide ample opportunities for feedback and support.

- Interactive tools will be used during lectures to monitor student understanding.
- Exercises will be set weekly in preparation for the practicals. Feedback and advice will be provided during the practicals.
- There will be two opportunities for formative assessment (one before each summative assessment).

Methods of Assessment

Coursework
Assessment type                            Notes          % of formal assessment
Coursework                                 Presentation   50
Total percentage (Assessment Coursework)                  50

Normally resits will be assessed by the same methodology as the first attempt, unless otherwise stated

Exams
Exam type                             Exam duration   % of formal assessment
Unseen Practical exam (Semester 1)    2 hrs           50
Total percentage (Assessment Exams)                   50

Normally resits will be assessed by the same methodology as the first attempt, unless otherwise stated

Reading List

Check the module area in Minerva for your reading list

Last updated: 30/04/2026
