Hands-on Distributional Semantics (ESSLLI 2021 / 2022)

Hands-on Distributional Semantics – From first steps to interdisciplinary applications
Foundational course at ESSLLI 2021, online, August 9–13, 2021

Hands-on Distributional Semantics for Linguistics using R
Foundational course at ESSLLI 2022, Galway, Ireland, August 8–12, 2022

Course description

Distributional semantic models (DSMs) – also known as “word space”, “distributional similarity”, or more recently “word embeddings” – are based on the assumption that the meaning of a word can (at least to a certain extent) be inferred from its usage, i.e. its distribution in text. These models therefore build semantic representations, in the form of high-dimensional vector spaces, through a statistical analysis of the contexts in which words occur. DSMs are a promising technique for overcoming the lexical acquisition bottleneck through unsupervised learning, and their distributed representation provides a cognitively plausible, robust and flexible architecture for the organisation and processing of semantic information.
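To make the idea of "building vectors from contexts" concrete, here is a minimal sketch of a count-based model. It is not taken from the course materials (the hands-on sessions use R and the wordspace package); the toy corpus, the window size of 2 and the helper names `cooccurrence_vectors` and `cosine` are assumptions chosen purely for illustration.

```python
from collections import Counter, defaultdict
from math import sqrt


def cooccurrence_vectors(sentences, window=2):
    """Map each target word to a Counter of context words seen
    within +/- `window` tokens in the same sentence."""
    vectors = defaultdict(Counter)
    for tokens in sentences:
        for i, target in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:  # a token is not its own context
                    vectors[target][tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical toy corpus (illustration only, far too small for real use).
corpus = [
    "the cat drinks milk".split(),
    "the dog drinks water".split(),
    "the cat chases the mouse".split(),
    "the dog chases the cat".split(),
    "the car needs fuel".split(),
]

vectors = cooccurrence_vectors(corpus)
sim_cat_dog = cosine(vectors["cat"], vectors["dog"])
sim_cat_car = cosine(vectors["cat"], vectors["car"])

print("cat ~ dog:", round(sim_cat_dog, 3))
print("cat ~ car:", round(sim_cat_car, 3))
```

In this toy setting, "cat" and "dog" share more contexts ("drinks", "chases") than "cat" and "car", so their vectors end up closer. Raw counts are dominated by frequent words such as "the"; realistic models typically reweight the counts with an association measure and reduce dimensionality before comparing vectors.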

In this introductory course, we will highlight the interdisciplinary potential of DSMs beyond standard semantic similarity tasks, with applications in cognitive modeling and theoretical linguistics. The course aims to equip participants with the background knowledge and skills needed to build different kinds of DSM representations – from traditional “count” models to neural word embeddings – and apply them to a wide range of tasks. The hands-on sessions will be conducted in R with the user-friendly wordspace package and various pre-built models.

Lecturers: Stephanie Evert (FAU Erlangen-Nürnberg) & Gabriella Lapesa (IMS, U Stuttgart)

Organizational information

Please make sure you have up-to-date versions of R and RStudio to participate in the hands-on exercises. Follow the detailed set-up instructions and download (some of) the data sets and precompiled DSMs. Additional instructions will be given in the first session on Monday. In particular, you will be asked to download and install the wordspaceEval package using a password provided in the course.

Schedule & handouts

Day 1: Introduction

presentation slides (PDF, 2.1 MB) – handout (PDF, 1.7 MB) – R code: hands_on_day1.R

Day 2: Building a DSM

presentation slides (PDF, 1.6 MB) – handout (PDF, 1.2 MB) – R code: hands_on_day2.R – bonus material: hands_on_day2_input_formats.R

Day 3: Which aspects of meaning does a DSM capture?

presentation slides (PDF, 3.2 MB) – handout (PDF, 2.9 MB) – R code: hands_on_day3_exercise_1.R, hands_on_day3_exercise_2.R

Day 4: DS beyond NLP – Linguistic theory

presentation slides (PDF, 3.6 MB) – handout (PDF, 3.5 MB) – R code: hands_on_day4.R – bonus material: schuetze1998.R

Day 5: DS beyond NLP – Cognitive modelling

presentation slides (PDF, 1.6 MB) – handout (PDF, 1.4 MB) – R code: hands_on_day5.R – bonus task: CogALex4.rda (0.2 MB) – bonus material: hands_on_day5_matrix_factorization.R