
Distributional Semantic Models (NAACL-HLT 2010)

Tutorial at the NAACL-HLT 2010 Conference, Los Angeles, 1 June 2010

Tutorial description

Distributional semantic models (DSMs) – also known as "word space" or "distributional similarity" models – are based on the assumption that the meaning of a word can (at least to a certain extent) be inferred from its usage, i.e. its distribution in text. These models therefore build high-dimensional vector representations of words through a statistical analysis of the contexts in which they occur.
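As a minimal illustration of this idea, one can tabulate a small word-by-context co-occurrence matrix and compare words by the cosine of the angle between their row vectors. The sketch below uses Python with NumPy and entirely hypothetical counts; the tutorial's own implementations are based on R.

```python
import numpy as np

# Toy co-occurrence counts (hypothetical data): rows are target words,
# columns are context words observed within a small window.
contexts = ["eat", "drink", "drive", "fast"]
words = ["apple", "juice", "car"]
M = np.array([
    [6, 2, 0, 1],   # apple
    [1, 7, 0, 0],   # juice
    [0, 0, 5, 4],   # car
], dtype=float)

def cosine(u, v):
    """Cosine similarity between two distributional vectors."""
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

sim_apple_juice = cosine(M[0], M[1])
sim_apple_car = cosine(M[0], M[2])
print(sim_apple_juice, sim_apple_car)
```

Words that occur in similar contexts receive similar vectors, so "apple" ends up closer to "juice" than to "car" — distributional similarity as a proxy for semantic similarity.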

Since the seminal papers of Landauer & Dumais (1997) and Schütze (1998), DSMs have been an active area of research in computational linguistics. Amongst many other tasks, they have been applied to solving the TOEFL synonym test, automatic thesaurus construction, identification of translation equivalents, word sense induction and discrimination, POS induction, identification of analogical relations, PP attachment disambiguation, semantic classification, as well as the prediction of fMRI and EEG data (see bibliography). Recent years have seen renewed and rapidly growing interest in distributional approaches, as shown by the series of workshops on DSM held at Context 2007, ESSLLI 2008, EACL 2009, CogSci 2009, NAACL-HLT 2010, ACL 2010 and ESSLLI 2010 (links).

This tutorial is targeted both at participants who are new to the field and need a comprehensive overview of DSM techniques and applications, and at experienced scientists who want to get up to speed on current directions in DSM research. Its main goals are to

  • introduce the most common DSM architectures and their parameters, as well as prototypical applications;
  • equip participants with the mathematical techniques needed for the implementation of DSMs, in particular those of matrix algebra;
  • illustrate visualisation techniques and mathematical arguments that help in understanding the high-dimensional DSM vector spaces and making sense of key operations such as SVD dimensionality reduction; and
  • provide an overview of current research on DSMs, available software, evaluation tasks and future trends.

An implementation of all methods presented in the tutorial will be made available on this Web site, based on the open-source statistical programming language R. With its sophisticated visualisation and data analysis features and an enormous choice of add-on packages, R provides an excellent "toy laboratory" for DSM research and is even powerful enough for mid-sized applications.

Schedule

  1. Introduction
    • motivation and brief history of distributional semantics
    • common DSM architectures
    • prototypical applications
    • concrete examples used in the tutorial
  2. Taxonomy of DSM parameters including
    • size and type of context window
    • feature scaling (tf.idf, statistical association measures, …)
    • normalisation and standardisation of rows and/or columns
    • distance/similarity measures: Euclidean, Minkowski p-norms, cosine, entropy-based, …
    • dimensionality reduction: feature selection, SVD, random indexing (RI)
  3. Elements of matrix algebra for DSM
    • basic matrix and vector operations
    • norms and distances, angles, orthogonality
    • projection and dimensionality reduction
  4. Making sense of DSMs: mathematical analysis and visualisation techniques
    • nearest neighbours and clustering
    • semantic maps: PCA, MDS, SOM
    • visualisation of high-dimensional spaces
    • supervised classification based on DSM vectors
    • understanding dimensionality reduction with SVD and RI
    • term-term vs. term-context matrix, connection to first-order association
    • SVD as a latent class model
  5. Current research topics and future directions
    • overview of current research on DSMs
    • evaluation tasks and data sets
    • available "off-the-shelf" DSM software
    • limitations and key problems of DSMs
    • trends for future work
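Two of the parameters listed in part 2 — feature scaling with a statistical association measure, and SVD dimensionality reduction — can be sketched briefly. The example below uses Python with NumPy and hypothetical counts purely for illustration (the tutorial materials are based on R), with positive pointwise mutual information (PPMI) standing in for the association measures mentioned above:

```python
import numpy as np

# Hypothetical co-occurrence counts: 4 target words x 5 context features.
F = np.array([
    [10, 0, 3, 0, 1],
    [ 8, 1, 2, 0, 0],
    [ 0, 9, 0, 6, 2],
    [ 1, 8, 0, 5, 3],
], dtype=float)

# Feature scaling: positive pointwise mutual information (PPMI),
# one common statistical association measure.
N = F.sum()
row = F.sum(axis=1, keepdims=True) / N   # marginal P(word)
col = F.sum(axis=0, keepdims=True) / N   # marginal P(context)
with np.errstate(divide="ignore", invalid="ignore"):
    pmi = np.log2((F / N) / (row * col))
ppmi = np.where(F > 0, np.maximum(pmi, 0.0), 0.0)

# Dimensionality reduction: truncated SVD keeps only the k strongest
# latent dimensions of the scaled matrix.
k = 2
U, S, Vt = np.linalg.svd(ppmi, full_matrices=False)
reduced = U[:, :k] * S[:k]    # k-dimensional word vectors
print(reduced.shape)
```

The reduced vectors can then be compared with any of the distance or similarity measures from the taxonomy; the mathematical background for these operations is covered in parts 3 and 4.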

Each of the five parts is allotted a slot of roughly 30 minutes.

Contact

This tutorial will be taught by Stefan Evert (University of Osnabrück, Germany). Don't hesitate to contact me at stefan.evert@uos.de if you have any questions.