

Task 2b - Abstract/Concrete Noun Discrimination

Introduction

The contrast between abstract and concrete words plays a central role in human cognition. Behavioural and neuropsychological evidence suggests that abstract and concrete concepts might be represented, retrieved and processed differently in the human brain (Noppeney, U. and C. Price (2004), "Retrieval of abstract semantics", NeuroImage, 22: 164-170).

Since semantic classifications of abstract nouns have a higher degree of arbitrariness than those of concrete nouns, we have not defined any a priori "ontology" of classes for the abstract domain. Instead, we will test computational models for their ability to discriminate between abstract and concrete nouns.

The data set consists of 40 nouns extracted from the MRC Psycholinguistic Database, with ratings by human subjects on the concreteness scale.

Task Operationalization

The nouns have been classified into three classes:

  • HI - 15 nouns selected from those in MRC with the highest concreteness value. These are a subset of the nouns in the data set for the concrete noun categorization task;
  • LO - 15 nouns selected from those in MRC with the lowest concreteness value (e.g. "hope");
  • ME - 10 nouns with a concreteness score close to the average in MRC (e.g. "pollution", "fight").

We operationalize abstract/concrete noun discrimination as a 2-way clustering task over the subset of 30 nouns belonging to the HI and LO classes in the data set.

To abstract away from differences stemming from any specific clustering method, you are asked to run your experiments with CLUTO. See the page on the concrete noun categorization task for details.

Task Evaluation

Evaluation will be carried out in two stages:

1. HI vs. LO discrimination - results of 2-way clustering will be evaluated with respect to the two measures for cluster quality available in CLUTO: purity and entropy (cf. Zhao, Y. and G. Karypis (2002), "Evaluation of Hierarchical Clustering Algorithms for Document Datasets", in CIKM 2002).
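For readers who want to sanity-check their scores, the two measures can be sketched as follows. This is a minimal illustration of the standard definitions of purity and entropy (size-weighted over clusters, with entropy normalized by the logarithm of the number of gold classes), not CLUTO's own code; the function name `purity_and_entropy` and the toy cluster assignments are assumptions made for the example.

```python
import math
from collections import Counter


def purity_and_entropy(cluster_ids, gold_labels):
    """Return (purity, entropy) of a clustering against gold classes.

    Hypothetical helper: cluster_ids[i] is the cluster assigned to item i,
    gold_labels[i] is its gold class (here "HI" or "LO").
    """
    n = len(gold_labels)
    q = len(set(gold_labels))  # number of gold classes

    # Count gold classes inside each cluster.
    clusters = {}
    for cid, label in zip(cluster_ids, gold_labels):
        clusters.setdefault(cid, Counter())[label] += 1

    purity = 0.0
    entropy = 0.0
    for counts in clusters.values():
        n_r = sum(counts.values())
        # Purity: share of all items that sit in their cluster's majority class.
        purity += max(counts.values()) / n
        if q > 1:
            # Entropy of the class distribution in this cluster,
            # normalized to [0, 1] and weighted by cluster size.
            h = sum((c / n_r) * math.log(n_r / c) for c in counts.values())
            entropy += (n_r / n) * h / math.log(q)
    return purity, entropy


# Toy data (not real system output): 15 HI and 15 LO nouns.
gold = ["HI"] * 15 + ["LO"] * 15
perfect = [0] * 15 + [1] * 15
one_error = [0] * 16 + [1] * 14  # one LO noun lands in the HI cluster

print(purity_and_entropy(perfect, gold))
print(purity_and_entropy(one_error, gold))
```

Higher purity and lower entropy indicate a better match between the two clusters and the HI/LO split.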

2. ME evaluation - This second phase will focus on the 10 nouns belonging to the ME class, which includes nouns referring to events, institutions, and other entities with an intermediate concreteness value according to subjects' judgments. We will evaluate how systems deal with these nouns with respect to the clusters identified for the HI and LO items (recommended qualitative evaluation criteria).
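One possible way to inspect the ME nouns, offered only as an assumption and not as the task's official procedure, is to check which of the two HI/LO clusters each ME noun's vector lies closer to. The sketch below uses cosine similarity to cluster centroids; the helper names and the two-dimensional toy vectors are hypothetical, and a real model would supply its own representations.

```python
import math


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest_cluster(vector, centroids):
    """Return the id of the centroid most similar to the given vector."""
    return max(centroids, key=lambda cid: cosine(vector, centroids[cid]))


# Toy vectors (hypothetical): members of the two clusters found for HI/LO.
centroids = {
    "A": centroid([[1.0, 0.0], [0.9, 0.1]]),
    "B": centroid([[0.0, 1.0], [0.1, 0.9]]),
}

# A hypothetical ME noun vector, compared against both clusters.
me_vector = [0.8, 0.3]
print(nearest_cluster(me_vector, centroids))
```

Reporting the similarity to both centroids, rather than only the nearest one, may be more informative for nouns of intermediate concreteness.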