Task 1.b - Abstract/Concrete Nouns Discrimination
Introduction
The contrast between abstract and concrete words plays a central role in human cognition. Indeed, behavioural and neuropsychological evidence suggests that abstract and concrete concepts might be represented, retrieved and processed differently in the human brain.
Since semantic classifications of abstract nouns have a higher degree of arbitrariness than those of concrete nouns, we have not defined any a priori "ontology" of classes for the abstract domain. Instead, we will test computational models for their ability to discriminate between abstract and concrete nouns.
The data set consists of 40 nouns extracted from the MRC Psycholinguistic Database, each rated by human subjects on the concreteness scale.
Task Operationalization
The nouns have been classified into three classes:
- HI - 15 nouns selected from those in MRC with the highest concreteness value. These are a subset of the nouns in the data set for the concrete noun categorization task;
- LO - 15 nouns selected from those in MRC with the lowest concreteness value (e.g. "hope");
- ME - 10 nouns selected from those in MRC whose concreteness score is close to the average (e.g. "pollution", "fight").
We operationalize the abstract/concrete noun discrimination as a 2-way clustering task on the subset of 30 nouns belonging to the HI and LO classes in the data set.
To abstract away from differences stemming from any specific clustering method, you are asked to run your experiments with the k-means algorithm available in CLUTO. If you cannot run CLUTO on your system, the workshop organizers will carry out the clustering for you. In this case, data should be prepared according to a format that will be specified later on. Participants are also invited to experiment with other clustering methods and to compare the results with those obtained with CLUTO.
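To make the 2-way clustering setup concrete, here is a minimal sketch of a plain k-means loop with k=2 in Python. It is not CLUTO and does not reproduce CLUTO's criterion functions or initialization. The function name `kmeans_2way`, the deterministic seeding, the toy nouns other than "hope" and the two-dimensional vectors are all assumptions made purely for illustration.

```python
import math


def kmeans_2way(vectors, max_iter=100):
    """Plain Lloyd-style k-means with k=2 on equal-length vectors.

    Seeding is deterministic (first and last vector). This is an
    arbitrary choice made only to keep the illustration reproducible.
    """
    centroids = [list(vectors[0]), list(vectors[-1])]
    labels = None
    for _ in range(max_iter):
        # Assignment step: attach each vector to its nearest centroid
        # (Euclidean distance; ties go to cluster 0).
        new_labels = [
            min((0, 1), key=lambda c: math.dist(v, centroids[c]))
            for v in vectors
        ]
        if new_labels == labels:
            break  # assignments are stable, so the loop has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in (0, 1):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical toy data: two concrete-like and two abstract-like nouns,
# each with an invented 2-dimensional "distributional" vector.
toy_nouns = ["hammer", "pencil", "truth", "hope"]
toy_vectors = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.8], [0.2, 0.9]]
toy_labels = kmeans_2way(toy_vectors)
for noun, label in zip(toy_nouns, toy_labels):
    print(noun, label)
```

In an actual run, the input would be the model's vectors for the 30 HI and LO nouns, and the two resulting clusters would be compared against the HI/LO classes.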
Evaluation will be carried out in two stages:
1. coarse-grained evaluation - results will be evaluated with respect to the two measures for cluster quality available in CLUTO: purity and entropy (cf. Zhao, Y. and G. Karypis (2002), "Evaluation of Hierarchical Clustering Algorithms for Document Datasets", in CIKM 2002).
2. fine-grained evaluation - verb semantic classification is notoriously hard. Any a priori classification scheme runs the risk of being defied by the highly polysemous and multidimensional character of verbs. In this second stage, evaluation will therefore focus on specific verbs selected as "hard cases", because they are "eccentric" members of a given class or because they can be classified in more than one class. Participants will be asked to perform a fine-grained error analysis on such verbs. Details about this type of evaluation will be provided later on.
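As an illustration of the coarse-grained measures in stage 1, the sketch below computes purity and entropy following the usual definitions given in Zhao and Karypis (2002). The purity of a cluster is the share of its largest gold class. The entropy of a cluster is the entropy of its gold-class distribution, normalised by log q, where q is the number of classes. Both are averaged over clusters, weighted by cluster size. CLUTO makes these measures available itself, so a script like this would mainly help when checking the output of other clustering methods. The function name and the toy labels are hypothetical.

```python
import math
from collections import Counter


def purity_and_entropy(cluster_ids, gold_classes):
    """Return (purity, entropy) of a clustering against gold classes.

    Higher purity and lower entropy indicate a better match.
    """
    n = len(gold_classes)
    q = len(set(gold_classes))  # number of gold classes
    # Count how many items of each gold class fall into each cluster.
    clusters = {}
    for cid, gold in zip(cluster_ids, gold_classes):
        clusters.setdefault(cid, Counter())[gold] += 1

    purity = 0.0
    entropy = 0.0
    for counts in clusters.values():
        n_r = sum(counts.values())  # size of this cluster
        # Weighted purity: (n_r / n) * (max count / n_r) = max count / n.
        purity += max(counts.values()) / n
        if q > 1:
            h = -sum((c / n_r) * math.log(c / n_r) for c in counts.values())
            entropy += (n_r / n) * h / math.log(q)
    return purity, entropy


# Hypothetical toy example with three HI and three LO items.
gold = ["HI", "HI", "HI", "LO", "LO", "LO"]
perfect = purity_and_entropy([0, 0, 0, 1, 1, 1], gold)
one_error = purity_and_entropy([0, 0, 1, 1, 1, 1], gold)
print(perfect)
print(one_error)
```

For this task the gold classes would be the HI and LO labels of the 30 nouns, and the cluster ids would come from the 2-way clustering solution.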