data:esslli2008:concrete_nouns_categorization [2010/11/01 14:07] (current)
 +====== Task 1.a - Concrete Noun Categorization ======
 +
 +
 +==== Introduction ====
 +
 +The goal of the sub-task is to group concrete nouns into semantic categories.
 +
 +The {{concnouns.categorization.dataset.txt.gz |data set}} consists of 44 concrete nouns belonging to 6 semantic categories (four natural and two artifact categories). The nouns are included in the feature norms described in McRae et al. (2005) (cf. [[comparison_with_speaker-generated_features|Task3]]).
 +
 +
 +==== Task Operationalization ====
 +
 +We operationalize concrete noun categorization as a clustering task. Since the data set is organized hierarchically,
 +we will run three clustering experiments, varying the number of classes and consequently their level of generality:
 +
 +  * **6-way clustering** - models will be tested on their ability to categorize the nouns into the most fine-grained classes of the dataset: //bird// ("peacock"), //groundAnimal// ("lion"), //fruitTree// ("cherry"), //green// ("potato"), //tool// ("hammer"), //vehicle// ("car");
 +  * **3-way clustering** - models will be tested on their ability to categorize the nouns into 3 classes: //animal// (superordinate of //bird// and //groundAnimal//), //vegetable// (superordinate of //fruitTree// and //green//), and //artifact// (superordinate of //tool// and //vehicle//);
 +  * **2-way clustering** - models will be tested on their ability to categorize the nouns into the two top classes: //natural// (superordinate of //animal// and //vegetable//) and //artifact// (superordinate of //tool// and //vehicle//).
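The class hierarchy described above can be sketched as two lookup tables that derive the 3-way and 2-way gold labels from the 6-way labels. This is only an illustration: the names ''SIX_TO_THREE'', ''THREE_TO_TWO'' and ''gold_label'' are hypothetical and are not part of the data set or of CLUTO; the class names and example nouns are the ones listed in the task description.

```python
# Hypothetical helper tables; class names are taken from the task description.
SIX_TO_THREE = {
    "bird": "animal",
    "groundAnimal": "animal",
    "fruitTree": "vegetable",
    "green": "vegetable",
    "tool": "artifact",
    "vehicle": "artifact",
}

THREE_TO_TWO = {
    "animal": "natural",
    "vegetable": "natural",
    "artifact": "artifact",
}


def gold_label(fine_class, k):
    """Return the gold class of a noun at granularity k (6, 3 or 2),
    given its most fine-grained (6-way) class."""
    if fine_class not in SIX_TO_THREE:
        raise KeyError(f"unknown 6-way class: {fine_class}")
    if k == 6:
        return fine_class
    if k == 3:
        return SIX_TO_THREE[fine_class]
    if k == 2:
        return THREE_TO_TWO[SIX_TO_THREE[fine_class]]
    raise ValueError("k must be 6, 3 or 2")


# Example nouns mentioned in the task description, with their 6-way class.
EXAMPLES = {
    "peacock": "bird",
    "lion": "groundAnimal",
    "cherry": "fruitTree",
    "potato": "green",
    "hammer": "tool",
    "car": "vehicle",
}

if __name__ == "__main__":
    for noun, fine in EXAMPLES.items():
        print(noun, gold_label(fine, 6), gold_label(fine, 3), gold_label(fine, 2))
```

Deriving the coarser labels from the finest ones keeps the three experiments consistent with each other, since a noun can never end up in a 3-way or 2-way class that contradicts its 6-way class.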
 +
 +To abstract away from differences stemming from any specific clustering method, you are asked to run your experiments with the //k-means// algorithm available in [[http://glaros.dtc.umn.edu/gkhome/cluto/cluto/overview|CLUTO]]. If you cannot run [[http://glaros.dtc.umn.edu/gkhome/cluto/cluto/overview|CLUTO]] on your system, the workshop organizers will carry out the clustering for you. In this case, data should be prepared according to a format that will be specified later on. Participants are also invited to experiment with other clustering methods and to compare the results with those obtained with [[http://glaros.dtc.umn.edu/gkhome/cluto/cluto/overview|CLUTO]].
 +
 +
 +==== Task Evaluation ====
 +
 +Evaluation will be carried out in two stages:
 +
 +1. **quantitative evaluation** - results will be evaluated with respect to the two measures for cluster quality available in CLUTO: //purity// and //entropy// (cf. Zhao, Y. and G. Karypis (2002), "Evaluation of Hierarchical Clustering Algorithms for Document Datasets", in //CIKM 2002//).
 +
 +2. **qualitative evaluation** - participants will be asked to perform a fine-grained error analysis, focussing on critical nouns, hard classes, etc. Details about this type of evaluation will be provided later on.
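For participants who want to check the quantitative measures outside CLUTO, the following is a minimal sketch of purity and entropy, assuming the usual definitions from Zhao and Karypis (2002): the purity of a cluster is the fraction of its members that belong to its largest class, the entropy of a cluster is the entropy of its class distribution normalized by the logarithm of the number of classes, and both are averaged over clusters weighted by cluster size. The function name ''purity_and_entropy'' is hypothetical, and this is not CLUTO's own code, so small numerical differences from its reported values are possible.

```python
import math
from collections import Counter, defaultdict


def purity_and_entropy(gold, predicted):
    """Return (purity, entropy) of a clustering.

    gold      -- list of gold class labels, one per noun
    predicted -- list of cluster ids, aligned with gold
    """
    if len(gold) != len(predicted) or not gold:
        raise ValueError("gold and predicted must be non-empty and aligned")

    # Collect the gold labels of the members of each cluster.
    clusters = defaultdict(list)
    for label, cluster_id in zip(gold, predicted):
        clusters[cluster_id].append(label)

    n = len(gold)            # total number of nouns
    q = len(set(gold))       # number of gold classes
    purity = 0.0
    entropy = 0.0
    for members in clusters.values():
        n_r = len(members)
        counts = Counter(members)
        # Size-weighted purity: (n_r / n) * (max_i n_r^i / n_r) = max_i n_r^i / n
        purity += max(counts.values()) / n
        if q > 1:
            # Entropy of the class distribution inside this cluster,
            # normalized by log(q) so that it lies between 0 and 1.
            h = -sum((c / n_r) * math.log(c / n_r) for c in counts.values())
            entropy += (n_r / n) * h / math.log(q)
    return purity, entropy


if __name__ == "__main__":
    gold = ["bird", "bird", "tool", "tool"]
    print(purity_and_entropy(gold, [0, 0, 1, 1]))  # clusters match the classes
    print(purity_and_entropy(gold, [0, 1, 0, 1]))  # clusters mix the classes
```

Higher purity and lower entropy indicate a better clustering: a solution whose clusters coincide with the gold classes has purity 1 and entropy 0.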
 +
 +
 +Back to [[Start]]