====== Task 1a - Concrete Noun Categorization ======

==== Introduction ====

The goal of this sub-task is to group concrete nouns into semantic categories.

The {{concnouns.categorization.dataset.txt.gz |data set}} consists of 44 concrete nouns belonging to 6 semantic categories (four natural and two man-made). The nouns are included in the feature norms described in McRae et al. (2005) (cf. [[comparison_with_speaker-generated_features|Task 3]]).

==== Task Operationalization ====

We operationalize concrete noun categorization as a clustering task. Since the data set is organized hierarchically, we will run three clustering experiments, varying the number of classes and consequently their level of generality (the resulting class hierarchy is sketched after the list):

  * **6-way clustering** - models will be tested on their ability to categorize the nouns into the most fine-grained classes of the dataset: //bird// ("peacock"), //groundAnimal// ("lion"), //fruitTree// ("cherry"), //green// ("potato"), //tool// ("hammer"), //vehicle// ("car");
  * **3-way clustering** - models will be tested on their ability to categorize the nouns into 3 classes supported by robust neuro-cognitive evidence (see, e.g., Caramazza, 2000, "The Organization of Conceptual Knowledge in the Brain", in Gazzaniga (ed.): //The New Cognitive Neurosciences//): //animal// (superordinate of //bird// and //groundAnimal//), //vegetable// (superordinate of //fruitTree// and //green//), //artifact// (superordinate of //tool// and //vehicle//);
  * **2-way clustering** - models will be tested on their ability to categorize the nouns into the two top classes: //natural// (superordinate of //animal// and //vegetable//) and //artifact// (superordinate of //tool// and //vehicle//).
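
For concreteness, the three levels of granularity can be written down as a small mapping. The sketch below is purely illustrative: the class labels are the ones listed above, but the noun entries are only the examples quoted in parentheses, not the full 44-noun data set.

<code python>
# Class hierarchy of the task, written as plain Python mappings.
FINE_TO_MID = {
    "bird": "animal", "groundAnimal": "animal",
    "fruitTree": "vegetable", "green": "vegetable",
    "tool": "artifact", "vehicle": "artifact",
}
MID_TO_TOP = {"animal": "natural", "vegetable": "natural", "artifact": "artifact"}

# Example nouns only (the full membership list comes from the data set file).
EXAMPLE_NOUNS = {
    "peacock": "bird", "lion": "groundAnimal", "cherry": "fruitTree",
    "potato": "green", "hammer": "tool", "car": "vehicle",
}

def gold_classes(nouns, granularity):
    """Map each noun to its gold class at the requested granularity (6, 3 or 2)."""
    labels = {}
    for noun, fine in nouns.items():
        if granularity == 6:
            labels[noun] = fine
        elif granularity == 3:
            labels[noun] = FINE_TO_MID[fine]
        else:  # 2-way
            labels[noun] = MID_TO_TOP[FINE_TO_MID[fine]]
    return labels
</code>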

To abstract away from differences stemming from any specific clustering method, you are asked to run your experiments with the //Repeated Bisections// clustering algorithm in [[http://glaros.dtc.umn.edu/gkhome/cluto/cluto/overview|CLUTO]] (//rbr// value of the //-clmethod// option). If you cannot run CLUTO on your system, the workshop organizers will carry out the clustering for you; in this case, data should be prepared according to a format that will be specified later on. Participants are also invited to experiment with other clustering methods and to compare the results with those obtained with CLUTO.
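
As a rough illustration, the snippet below prepares a noun-by-feature matrix and calls CLUTO's ''vcluster'' program with the //rbr// method. The feature values and file name are made-up placeholders, and the dense-matrix layout (a header line with the number of rows and columns, followed by one row of space-separated values per noun) is an assumption of this sketch; please consult the CLUTO manual for the exact input format expected by your version.

<code python>
import subprocess

# Hypothetical word vectors: noun -> feature values (e.g. co-occurrence counts).
# In a real experiment these come from your own distributional model.
vectors = {
    "peacock": [1.0, 0.2, 0.0],
    "hammer":  [0.0, 0.1, 2.5],
    # ... one row per noun in the data set
}

nouns = sorted(vectors)
matrix_file = "nouns.mat"

# Assumed CLUTO dense-matrix format: "<nrows> <ncols>" header, then one row
# of space-separated feature values per object.
with open(matrix_file, "w") as f:
    f.write(f"{len(nouns)} {len(vectors[nouns[0]])}\n")
    for noun in nouns:
        f.write(" ".join(str(v) for v in vectors[noun]) + "\n")

# Repeated Bisections clustering into 6 classes, as required by the 6-way experiment.
subprocess.run(["vcluster", "-clmethod=rbr", matrix_file, "6"], check=True)
</code>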

==== Task Evaluation ====

Evaluation will be carried out in two stages:

1. **quantitative evaluation** - results will be evaluated with respect to the two measures of cluster quality available in CLUTO: //purity// and //entropy// (cf. Zhao, Y. and G. Karypis (2002), "Evaluation of Hierarchical Clustering Algorithms for Document Datasets", in //CIKM 2002//); a sketch of how these measures are computed is given after the list;

2. **qualitative evaluation** - participants will be asked to perform a fine-grained error analysis, focussing on critical nouns, hard classes, etc. ({{qualitativeanalysis.nouncat.zip| recommended qualitative evaluation criteria}}).
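
For reference, the following sketch shows how purity and entropy can be computed from a clustering solution and the gold classes, following the definitions in Zhao and Karypis (2002). The clustering and gold labels in the usage example are invented toy data; the official evaluation relies on the measures as reported by CLUTO itself.

<code python>
import math
from collections import Counter

def purity_entropy(clusters, gold):
    """Purity and entropy of a clustering, following Zhao & Karypis (2002).

    clusters: dict mapping cluster id -> list of nouns
    gold:     dict mapping noun -> gold class label
    """
    n = sum(len(members) for members in clusters.values())
    q = len(set(gold.values()))  # number of gold classes
    purity = entropy = 0.0
    for members in clusters.values():
        counts = Counter(gold[noun] for noun in members)
        n_r = len(members)
        # purity of this cluster: fraction of members belonging to its majority class
        purity += (n_r / n) * (max(counts.values()) / n_r)
        # entropy of this cluster, normalised by log q so that it lies in [0, 1]
        h = -sum((c / n_r) * math.log(c / n_r) for c in counts.values())
        entropy += (n_r / n) * (h / math.log(q)) if q > 1 else 0.0
    return purity, entropy

# Toy usage with the example nouns quoted above (hypothetical clustering output):
clusters = {0: ["peacock", "lion"], 1: ["cherry", "potato"], 2: ["hammer", "car"]}
gold2way = {"peacock": "natural", "lion": "natural", "cherry": "natural",
            "potato": "natural", "hammer": "artifact", "car": "artifact"}
print(purity_entropy(clusters, gold2way))  # perfect 2-way result: (1.0, 0.0)
</code>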