====== Task 3: Comparison with Speaker-Generated Properties ======


==== Introduction ====

From a cognitive point of view, there is little doubt that salient properties of a concept are an important part of its "meaning".

Psychologists have been collecting such "feature norms", i.e., lists of the properties that human subjects produce when presented with a concept.

A particularly large and well-articulated list was recently made publicly available by McRae and colleagues:

McRae, K., Cree, G. S., Seidenberg, M. S., & McNorgan, C. (2005). Semantic feature production norms for a large set of living and nonliving things. Behavior Research Methods, 37(4), 547-559.

The list can be obtained as described [[http://

==== Task Operationalization ====

We operationalize the property generation task as follows.

We focus on the same set of 44 concepts used in the [[http://

For each target concept, we pick the top 10 properties from the McRae norms (ranked by the number of subjects that produced them) and use them as the gold standard set for that concept. Given the ranked output of a model, we compute precision for each concept with respect to this gold standard, at various n-best thresholds, and we average precision across the 44 concepts. We limit ourselves to the top 10 human-generated properties of each concept since, for about 10% of the target concepts, the norms only contain 10 properties (for one concept, //snail//, the norms list 9 properties).

The provided evaluation script, by default, reports average precision at the 10-, 20- and 30-best thresholds.
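
To make the measure concrete, here is a minimal sketch (in Python, not the provided evaluation script) of precision at an n-best threshold averaged across concepts; the data structures and function names are our own illustrative assumptions.

<code python>
# Sketch of average precision at an n-best threshold; NOT the provided script.
# model_output:  dict mapping each concept to its ranked list of properties
# gold_standard: dict mapping each concept to the set of its top-10 gold properties

def precision_at_n(ranked_properties, gold_set, n):
    """Fraction of the model's top-n properties that appear in the gold standard."""
    hits = sum(1 for p in ranked_properties[:n] if p in gold_set)
    return hits / float(n)

def average_precision_at_n(model_output, gold_standard, n):
    """Precision at n, averaged over all concepts in the gold standard."""
    scores = [precision_at_n(model_output[c], gold_standard[c], n)
              for c in gold_standard]
    return sum(scores) / len(scores)

# Default thresholds reported by the evaluation script:
# for n in (10, 20, 30):
#     print(n, average_precision_at_n(model_output, gold_standard, n))
</code>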

=== Property Expansion ===

The properties in the norms database are expressed by phrases such as //tastes sweet// or //is loud//, resulting from manual normalization of the subjects' responses. This raises two problems when determining whether a property generated by a model matches a property in the norms: First, all word space models we are aware of produce single orthographic //words// as properties, and these have to be matched against the //phrases// in the norms. Second, we need to undo the normalization of McRae and colleagues, so that, say, //loud//, //noise// and //noisy// will all be counted as matches against the property //is loud//.

We dealt with these issues by generating an "expansion set" for each of the top 10 properties of each of the 44 target concepts, i.e., a list of single-word expressions that seemed plausible ways to express the relevant property. The expansion set was prepared by first extracting from WordNet the synonyms of the words that constituted the last element of a property phrase (//red// in //is red//), then filtering out irrelevant synonyms by hand, and adding other forms -- including inflectional variants (e.g., //legs// and //leg//) and closely related entities (//lives on water// was expanded to //aquatic, lake, ocean, river, sea, water//).
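
The automatic WordNet step can be sketched as follows (in Python with NLTK's WordNet interface, which is only our illustrative choice; the manual filtering and the addition of inflected and related forms described above are not automated here).

<code python>
# Sketch of the first, automatic step of building an expansion set:
# collect WordNet synonyms of the last word of a property phrase.
# The hand-filtering described in the text is not reproduced here.

from nltk.corpus import wordnet as wn

def candidate_expansions(property_phrase):
    """Return single-word WordNet synonyms of the phrase's last word."""
    head = property_phrase.split()[-1]      # e.g., "red" in "is red"
    candidates = {head}
    for synset in wn.synsets(head):
        for lemma in synset.lemma_names():
            if "_" not in lemma:            # keep single orthographic words only
                candidates.add(lemma.lower())
    return sorted(candidates)

# Example (the output would still need manual filtering):
# print(candidate_expansions("is loud"))
</code>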

We were rather generous in determining what counts as an expansion, since preliminary experimentation suggested that we should cut the models some slack in order to increase recall. Crucially, while we recognize the somewhat subjective nature of the expansion operation, it was conducted //before// looking at the properties generated by the models, and we have no reason to think that matching against the expanded set introduces a bias in favour of or against any specific model.

When evaluating against the expansion set, there is the possibility that a model will match a property more than once (e.g., matching both //transport// and //transportation//). In these cases, we count the top match and ignore the lower ones (i.e., lower matches are not treated as hits, but they do not contribute to the n-best count either).
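
The following sketch (again in Python, not the provided script, with illustrative names and data structures) shows one way to apply this rule when computing precision at n for a single concept: each gold property is credited only at its top match, and lower matches are skipped without using up n-best positions.

<code python>
# Sketch of matching a model's ranked single-word properties against the
# gold properties of one concept via expansion sets. NOT the provided script.

def score_concept(ranked_words, expansion_sets, n):
    """Precision at n for one concept.

    ranked_words:   the model's ranked list of single-word properties
    expansion_sets: dict mapping each gold property to its set of acceptable words
    """
    credited = set()   # gold properties already credited via their top match
    kept = 0           # model words that occupy a position in the n-best list
    hits = 0
    for word in ranked_words:
        matches = [g for g, exp in expansion_sets.items() if word in exp]
        if matches and all(g in credited for g in matches):
            # Lower match of an already-credited property: ignore it entirely,
            # i.e., it is neither a hit nor does it take up an n-best position.
            continue
        kept += 1
        if matches:
            hits += 1
            credited.update(matches)
        if kept == n:
            break
    return hits / float(n)
</code>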


Back to [[data:


==== Gold standard and evaluation script ====

This {{data:

Detailed information on the script can be obtained by running it with the -h option:

''

However, in short, if you can organize the output of your model in a file, say ''

''

then you can run the evaluation (against the gold standard set with expansions generated as described above) as:

''

We provide this script to have a common benchmark when comparing models, but we also encourage you to explore McRae and colleagues' database for other possible ways to evaluate the models.


Back to [[data:

Back to [[Start]]