====== Task 3: Comparison with Speaker-Generated Properties ======


==== Introduction ====

From a cognitive point of view, there is little doubt that salient properties of a concept are an important part of its "meaning".

Psychologists have been collecting such "feature norms", i.e., lists of the properties that human subjects produce when presented with a concept.

A particularly large and well-articulated list was recently made publicly available by McRae and colleagues:

McRae, K., Cree, G. S., Seidenberg, M. S., & McNorgan, C. (2005). Semantic feature production norms for a large set of living and nonliving things. Behavior Research Methods, 37(4), 547-559.

The list can be obtained as described [[http://

==== Task Operationalization ====

We operationalize the property generation task as follows.

We focus on the same set of 44 concepts used in the [[http://

For each target concept, we pick the top 10 properties from the McRae norms (ranked by the number of subjects that produced them) and use them as the gold standard set for that concept. Given the ranked output of a model, we compute precision for each concept with respect to this gold standard, at various n-best thresholds, and we average precision across the 44 concepts. We limit ourselves to the top 10 human-generated properties of each concept since, for about 10% of the target concepts, the norms only contain 10 properties (for one concept, //snail//, the norms list 9 properties).

The provided evaluation script, by default, reports average precision at the 10-, 20- and 30-best thresholds.
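
To make the measure concrete, here is a minimal sketch (in Python, not the provided evaluation script) of precision at an n-best threshold averaged across concepts; the data structures and function names are our own illustrative assumptions.

<code python>
# Sketch of average precision at an n-best threshold; NOT the provided script.
# model_output:  dict mapping each concept to its ranked list of properties
# gold_standard: dict mapping each concept to the set of its top-10 gold properties

def precision_at_n(ranked_properties, gold_set, n):
    """Fraction of the model's top-n properties that appear in the gold standard."""
    hits = sum(1 for p in ranked_properties[:n] if p in gold_set)
    return hits / float(n)

def average_precision_at_n(model_output, gold_standard, n):
    """Precision at n, averaged over all concepts in the gold standard."""
    scores = [precision_at_n(model_output[c], gold_standard[c], n)
              for c in gold_standard]
    return sum(scores) / len(scores)

# Default thresholds reported by the evaluation script:
# for n in (10, 20, 30):
#     print(n, average_precision_at_n(model_output, gold_standard, n))
</code>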

=== Property Expansion ===

The properties in the norms database are expressed by phrases such as //tastes sweet// or //is loud//, resulting from manual normalization of the subjects' responses. This raises two problems when determining whether a property generated by a model matches a property in the norms: First, all word space models we are aware of produce single orthographic //words// as properties, and these have to be matched against the //phrases// in the norms. Second, we need to undo the normalization of McRae and colleagues, so that, say, //loud//, //noise// and //noisy// will all be counted as matches against the property //is loud//.

We dealt with these issues by generating an "expansion set" for each of the top 10 properties of each of the 44 target concepts, i.e., a list of single-word expressions that seemed plausible ways to express the relevant property. The expansion set was prepared by first extracting from WordNet the synonyms of the words that constituted the last element of a property phrase (//red// in //is red//), then filtering out irrelevant synonyms by hand, and adding other forms -- including inflectional variants (e.g., //legs// and //leg//) and closely related entities (//lives on water// was expanded to //aquatic, lake, ocean, river, sea, water//).
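
The automatic WordNet step can be sketched as follows (in Python with NLTK's WordNet interface, which is only our illustrative choice; the manual filtering and the addition of inflected and related forms described above are not automated here).

<code python>
# Sketch of the first, automatic step of building an expansion set:
# collect WordNet synonyms of the last word of a property phrase.
# The hand-filtering described in the text is not reproduced here.

from nltk.corpus import wordnet as wn

def candidate_expansions(property_phrase):
    """Return single-word WordNet synonyms of the phrase's last word."""
    head = property_phrase.split()[-1]      # e.g., "red" in "is red"
    candidates = {head}
    for synset in wn.synsets(head):
        for lemma in synset.lemma_names():
            if "_" not in lemma:            # keep single orthographic words only
                candidates.add(lemma.lower())
    return sorted(candidates)

# Example (the output would still need manual filtering):
# print(candidate_expansions("is loud"))
</code>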

We were rather generous in determining what counts as an expansion, since preliminary experimentation suggested that we should cut the models some slack in order to increase recall. Crucially, while we recognize the somewhat subjective nature of the expansion operation, it was conducted //before// looking at the properties generated by the models, and we have no reason to think that matching against the expanded set introduces a bias in favour of or against any specific model.

When evaluating against the expansion set, there is the possibility that a model will match a property more than once (e.g., matching both //transport// and //transportation//). In these cases, we count the top match and ignore the lower ones (i.e., lower matches are not treated as hits, but they do not contribute to the n-best count either).
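
The following sketch (again in Python, not the provided script, with illustrative names and data structures) shows one way to apply this rule when computing precision at n for a single concept: each gold property is credited only at its top match, and lower matches are skipped without using up n-best positions.

<code python>
# Sketch of matching a model's ranked single-word properties against the
# gold properties of one concept via expansion sets. NOT the provided script.

def score_concept(ranked_words, expansion_sets, n):
    """Precision at n for one concept.

    ranked_words:   the model's ranked list of single-word properties
    expansion_sets: dict mapping each gold property to its set of acceptable words
    """
    credited = set()   # gold properties already credited via their top match
    kept = 0           # model words that occupy a position in the n-best list
    hits = 0
    for word in ranked_words:
        matches = [g for g, exp in expansion_sets.items() if word in exp]
        if matches and all(g in credited for g in matches):
            # Lower match of an already-credited property: ignore it entirely,
            # i.e., it is neither a hit nor does it take up an n-best position.
            continue
        kept += 1
        if matches:
            hits += 1
            credited.update(matches)
        if kept == n:
            break
    return hits / float(n)
</code>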


Back to [[data:


==== Gold standard and evaluation script ====

This {{data:

Detailed information on the script can be obtained by running it with the -h option:

''

However, in short, if you can organize the output of your model in a file, say ''

''

then you can run the evaluation (against the gold standard set with expansions generated as described above) as:

''

We provide this script to have a common benchmark when comparing models, but we also encourage you to explore McRae and colleagues' database for other possible ways to evaluate the models.


Back to [[data:

Back to [[Start]]