====== Correlation of statistical distribution and human free associations ======


===== Overview and main goals =====

In psychology, **free associations** are the first words that come to the mind of a native speaker when he or she is presented with a stimulus word, presumably retrieved from associative memory.  It is tempting to make a connection between such free associations and the statistical association patterns of words in the linguistic experience of speakers, including both first-order associations (**collocations**) and higher-order associations (**distributional similarity**).  The (misleading) terminological resemblance between the two concepts is not the only reason for making this connection, though:

  * Neither free associations nor statistical associations can be linked directly to a specific linguistic phenomenon (such as multiword expressions or a particular semantic relation); both are often considered //epiphenomena// in linguistic theory (which is based on categorial distinctions and symbolic models).
  * It is quite plausible to assume that associative memory reflects salient statistical association patterns in the experience of a person.  For free associations between words, the predominant factor should be //linguistic experience// (although associations between non-linguistic concepts will certainly play a role as well).

In the shared task, we wish to find out to what extent free associations can be explained and predicted by statistical association measures computed from corpus data.  The scientific goals of this experiment are twofold:

  - **Improve our understanding of free associations.**  In particular, we are interested in the interplay between **first-order and higher-order statistical associations** in human associative memory (e.g. //bear// evokes the hypernym //animal// and the attribute //brown//, but //mouse// evokes the compound //mouse trap//).  In future shared tasks, we will also attempt to model the **asymmetry** of many free associations (e.g. //bowler// strongly evokes //hat//, but not vice versa).
  - **Evaluate free associations as a straightforward "baseline" interpretation of distributional similarity.**  If word space proves to be a good **model of human associative memory**, then we should perhaps focus more on the relation between such free associations and theoretical linguistic categories rather than studying the linguistic aspects of word space models directly.  ((We fully expect a negative answer here, and this is certainly the desirable outcome for many researchers. However, it will be interesting to see how close the relation between word space and associative memory really is.))

In order to address these questions, we propose the three subtasks described below.  Note that ideally the same word space model should be used for all subtasks, although its similarity scores etc. will be interpreted in different ways, of course.  Participants are specifically encouraged to combine first-order statistical associations (see [[http://www.collocations.de/AM/|www.collocations.de/AM]]) with their word space model and to discuss the respective contribution made by each type of association.

===== Data preparation =====


==== Association norms ====

Psychologists measure free association with so-called **association norms**:  Native speakers are presented with stimulus words and are asked to write down the first word that comes to mind for each stimulus.  The degree of free association between a stimulus (//S//) and response (//R//) is then quantified by the percentage of test subjects who produced //R// when presented with //S//.  The data sets for this task are based on a large, freely available database of English association norms, the **Edinburgh Associative Thesaurus** ([[http://www.eat.rl.ac.uk/]]).
((We also considered using the **USF Free Association Database** ([[http://www.usf.edu/FreeAssociation]]), but it was not suitable for our purposes due to the exclusion of hapax responses.  More information on the USF database can be found in: Nelson, D. L., McEvoy, C. L., & Schreiber, T. A. (1998).  //The University of South Florida word association, rhyme, and word fragment norms.//))

  * Kiss, G.R., Armstrong, C., Milroy, R., and Piper, J. (1973).  An associative thesaurus of English and its computer analysis. In Aitken, A.J., Bailey, R.W. and Hamilton-Smith, N. (Eds.), //The Computer and Literary Studies//. Edinburgh: Edinburgh University Press.
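
The forward association strength used throughout this task can be computed directly from raw response tallies.  A minimal Python sketch, using invented responses for an invented stimulus (not actual EAT data):

```python
from collections import Counter

def association_strength(responses):
    """Forward association strength: proportion of test subjects
    who gave each response to a single stimulus word."""
    counts = Counter(responses)
    total = len(responses)
    return {r: n / total for r, n in counts.items()}

# Hypothetical responses of 10 subjects to the stimulus "bread"
responses = ["butter"] * 6 + ["knife"] * 3 + ["oven"]
strengths = association_strength(responses)
print(strengths["butter"])  # 0.6
```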


==== Cleanup ====

Stimulus (**cue**) and response (**target**) words in the EAT database were normalised to lowercase, and multiword units (i.e. words containing blanks) were discarded.  Both cues and targets seem to be partly lemmatised base forms (**headwords**), partly inflected forms (mostly plurals), and no part-of-speech distinctions are made (so the entry //light// may refer to noun, adjective or verb and was probably interpreted and used in all three meanings by test subjects).  Automatic normalisation of inflected forms or identification of parts of speech was not feasible, but we have made an effort to exclude word pairs containing inflected forms from the data sets.

In order to make sure that word space models have sufficient information for each word pair, only common English words were accepted as cues and targets.  For operationalisation, //common words// were defined as headwords that occur in **at least 50 different documents** in the British National Corpus (BNC), XML Edition.  This threshold was phrased in terms of document frequencies to avoid genre- and domain-specific words in the data sets, so that the choice of base corpus for the word space models should be less critical.
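
The document-frequency criterion can be sketched as follows.  This is only an illustration with made-up mini-documents and a lowered threshold, not the actual BNC preprocessing:

```python
from collections import Counter

def common_words(documents, min_df=50):
    """Return headwords that occur in at least `min_df` different
    documents (document frequency, not token frequency)."""
    df = Counter()
    for doc in documents:
        df.update(set(doc))  # count each word at most once per document
    return {w for w, n in df.items() if n >= min_df}

# Tiny illustration with a threshold of 2 instead of 50
docs = [["light", "house"], ["light", "light", "sea"], ["sea"]]
print(sorted(common_words(docs, min_df=2)))  # ['light', 'sea']
```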


===== Data sets & tasks =====

ZIP-archive with data sets for all subtasks: **coming soon**

All files are TAB-delimited tables in ASCII text format with a single header row, so they can easily be loaded into [[http://www.r-project.org/|R]] (with ''read.delim()'') and most spreadsheet programs.  Standard columns are ''cue'' (stimulus headword) and ''target'' (response headword); the other columns are specific to each task and are described below.
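
Outside of R, the files can be parsed with any standard CSV reader set to tab delimiters.  A minimal Python sketch, using an inline stand-in for the real file contents (the values shown are invented, not taken from the data sets):

```python
import csv
import io

# Two invented rows standing in for one of the task files
# (TAB-delimited, single header row)
sample = "cue\ttarget\tassoc\nbread\tbutter\t0.54\ndog\tcat\t0.41\n"

# For the real files, replace io.StringIO(sample) with open("FA/....tbl")
rows = list(csv.DictReader(io.StringIO(sample), delimiter="\t"))
print(rows[0]["cue"], float(rows[0]["assoc"]))  # bread 0.54
```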

For each task, separate training and test sets are provided.  Training sets are small and can be used to adapt parameters of the word space models or the formula used to predict free association strength from the statistical association data.  //No development or tuning on the evaluation sets is allowed!//



==== 1. Discrimination ====

Files: ''FA/discrimination_train.tbl'' (3 x 20 pairs), ''FA/discrimination_test.tbl'' (3 x 100 pairs)

Format:
  * ''cue'' = stimulus word
  * ''target'' = response word
  * ''type'' = ''FIRST'', ''HAPAX'', or ''RANDOM''

The task here is to discriminate between //strongly associated// and //non-associated// cue-target pairs, with a further subdivision of the second group into //plausible// and //random// pairs.  Training and test data were randomly sampled from three pools:
  * ''FIRST'': frequent first responses (given by more than 50% of test subjects) as //strongly associated// pairs
  * ''HAPAX'': cue-target pairs that were produced by a single test subject; there is obviously no substantial association, but the target must be a plausible response (at least under certain circumstances)
  * ''RANDOM'': random combinations of headwords from the EAT that were never produced as a cue-target pair (in either direction); most of these will likely be very implausible combinations

The main goal of this task is discrimination between the ''FIRST'' category (//strongly associated// pairs) and the other two categories.  A further discrimination between ''HAPAX'' and ''RANDOM'' can be attempted, but is expected to be much more difficult.

Evaluation should report classification accuracy on the test set after parameter tuning on the training set.  Note that the baseline accuracy for the main classification task is 66.7%, achieved by classifying all pairs as non-associated (two thirds of the pairs belong to the ''HAPAX'' and ''RANDOM'' pools).  Post-hoc analysis might consider the influence of different parameter settings and first-order/higher-order combinations on the test set.
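
The accuracy measure and the majority baseline can be sketched as follows.  The labels are invented, and ''NONASSOC'' is our shorthand here for the pooled ''HAPAX'' and ''RANDOM'' categories:

```python
def accuracy(gold, predicted):
    """Fraction of test pairs whose predicted label matches the gold label."""
    assert len(gold) == len(predicted)
    return sum(g == p for g, p in zip(gold, predicted)) / len(gold)

# Hypothetical main-task labels with the 1:2 class proportions of the test set
gold = ["FIRST", "NONASSOC", "NONASSOC"] * 4
baseline = ["NONASSOC"] * len(gold)  # classify everything as non-associated
print(accuracy(gold, baseline))      # 2/3, the majority baseline
```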


==== 2. Correlation ====

Files: ''FA/correlation_train.tbl'' (40 pairs), ''FA/correlation_test.tbl'' (240 pairs)

Format:
  * ''cue'' = stimulus headword
  * ''target'' = response headword
  * ''assoc'' = (forward) association strength of pair = proportion of responses //target// for stimulus //cue//

Here, the task is to predict free association strength for a given list of cue-target pairs, quantified by the proportion of test subjects that gave //target// as a response to the stimulus //cue//.  Association strength therefore ranges from 0 to 1 (the highest value in the EAT is .91).  Pairs in the training and test set have been selected by stratified sampling so that association strength is uniformly distributed across the full range (values above 0.7 have been pooled).

The predictor will typically be a nonlinear function of first-order and higher-order statistical association, whose parameters can be tuned on the training set.  Evaluation should report //linear correlation// (Pearson) and //rank correlation// (Kendall) between predictions and the gold standard.  Participants are encouraged to produce scatterplots and explore nonlinear correlations, although the predictor function should ideally remove such nonlinearities.
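
Both correlation measures are available in standard statistics packages; for illustration, here is a self-contained Python sketch of Pearson's r and Kendall's tau-a on invented prediction/gold pairs:

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson's linear correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant) / number of pairs."""
    c = d = 0
    for i, j in combinations(range(len(x)), 2):
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            c += 1
        elif s < 0:
            d += 1
    n = len(x)
    return (c - d) / (n * (n - 1) / 2)

# Hypothetical gold-standard association strengths vs. model predictions
gold = [0.10, 0.25, 0.40, 0.70]
pred = [0.05, 0.30, 0.35, 0.90]
print(round(pearson(gold, pred), 3), kendall_tau(gold, pred))
# tau is 1.0 here: both rankings agree perfectly
```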

==== 3. Response prediction ====

In this subtask, models have to predict the most frequent free associations of native speakers for a given list of stimulus words.  This task is presumably much harder than the correlation task, since the model has to choose from a very large set of possible response words (which are not narrowed down to the set of responses observed in psychological experiments).  For this reason, evaluation will be relatively lenient:

  - Participants suggest approx. 5 response candidates for each stimulus word.
  - The model predictions are accepted as correct if at least one of the candidates belongs to the most frequent responses in the gold standard (these will comprise 1 to 3 dominant response words).
If a model achieves high precision at this level, a further analysis should be performed, e.g. taking the rank of the "correct" candidate into account.
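
The lenient scoring rule described above can be sketched as follows (both the model output and the gold standard are invented):

```python
def lenient_precision(candidates, dominant):
    """A stimulus counts as correct if any of the ~5 suggested candidates
    is among the 1-3 dominant gold-standard responses for that stimulus."""
    hits = sum(1 for stim, cands in candidates.items()
               if set(cands) & set(dominant.get(stim, [])))
    return hits / len(candidates)

# Hypothetical model suggestions and dominant gold-standard responses
candidates = {"bread": ["butter", "loaf", "flour", "bake", "toast"],
              "dog":   ["bone", "bark", "leash", "puppy", "walk"]}
dominant   = {"bread": ["butter"], "dog": ["cat"]}
print(lenient_precision(candidates, dominant))  # 0.5
```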


===== Evaluation =====

Evaluation will be carried out by comparing model predictions with our gold standard on the test sets.  Since our focus is not on competition, each team will be responsible for evaluating their own model and reporting the results in their paper submission.  Participants are strongly encouraged to make their model predictions available for download to allow further analysis and discussion by other researchers.

In order to ensure comparability of the results, we will provide [[http://www.r-project.org/|R]] and [[http://www.perl.org/|Perl]] scripts for a basic evaluation of each subtask, together with detailed instructions and examples.