Differences
This shows you the differences between two versions of the page.
software:rewinfomap [2010/12/05 21:03] eapontep [Testing] |
software:rewinfomap [2010/12/07 11:52] eapontep |
||
---|---|---|---|
Line 29: | Line 29: | ||
The first step in building a model is to choose the directory where the models will be created. This is done by setting an environment variable <file bash> | The first step in building a model is to choose the directory where the models will be created. This is done by setting an environment variable <file bash> | ||
export INFOMAP_WORKING_DIR</ | export INFOMAP_WORKING_DIR</ | ||
- | |||
Afterwards, build the model. Infomap accepts two formats: a single file in which documents are divided by XML markers, or a set of files in which every file contains exactly one document. I decided to use the second option. As input, there should be a file specifying the name of each file containing a document.< | Afterwards, build the model. Infomap accepts two formats: a single file in which documents are divided by XML markers, or a set of files in which every file contains exactly one document. I decided to use the second option. As input, there should be a file specifying the name of each file containing a document.< | ||
+ | Remember to add Infomap to your PATH variable. The installation includes a manual of all the applications available. | ||
+ | In the corpora directory, you will find a simple Python script for building a corpus from a file in which every line is a document. Afterwards I used the following command:< | ||
+ | In order to change the default configuration of the model, you would need to change the file: ??. I ran tests only with the default configuration (including reduction to 100 dimensions). ' | ||
+ | Two tests were run, and the resulting models are available on the server. firstModel uses approximately 30000 documents (minus corrupted documents) from the Wiki Corpus. Constructing the model took less than five minutes, and the resulting directory occupies 65 MB. | ||
+ | |||
+ | {{: | ||
+ | |||
+ | A second test was conducted again with the Wiki-Corpus, | ||
+ | |||
+ | {{: | ||
- | Remember | + | In order to access the models, the standard command is<file bash> |
+ | Among the options, it is possible | ||
+ | document_100.txt: | ||
+ | document_80694.txt: | ||
+ | document_162763.txt: | ||
+ | document_95383.txt: | ||
+ | document_176694.txt: | ||
+ | document_155572.txt: | ||
+ | document_197410.txt: | ||
+ | document_101202.txt: | ||
+ | document_144550.txt: | ||
+ | document_164895.txt: | ||
+ | </ | ||
+ | This command retrieves the information from the model in < | ||
+ | i.e., looking for words instead of documents close to ' | ||
+ | angry: | ||
+ | kid: | ||
+ | girlfriend: | ||
+ | jake: | ||
+ | boyfriend: | ||
+ | scare: | ||
+ | vicious: | ||
+ | feel: | ||
+ | bizarre: | ||
+ | This turned out to be the entry for Kubrick's film "A Clockwork Orange" | ||
- | In corpora directory, you will find a simple py script | + | An interesting option provided by Infomap is to install |
- | directory.txt is a file containing the name of every file containing a document. | |
- | {{: | + |
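The corpus preparation described above (a source file with one document per line, split into one file per document, plus a list file naming every document file) can be sketched in Python as follows. This is only an illustration and not the script shipped in the corpora directory: the function name `split_corpus`, the zero-based line numbering, and the choice to write full paths into the list file are assumptions. The `document_N.txt` naming and the `directory.txt` list file follow the names that appear on this page.

```python
from pathlib import Path


def split_corpus(source, out_dir, list_name="directory.txt"):
    """Write each non-empty line of `source` to its own file and list the files.

    Hypothetical helper: the real script in the corpora directory may differ.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    with open(source, encoding="utf-8") as lines:
        for number, line in enumerate(lines):
            text = line.strip()
            if not text:
                continue  # skip blank lines rather than create empty documents
            # Assumed naming scheme: document_<line index>.txt
            doc_path = out_dir / f"document_{number}.txt"
            doc_path.write_text(text + "\n", encoding="utf-8")
            paths.append(doc_path)
    # The list file holds one document file name per line.
    list_path = out_dir / list_name
    list_path.write_text("".join(f"{p}\n" for p in paths), encoding="utf-8")
    return paths
```

Calling `split_corpus("corpus.txt", "docs")` would create `docs/document_0.txt`, `docs/document_1.txt`, and so on, together with `docs/directory.txt`. Whether Infomap expects absolute or relative paths in the list file is not stated here, so check the manual before using the output as input to the model build.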