Differences
This shows you the differences between two versions of the page.
software:rewinfomap [2010/12/05 21:56] eapontep [Testing] |
software:rewinfomap [2010/12/07 11:58] eapontep |
||
---|---|---|---|
Line 29: | Line 29: | ||
The first step in order to build a model is to choose a directory where the models will be created. This is done by setting an environment variable <file bash> | The first step in order to build a model is to choose a directory where the models will be created. This is done by setting an environment variable <file bash> | ||
export INFOMAP_WORKING_DIR</ | export INFOMAP_WORKING_DIR</ | ||
- | |||
Afterwards, build the model. Infomap accepts two formats: a single file in which documents are separated by XML markers, or a set of files in which every file contains exactly one document. I decided to use the second option. As input, there should be a file listing the name of each file that contains a document.< | Afterwards, build the model. Infomap accepts two formats: a single file in which documents are separated by XML markers, or a set of files in which every file contains exactly one document. I decided to use the second option. As input, there should be a file listing the name of each file that contains a document.< | ||
+ | Remember to add Infomap to your PATH variable. The installation includes a manual of all the applications available. | ||
+ | In the corpora directory, you will find a simple Python script for building a corpus from a file where every line is a document. Afterwards I used the following command:< | ||
+ | To change the default configuration of the model, you would need to edit the file: ??. I ran tests only with the default configuration (including reduction to 100 dimensions). ' | ||
+ | Two tests were run, and the resulting models are available on the server. The first, firstModel, used approximately 30,000 documents from the Wiki Corpus (excluding corrupted documents). Building the model took less than five minutes, and the resulting directory occupies 65 MB. | ||
+ | |||
+ | {{: | ||
+ | |||
+ | A second test was conducted again with the Wiki-Corpus, | ||
- | Remember to add infomap to your PATH variable. | + | {{: |
- | In the corpora directory, you will find a simple Python script for building a corpus from a file where every line is a document. | + | To access the models, the standard command is<file bash> |
- | directory.txt is a file containing | + | Among the options, it is possible to obtain |
- | {{:software:vizinfo1.twopi.png|30000 Documents}} | + | document_100.txt: |
- | {{:software:vizinfo1.twopi.png|200000 Documents}} | + | document_80694.txt: |
+ | document_162763.txt: | ||
+ | document_95383.txt: | ||
+ | document_176694.txt: | ||
+ | document_155572.txt: | ||
+ | document_197410.txt: | ||
+ | document_101202.txt: | ||
+ | document_144550.txt: | ||
+ | document_164895.txt: | ||
+ | </file> | ||
+ | This command retrieves the information from the model in < | ||
+ | i.e., looking for words instead of documents close to ' | ||
+ | angry:0.699753 | ||
+ | kid:0.693340 | ||
+ | girlfriend: | ||
+ | jake:0.658571 | ||
+ | boyfriend:0.656249 | ||
+ | scare:0.652340 | ||
+ | vicious: | ||
+ | feel: | ||
+ | bizarre: | ||
+ | This turned out to be the entry for Kubrick's film "A Clockwork Orange" | ||
+ | An interesting option provided by Infomap is to install a model. This option is preferred for final results, which should be available to several users. According to the manual, installing a model amounts to little more than moving a selected set of files from a non-installed model directory to a directory available system-wide. This option is intended to keep intermediate and final results apart. |
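
The corpus preparation described on this page (a source file with one document per line, split into one file per document, plus a list file naming each document file) could be sketched roughly as follows. This is not the script from the corpora directory, only a minimal illustration under stated assumptions: the function name `split_corpus` is hypothetical, the `document_N.txt` naming is modeled on the file names shown in the output above, and `directory.txt` is assumed to be the name of the list file.

```python
from pathlib import Path


def split_corpus(source, out_dir, list_name="directory.txt"):
    """Write each non-empty line of `source` to its own file and list those files.

    Hypothetical helper: names and layout are assumptions, not Infomap's API.
    Returns the path of the list file and the number of documents written.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    # errors="replace" keeps going when a line contains undecodable bytes
    with open(source, encoding="utf-8", errors="replace") as src:
        for number, line in enumerate(src):
            text = line.strip()
            if not text:
                continue  # skip empty lines; numbering follows the source line index
            doc_path = out_dir / f"document_{number}.txt"
            doc_path.write_text(text + "\n", encoding="utf-8")
            paths.append(doc_path.resolve())
    # One absolute path per line, in source order
    list_path = out_dir / list_name
    list_path.write_text("".join(f"{p}\n" for p in paths), encoding="utf-8")
    return list_path, len(paths)
```

The resulting list file is the kind of input the multi-file format expects; how it is passed to the build command is covered in the Infomap manual and is not shown here.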