Differences
This shows you the differences between two versions of the page.
software:rewinfomap [2010/11/01 14:07] 127.0.0.1 external edit |
software:rewinfomap [2010/12/07 11:58] eapontep |
||
---|---|---|---|
Line 1: | Line 1: | ||
This page is under construction! | This page is under construction! | ||
+ | |||
+ | |||
+ | ==== General ==== | ||
* **Infomap NLP Software: No longer in development. The authors recommend using SemanticVectors instead!!!** | * **Infomap NLP Software: No longer in development. The authors recommend using SemanticVectors instead!!!** | ||
Line 11: | Line 14: | ||
--- // | --- // | ||
+ | |||
+ | ==== Installation ==== | ||
+ | |||
+ | * Before installing Infomap, you have to install the gdbm libraries on your computer. This can be quite challenging. Below I document the installation process I followed.
+ | - As a first step, download the latest version of gdbm.
+ | - Untar the .gz file and go into the created directory. | ||
+ | - Try: <file bash> | ||
+ | - The last command overwrote all the libtool-related files in the directory. Now you can run <file bash>
+ | - You might also have problems with the ANSI C headers. To solve this problem<
+ | |||
+ | |||
+ | ==== Testing ====
+ | |||
+ | The first step in building a model is to choose a directory where the models will be created. This is done by setting an environment variable <file bash>
+ | export INFOMAP_WORKING_DIR</ | ||
+ | Afterwards, build the model. Infomap accepts two input formats: a single file in which documents are separated by XML markers, or a set of files in which every file contains exactly one document. I decided to use the second option. As input, there should be a file listing the names of the files, each of which contains one document.<
+ | Remember to add Infomap to your PATH variable. The installation includes a manual of all the applications available. | ||
+ | In the corpora directory, you will find a simple Python script for building a corpus from a file in which every line is a document. Afterwards, I used the following command:<
+ | To change the default configuration of the model, you would need to edit the file: ??. I ran tests only with the default configuration (including reduction to 100 dimensions). '
+ | Two tests were run, and the resulting models are available on the server. The first, firstModel, uses approximately 30000 documents (minus corrupted documents) from the Wiki Corpus. Constructing the model took less than five minutes, and the resulting directory occupies 65 MB.
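The script in the corpora directory is not reproduced on this page. As a rough illustration only, a splitter of that kind could look like the sketch below. It assumes one document per line in the source file, names the outputs `document_N.txt` after the line number (matching the file names that appear in the query results further down), and writes a list of the created files. The function name `split_corpus` and the list file name `filelist.txt` are made up for this example and are not part of Infomap.

```python
from __future__ import annotations

from pathlib import Path


def split_corpus(source: Path, out_dir: Path, list_name: str = "filelist.txt") -> list[Path]:
    """Write each non-empty line of `source` to its own file and list the files."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    with source.open(encoding="utf-8") as lines:
        for number, line in enumerate(lines, start=1):
            text = line.strip()
            if not text:
                continue  # skip blank lines instead of creating empty documents
            # Hypothetical naming scheme: document_<line number>.txt
            path = out_dir / f"document_{number}.txt"
            path.write_text(text + "\n", encoding="utf-8")
            written.append(path)
    # One path per line: the file that names every document file.
    (out_dir / list_name).write_text(
        "".join(f"{path}\n" for path in written), encoding="utf-8"
    )
    return written
```

Numbering by line position rather than by a running counter keeps each file name traceable to its line in the original corpus file, even when blank or corrupted lines are skipped.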
+ | |||
+ | {{: | ||
+ | |||
+ | A second test was conducted again with the Wiki-Corpus, | ||
+ | |||
+ | {{: | ||
+ | |||
+ | To access the models, the standard command is <file bash>
+ | Among the options, it is possible to obtain a word vector, the nearest neighbors of a word, or the word-document vector. Consider:<
+ | document_100.txt: | ||
+ | document_80694.txt: | ||
+ | document_162763.txt: | ||
+ | document_95383.txt: | ||
+ | document_176694.txt: | ||
+ | document_155572.txt: | ||
+ | document_197410.txt: | ||
+ | document_101202.txt: | ||
+ | document_144550.txt: | ||
+ | document_164895.txt: | ||
+ | </ | ||
+ | This command retrieves the information from the model in < | ||
+ | i.e., looking for words instead of documents close to '
+ | angry: | ||
+ | kid: | ||
+ | girlfriend: | ||
+ | jake: | ||
+ | boyfriend: | ||
+ | scare: | ||
+ | vicious: | ||
+ | feel: | ||
+ | bizarre: | ||
+ | This turned out to be the entry for Kubrick's film "A Clockwork Orange"
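To make the idea of "nearest neighbors" concrete, here is a small sketch of how such a ranking can be computed once word or document vectors are available. It assumes that closeness is measured by cosine similarity, which is the usual choice for vector-space models of this kind. It does not call Infomap: the functions `cosine` and `nearest` and the dictionary of vectors are purely illustrative.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest(query, vectors, k=10):
    """Return the k (label, score) pairs most similar to the query vector."""
    scored = [(label, cosine(query, vec)) for label, vec in vectors.items()]
    scored.sort(key=lambda item: item[1], reverse=True)  # best match first
    return scored[:k]
```

The same routine covers both kinds of query shown above: pass word vectors to rank words, or document vectors to rank documents.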
+ | An interesting option provided by Infomap is to install a model. This option is preferred for final results that should be available to several users. According to the manual, installing a model amounts to little more than moving a selected set of files from a non-installed model directory to a directory available system-wide. The option is intended to keep intermediate and final results apart. |
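Infomap has its own mechanism for installing models, described in the manual, and it should be used in practice. Purely to illustrate the idea of transferring a selected set of files to a shared location, the sketch below copies named files from a model directory into a system-wide directory. The function `install_model` is hypothetical, and the list of file names is left to the caller because this page does not say which files make up an installed model.

```python
from __future__ import annotations

import shutil
from pathlib import Path


def install_model(model_dir: Path, install_dir: Path, file_names: list[str]) -> list[Path]:
    """Copy the selected files of a model into install_dir/<model name>."""
    # Check everything up front so a failed install leaves nothing half-copied.
    missing = [name for name in file_names if not (model_dir / name).is_file()]
    if missing:
        raise FileNotFoundError(f"missing model files: {missing}")
    target = install_dir / model_dir.name
    target.mkdir(parents=True, exist_ok=True)
    # copy2 also preserves timestamps; files not listed stay in the working directory.
    return [Path(shutil.copy2(model_dir / name, target / name)) for name in file_names]
```

The sketch copies rather than moves, so the working directory stays intact and the model can be rebuilt or inspected after installation.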