Differences

This shows you the differences between two versions of the page.

software:rewinfomap [2010/11/07 14:17]
eapontep
software:rewinfomap [2010/12/07 11:52]
eapontep
Line 21: Line 21:
   - Untar the .gz file and go into the created directory.
   - Try: <file bash>./configure</file>This command tries to configure the program for your system. It is highly likely that this step fails. The most likely reason is that a system tool called libtool has an incompatible version. To check your version of this program (in Ubuntu):<file bash>apt-cache policy libtool</file> I presuppose you have libtool installed on your computer. You probably have a newer version of libtool than the one presupposed by the gdbm package. The solution I found was to run:<file bash>autoconf -f -oconfigure</file>
-  - The last overwrote all the libtool-related files in the directory. Now you can run <file bash>make</file> safely. If you obtain the following error -which actually is highly unlikely<file bash>checking build system type... Invalid configuration `x86_64-unknown-linux-gnu': machine `x86_64-unknown' not recognized</file>you will need to deceive the program. Add before any command:<file bash>linux 32</file>
 +  - The last command overwrote all the libtool-related files in the directory. Now you can run <file bash>make</file> safely. If you get the following error (which is actually highly unlikely)<file bash>checking build system type... Invalid configuration `x86_64-unknown-linux-gnu': machine `x86_64-unknown' not recognized</file>you will need to deceive the program into believing it runs on a 32-bit machine. Prefix every command with:<file bash>linux32</file> 
 +  - You might also have problems with the ANSI C headers. To solve this problem, install the C library development package:<file bash>sudo apt-get install libc6-dev</file> 
 + 
 + 
 +==== Testing ==== 
 + 
 +The first step in building a model is to choose the directory where the models will be created. This is done by setting an environment variable:<file bash>INFOMAP_WORKING_DIR=/home/jrandom/infomap_models 
 +export INFOMAP_WORKING_DIR</file> 
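A minimal sketch of the same step with one extra precaution: the directory is created before it is exported, so later commands do not depend on it already existing. The fallback location under `$HOME` is an assumption for illustration only; use whatever path suits your setup.

```shell
# Sketch: pick the working directory, create it if missing, and export it.
# The fallback path under $HOME is an assumption, not something Infomap requires.
INFOMAP_WORKING_DIR="${INFOMAP_WORKING_DIR:-$HOME/infomap_models}"
mkdir -p "$INFOMAP_WORKING_DIR"
export INFOMAP_WORKING_DIR
echo "Models will be created in: $INFOMAP_WORKING_DIR"
```

Putting these lines in your shell startup file keeps the setting across sessions.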
 +Afterwards, build the model. Infomap accepts two corpus formats: a single file in which documents are separated by XML markers, or a set of files in which every file contains exactly one document. I decided to use the second option. As input, there should be a file listing the names of the files that contain the documents.<file bash>infomap-build -m /usr/local/share/corpora/manyNames.txt many_01</file> 
 +Remember to add Infomap to your PATH variable. The installation includes a manual of all the applications available.  
 +In the corpora directory, you will find a simple Python script for building a corpus from a file in which every line is a document. Afterwards I used the following command:<file bash>infomap-build -m /net/data/CL/projects/wordspace/software_tests/corpora/infoCorpus/directory.txt firstModel</file> 
 +To change the default configuration of the model, you would need to change the file: ??. I ran tests only with the default configuration (including the reduction to 100 dimensions). 'directory.txt' is a file containing the name of every document file in the directory where the corpus is saved. Although the manual doesn't specify which separators should be used, putting every file name on its own line works. The option '-m' (or '-sf' for single file) specifies the type of corpus. Finally, 'firstModel' is the name of the model created in 'INFOMAP_WORKING_DIR'.
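The corpus preparation described above (one document per line in, one file per document out, plus a list file with one file name per line) can be sketched in shell. This is not the Python script from the corpora directory; `split_corpus` is a hypothetical helper, and two details are assumptions: numbering starts at 1, and the list file holds the paths exactly as built from the output directory you pass in.

```shell
# Sketch: split a one-document-per-line file into one file per document and
# write a list file naming every document file, one name per line.
# split_corpus is a hypothetical helper, not part of Infomap.
# Assumptions: numbering starts at 1; the list stores "<outdir>/document_N.txt".
split_corpus() {
    local input=$1 outdir=$2 list=$3
    local n=0 line
    mkdir -p "$outdir"
    : > "$list"                          # start with an empty list file
    while IFS= read -r line || [ -n "$line" ]; do
        [ -z "$line" ] && continue       # skip empty lines
        n=$((n + 1))
        printf '%s\n' "$line" > "$outdir/document_$n.txt"
        printf '%s\n' "$outdir/document_$n.txt" >> "$list"
    done < "$input"
    echo "$n"                            # number of documents written
}

# Usage (paths are placeholders):
#   split_corpus all_documents.txt /path/to/infoCorpus /path/to/infoCorpus/directory.txt
```

The resulting list file is what would be passed to `infomap-build -m`. Pass an absolute output directory if you want absolute paths in the list.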
 +Two tests were run, and the resulting models are available on the server. The first, firstModel, used approximately 30000 documents (minus corrupted documents) from the Wiki Corpus. Constructing the model took less than five minutes, and the resulting directory occupies 65 MB. 
 + 
 +{{:software:vizinfo1.png|}} 
 + 
 +A second test was conducted, again with the Wiki Corpus, this time with 200000 documents. Constructing the model took less than 10 minutes. The resulting directory occupies 312 MB. 
 + 
 +{{:software:vizinfo2.png|}} 
 + 
 +In order to access the models, the standard command is<file bash>associate [<options>] <model> <word></file> 
 +Among the options, it is possible to obtain a word vector, the nearest neighbors of a word, or the word-document vector. Consider:<file bash>associate -m <pathToTargetModel> -d -i d -n 10 document_100.txt 
 +document_100.txt:1.000000 
 +document_80694.txt:0.925041 
 +document_162763.txt:0.919077 
 +document_95383.txt:0.917450 
 +document_176694.txt:0.915522 
 +document_155572.txt:0.914388 
 +document_197410.txt:0.912332 
 +document_101202.txt:0.909776 
 +document_144550.txt:0.909703 
 +document_164895.txt:0.908825 
 +</file> 
 +This command retrieves information from the model in <pathToTargetModel>. In particular, the output should be 10 (-n 10) documents (-d), and the input is a document (-i d), namely 'document_100.txt'. (On the server you will find the document in './infoCorpus'.) After running <file bash>associate -m <pathToTargetModel> -w -i d -n 10 document_100.txt</file> 
 +i.e., looking for words instead of documents close to 'document_100.txt', the result was:<file bash>seemingly:0.731527 
 +angry:0.699753 
 +kid:0.693340 
 +girlfriend:0.676348 
 +jake:0.658571 
 +boyfriend:0.656249 
 +scare:0.652340 
 +vicious:0.651290 
 +feel:0.649538 
 +bizarre:0.643888</file> 
 +This turned out to be the entry for Kubrick's film "A Clockwork Orange" :-). 
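Since associate prints one `name:score` pair per line, as in the listings above, its output is easy to post-process. The following is a minimal sketch that keeps only the neighbours at or above a similarity threshold; `filter_neighbours` is a hypothetical helper, not part of Infomap, and it assumes the score is whatever follows the last colon on the line.

```shell
# Sketch: read "name:score" lines on stdin and keep those whose score is at
# least the given threshold. filter_neighbours is a hypothetical helper.
filter_neighbours() {
    local threshold=$1
    awk -v t="$threshold" '{
        i = match($0, /:[^:]*$/)         # position of the last colon
        if (i == 0) next                 # skip lines without a score
        name  = substr($0, 1, i - 1)
        score = substr($0, i + 1) + 0    # force numeric comparison
        if (score >= t) printf "%s\t%.6f\n", name, score
    }'
}

# Usage with the real tool (model path is a placeholder):
#   associate -m <pathToTargetModel> -w -i d -n 10 document_100.txt | filter_neighbours 0.65
```

The output is tab-separated so it can be fed directly into `sort` or `cut`.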
 + 
 +An interesting option provided by Infomap is to install a model. This option is preferred for final results, which should be available to several users. According to the manual, installing a model amounts to little more than moving a selected set of files from the directory of a non-installed model to a directory that is available system-wide. This option is intended to keep intermediate and final results apart.