# Differences

This shows you the differences between two versions of the page.

software:rewinfomap [2010/11/07 14:17] eapontep vs. software:rewinfomap [2010/12/07 11:58] (current) eapontep

Revisions:
  * 2010/12/07 11:58 eapontep
  * 2010/12/07 11:52 eapontep
  * 2010/12/05 21:56 eapontep [Testing]
  * 2010/12/05 21:14 eapontep [Testing]
  * 2010/12/05 21:03 eapontep [Testing]
  * 2010/12/05 19:23 eapontep [Testing]
  * 2010/12/05 18:02 eapontep
  * 2010/11/07 15:18 eapontep
  * 2010/11/07 14:17 eapontep
  * 2010/11/07 13:32 eapontep
  * 2010/11/01 14:07 external edit
  * 2010/10/31 12:28 eapontep created

Line 21:

  - Untar the .gz file and go into the created directory.
  - Try running ./configure. This command attempts to configure the program for your system. It is highly likely that this step fails. The most likely reason is that the version of libtool installed on your system is incompatible. To check your version of libtool (on Ubuntu): <file bash>apt-cache policy libtool</file> This assumes libtool is installed on your computer.
You probably have a newer version of libtool than the one the gdbm package expects. The solution I found was to run: <file bash>autoconf -f -oconfigure</file>
- - The last command overwrote all the libtool-related files in the directory. Now you can run make safely. If you get the following error (which is highly unlikely) <file bash>checking build system type... Invalid configuration `x86_64-unknown-linux-gnu': machine `x86_64-unknown' not recognized</file> you will need to deceive the program. Prefix every command with: <file bash>linux 32</file>
+ - The last command overwrote all the libtool-related files in the directory. Now you can run make safely. If you get the following error (which is highly unlikely) <file bash>checking build system type... Invalid configuration `x86_64-unknown-linux-gnu': machine `x86_64-unknown' not recognized</file> you will need to deceive the program. Prefix every command with: <file bash>linux32</file>
+ - You might also have problems with the ANSI C headers. To solve this problem: <file bash>sudo apt-get install libc6-dev</file>
+
+ ==== Testing ====
+
+ The first step in building a model is to choose a directory where the models will be created. This is done by setting an environment variable:
+ INFOMAP_WORKING_DIR=/home/jrandom/infomap_models
+ export INFOMAP_WORKING_DIR
+ Afterwards, build the model. Infomap accepts two input formats: a single file in which documents are separated by XML markers, or a set of files in which every file contains exactly one document. I decided to use the second option. The input is then a file listing the names of the files that each contain one document. <file bash>infomap-build -m /usr/local/share/corpora/manyNames.txt many_01</file>
+ Remember to add Infomap to your PATH variable. The installation includes a manual covering all the available applications.
+ In the corpora directory, you will find a simple Python script for building a corpus from a file in which every line is a document.
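The script itself is not reproduced on this page. As a rough sketch of what such a conversion could look like, the following Python code splits a one-document-per-line file into one file per document and writes a list file with one file name per line. The function name `build_corpus`, the `document_N.txt` naming pattern, the default list name `directory.txt`, and the use of absolute paths in the list are all assumptions made for illustration; they are not taken from the original script or from the Infomap manual.

```python
from pathlib import Path


def build_corpus(source_file, corpus_dir, list_name="directory.txt"):
    """Split a one-document-per-line file into one file per document.

    Returns the path of the list file that names every document file.
    All file names used here are illustrative assumptions.
    """
    corpus_dir = Path(corpus_dir)
    corpus_dir.mkdir(parents=True, exist_ok=True)

    doc_paths = []
    with open(source_file, encoding="utf-8") as src:
        for index, line in enumerate(src):
            text = line.strip()
            if not text:
                # Skip empty lines rather than creating empty documents.
                continue
            doc_path = corpus_dir / f"document_{index}.txt"
            doc_path.write_text(text + "\n", encoding="utf-8")
            doc_paths.append(doc_path.resolve())

    # Write one file name per line into the list file.
    list_path = corpus_dir / list_name
    list_path.write_text("".join(f"{p}\n" for p in doc_paths), encoding="utf-8")
    return list_path
```

The returned list file is the kind of argument that would be passed to `infomap-build -m`. Documents are numbered by their line position in the source file, so skipped empty lines leave gaps in the numbering.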
+ Afterwards I used the following command: <file bash>infomap-build -m /net/data/CL/projects/wordspace/software_tests/corpora/infoCorpus/directory.txt firstModel</file>
+ In order to change the default configuration of the model, you would need to change the file: ??. I ran tests only with the default configuration (including reduction to 100 dimensions). 'directory.txt' is a file containing the name of every document file in the directory where the corpus is saved. Although the manual doesn't specify which separators should be used, putting every file name on its own line works. The option '-m' (or '-sf' for a single file) specifies the type of corpus. Finally, 'firstModel' is the name of the model created in 'INFOMAP_WORKING_DIR'.
+ Two tests were run, and the resulting models are available on the server. The first, firstModel, used approximately 30000 documents (minus corrupted documents) from the Wiki Corpus. Constructing the model took less than five minutes, and the resulting directory occupies 65 MB.
+
+ {{:software:vizinfo1.png|}}
+
+ A second test was conducted, again with the Wiki Corpus, this time with 200000 documents. Constructing the model took less than 10 minutes. The resulting directory occupies 312 MB.
+
+ {{:software:vizinfo2.png|}}
+
+ In order to access the models, the standard command is: associate [<options>] <model> <word>
+ Among the options, it is possible to obtain a word vector, the nearest neighbors of a word, or the word-document vector.
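When many queries have to be issued, it can help to assemble the `associate` invocation programmatically. The sketch below is only an illustration: it uses exclusively the flags that appear in the examples on this page (`-m`, `-d`, `-w`, `-i d`, `-n`) and follows the form of those examples, in which the model path is passed with `-m`. The function names `associate_for_document` and `as_shell_line` are hypothetical helpers, not part of Infomap.

```python
import shlex


def associate_for_document(model_path, document_name, output="d", neighbors=10):
    """Build the argument list for an `associate` query on a document.

    output: "d" to retrieve documents, "w" to retrieve words.
    Only flags that appear on this page are used.
    """
    if output not in ("d", "w"):
        raise ValueError("output must be 'd' (documents) or 'w' (words)")
    return [
        "associate",
        "-m", str(model_path),
        f"-{output}",
        "-i", "d",              # the query item is a document
        "-n", str(neighbors),   # number of results to return
        document_name,
    ]


def as_shell_line(args):
    """Render the argument list as a single, safely quoted shell line."""
    return shlex.join(args)
```

The list form can be handed directly to `subprocess.run`, which avoids quoting problems with unusual file names; `as_shell_line` is only meant for logging or for pasting into a terminal.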
+ Consider the following query: <file bash>associate -m <pathToTargetModel> -d -i d -n 10 document_100.txt
+ document_100.txt:1.000000
+ document_80694.txt:0.925041
+ document_162763.txt:0.919077
+ document_95383.txt:0.917450
+ document_176694.txt:0.915522
+ document_155572.txt:0.914388
+ document_197410.txt:0.912332
+ document_101202.txt:0.909776
+ document_144550.txt:0.909703
+ document_164895.txt:0.908825
+ </file>
+ This command queries the model in <pathToTargetModel>. The output consists of 10 (-n 10) documents (-d), and the input is a document (-i d), namely 'document_100.txt'. (On the server you will find the document in './infoCorpus'.) After running associate -m <pathToTargetModel> -w -i d -n 10 document_100.txt
+ i.e., looking for words instead of documents close to 'document_100.txt', the result was: <file bash>seemingly:0.731527
+ angry:0.699753
+ kid:0.693340
+ girlfriend:0.676348
+ jake:0.658571
+ boyfriend:0.656249
+ scare:0.652340
+ vicious:0.651290
+ feel:0.649538
+ bizarre:0.643888</file>
+ This turned out to be the entry for Kubrick's film "A Clockwork Orange" :-). The most closely related document corresponds to the film [[http://www.youtube.com/watch?v=tcSMDqXT52s|"Pretty in Pink"]]
+ An interesting option provided by Infomap is to install a model. This option is preferred for final results, which should be available to several users. According to the manual, installing a model amounts to little more than moving a selected set of files from a non-installed model directory to a directory available system-wide. This option is intended to keep intermediate and final results apart.
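The `name:score` lines printed by `associate` in the examples above are easy to post-process. The following sketch parses such output into pairs and drops the query item itself, which appears in the document query with a score of 1.000000. It assumes the output has exactly the one-pair-per-line layout shown on this page; the function names `parse_associate_output` and `without_query` are hypothetical.

```python
def parse_associate_output(text):
    """Parse `name:score` lines into a list of (name, score) tuples."""
    results = []
    for line in text.splitlines():
        line = line.strip()
        if ":" not in line:
            # Ignore blank lines and anything that is not a name:score pair.
            continue
        # Split on the last colon so names containing ':' stay intact.
        name, _, score = line.rpartition(":")
        results.append((name, float(score)))
    return results


def without_query(results, query):
    """Remove the query item itself from a list of (name, score) tuples."""
    return [(name, score) for name, score in results if name != query]
```

For example, parsing the first three lines of the document query and removing `document_100.txt` leaves `document_80694.txt` as the top neighbor.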