This page is under construction!

General

  • Infomap NLP Software: no longer in development. The authors recommend using SemanticVectors instead.
    • Uses Latent Semantic Analysis.
    • The implementation is in C.
    • Infomap is intended to build "language models" and to perform information retrieval tasks on such models.
    • Simple input format.
    • You might need the gdbm libraries. I had trouble installing these libraries on my laptop. At the moment it is not working.
    • The documentation includes installation instructions, an algorithm description and an implementation guide.

--- Eduardo Aponte (eapontep@uos.de) 2010/10/31 12:28

Installation

  • Before installing Infomap you have to install the gdbm libraries on your computer. This can be quite challenging. Below I document the installation process I followed.
  1. As a first step, download the latest version of gdbm.
  2. Untar the .gz file and change into the created directory.
  3. Try:
    ./configure
     This command tries to configure the program for your system. It is highly likely that this step fails. The most likely reason is that the installed version of a system tool called libtool is not compatible. To check your version of this program (on Ubuntu):
    apt-cache policy libtool
     I assume you have libtool installed on your computer. You probably have a newer version of libtool than the one the gdbm package expects. The solution I found was to run:
    autoconf -f -oconfigure
  4. The last command overwrote all the libtool-related files in the directory. Now you can safely run:
    make
     If you get the following error, which is actually highly unlikely:
    checking build system type... Invalid configuration `x86_64-unknown-linux-gnu': machine `x86_64-unknown' not recognized
     you will need to deceive the program. Add before any command:
      linux32
  5. You might also have problems with the ANSI C headers. To solve this problem:
    sudo apt-get install libc6-dev
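
After the steps above it can be useful to check whether the system can actually see a gdbm shared library before trying to build Infomap. The following is only a minimal sketch using Python's standard `ctypes.util.find_library`. The helper name `library_status` is made up for this page, and a library installed under a non-standard prefix may not be reported even though it exists.

```python
from ctypes.util import find_library


def library_status(name):
    """Return a one-line report on whether a shared library is visible.

    `name` is given without the "lib" prefix, e.g. "gdbm".
    """
    found = find_library(name)
    if found is None:
        return "%s: not found" % name
    return "%s: found as %s" % (name, found)


# Report on the library that Infomap needs.
print(library_status("gdbm"))
```

A "not found" result does not prove that the build will fail, it only means the library is not on the default search path.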

Testing

The first step in building a model is to choose a directory where the models will be created. This is done by setting an environment variable:

INFOMAP_WORKING_DIR=/home/jrandom/infomap_models
export INFOMAP_WORKING_DIR

Afterwards, build the model. Infomap accepts two input formats: a single file in which documents are separated by XML markers, or a set of files in which every file contains exactly one document. I decided to use the second option. In that case the input is a file listing the name of every file that contains a document.

infomap-build -m /usr/local/share/corpora/manyNames.txt many_01

Remember to add infomap to your PATH variable.
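
The three things above (working directory, build command, PATH) can also be prepared from a script. This is a rough sketch, not part of Infomap: the function names are hypothetical, and only the `-m` option and the `INFOMAP_WORKING_DIR` variable shown on this page are used.

```python
import os
import shutil


def infomap_build_command(list_file, model_name):
    """Return the argument list for the multi-file build shown above."""
    return ["infomap-build", "-m", list_file, model_name]


def infomap_environment(working_dir, base_env=None):
    """Return a copy of the environment with INFOMAP_WORKING_DIR set."""
    env = dict(os.environ if base_env is None else base_env)
    env["INFOMAP_WORKING_DIR"] = working_dir
    return env


def infomap_on_path():
    """Return True if an infomap-build executable can be found on PATH."""
    return shutil.which("infomap-build") is not None


cmd = infomap_build_command("/usr/local/share/corpora/manyNames.txt", "many_01")
env = infomap_environment("/home/jrandom/infomap_models")
if not infomap_on_path():
    print("infomap-build is not on PATH; the command would be:", " ".join(cmd))
```

If the executable is found, `cmd` and `env` can be passed to `subprocess.run(cmd, env=env, check=True)` to start the build.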

In the corpora directory you will find a simple Python script that builds a corpus from a file in which every line is a document. After running it, I used the following command:

infomap-build -m /net/data/CL/projects/wordspace/software_tests/corpora/infoCorpus/directory.txt firstModel

directory.txt is a file listing the name of every file that contains a document.
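
The script in the corpora directory is not reproduced on this page. As an illustration only, a script of that kind might look like the sketch below. The function name `split_corpus`, the `doc_00001.txt` naming scheme, and the choice to write one absolute path per line are all assumptions made here, not a description of the original script or of what Infomap requires.

```python
import os


def split_corpus(source_path, out_dir, list_name="directory.txt"):
    """Write each non-empty line of source_path to its own file in out_dir.

    Returns the path of the list file that names every document file.
    """
    os.makedirs(out_dir, exist_ok=True)
    doc_paths = []
    with open(source_path, encoding="utf-8") as source:
        for line in source:
            text = line.strip()
            if not text:
                continue  # skip blank lines instead of creating empty documents
            file_name = "doc_%05d.txt" % (len(doc_paths) + 1)
            doc_path = os.path.abspath(os.path.join(out_dir, file_name))
            with open(doc_path, "w", encoding="utf-8") as doc:
                doc.write(text + "\n")
            doc_paths.append(doc_path)
    list_path = os.path.join(out_dir, list_name)
    with open(list_path, "w", encoding="utf-8") as listing:
        for doc_path in doc_paths:
            listing.write(doc_path + "\n")  # one document file per line
    return list_path
```

The returned path is what would be handed to `infomap-build -m` in the command above.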