
This page is under construction!


  • Infomap NLP Software: no longer in development. The authors recommend using SemanticVectors instead.
    • Uses Latent Semantic Analysis
    • The implementation is in C.
    • Infomap is intended to build "language models" and to perform information retrieval tasks on such models.
    • Simple input format
    • You may need the gdbm libraries. I had trouble installing these libraries on my laptop; at the moment it is not working.
    • The documentation includes installation instructions, algorithm description and implementation guide.

Eduardo Aponte 2010/10/31 12:28


  • Before installing Infomap you have to install the gdbm libraries on your computer. This can be quite challenging. Below I document the installation process I followed.
  1. As a first step, download the latest version of gdbm.
  2. Untar the .gz file and change into the newly created directory.
  3. Run:

    ./configure

    This command tries to configure the program for your system. It is highly likely that this step fails, most probably because your version of libtool is incompatible with the one the package expects. To check your libtool version (on Ubuntu):

    apt-cache policy libtool

    This assumes libtool is installed on your computer. You probably have a newer version of libtool than the one the gdbm package expects. The solution I found was to run:

    autoconf -f -oconfigure
  4. The last command overwrote all the libtool-related files in the directory. Now you can safely run

    ./configure

    If you get the following error (which is actually highly unlikely):

    checking build system type... Invalid configuration `x86_64-unknown-linux-gnu': machine `x86_64-unknown' not recognized

    you will need to trick the program by adding the following before any command:

  5. You might also have problems with the ANSI C headers. To solve this, install libc6-dev:
    sudo apt-get install libc6-dev


The first step in building a model is to choose a directory in which the models will be created. This is done by setting an environment variable.


Afterwards, build the model. Infomap accepts two input formats: a single file in which documents are delimited by XML markers, or a set of files in which every file contains exactly one document. I chose the second option. In that case the input is a file listing the names of the files that contain the documents:

infomap-build -m /usr/local/share/corpora/manyNames.txt many_01

Remember to add the Infomap programs (such as infomap-build) to your PATH.
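If the documents already sit in one directory as separate files, the list passed to `-m` can be generated instead of written by hand. The sketch below assumes that the list is a plain-text file with one document path per line and that documents end in `.txt`; `write_file_list` is a hypothetical helper, not part of Infomap.

```python
from pathlib import Path


def write_file_list(doc_dir, list_path, pattern="*.txt"):
    """Write the absolute path of every matching file in doc_dir to list_path.

    Hypothetical helper (not part of Infomap). Assumes infomap-build -m
    reads a plain-text file with one document filename per line.
    Returns the number of filenames written.
    """
    list_file = Path(list_path).resolve()
    docs = sorted(
        p.resolve()
        for p in Path(doc_dir).glob(pattern)
        # skip subdirectories and the list file itself, in case it
        # lives inside doc_dir and matches the pattern
        if p.is_file() and p.resolve() != list_file
    )
    with open(list_file, "w", encoding="utf-8") as out:
        for p in docs:
            out.write(f"{p}\n")
    return len(docs)
```

Sorting the paths keeps the document order stable between runs, which makes it easier to compare models built from the same directory.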

In the corpora directory you will find a simple Python script that builds a corpus from a file in which every line is a document. Afterwards I used the following command:

infomap-build -m /net/data/CL/projects/wordspace/software_tests/corpora/infoCorpus/directory.txt firstModel

directory.txt is a file containing the name of every file that contains a document. Using 30000 documents
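The script from the corpora directory is not reproduced here. As a rough stand-in, the following sketch does the same kind of job: it splits a file with one document per line into one file per document and writes a `directory.txt` listing their absolute paths. The function name `build_corpus`, the file naming scheme (`doc_000000.txt`, `doc_000001.txt`, and so on) and the skipping of blank lines are assumptions, not the behaviour of the original script.

```python
from pathlib import Path


def build_corpus(corpus_file, out_dir, list_name="directory.txt"):
    """Split a one-document-per-line file into one file per document.

    Hypothetical helper: writes out_dir/doc_000000.txt, doc_000001.txt, ...
    plus a list file (directory.txt by default) with the absolute path of
    each document, one per line. Returns (list_path, number_of_documents).
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    with open(corpus_file, encoding="utf-8") as src:
        for line in src:
            text = line.strip()
            if not text:
                continue  # assumption: blank lines are not documents
            doc_path = out / f"doc_{len(paths):06d}.txt"
            doc_path.write_text(text + "\n", encoding="utf-8")
            paths.append(doc_path.resolve())
    # one absolute path per line, as assumed for infomap-build -m
    list_path = out / list_name
    list_path.write_text("".join(f"{p}\n" for p in paths), encoding="utf-8")
    return list_path, len(paths)
```

The returned list file can then be passed to `infomap-build -m` in the same way as directory.txt in the command above.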