Differences

software:gensim [2010/12/03 17:04]
maebert
software:gensim [2010/12/03 17:26]
maebert
Line 36:
 === Benchmark ===
  
-I used the EditedCorpora file to benchmark performance. The file was converted into the List-of-Words format (by just inserting a line containing the number of documents at the top). Loading the corpus and transforming it into sparse vectors takes approx. 24 minutes on Quickie.
+I used the EditedCorpora file to benchmark performance. The file was converted into the List-of-Words format (by just inserting a line containing the number of documents at the top). Afterwards, Latent Semantic Indexing was performed on the corpus. As the algorithm used for singular value decomposition is incremental, the memory load is constant and can be controlled by passing a **chunks** parameter to the constructor of the LSI model. This parameter controls how many documents are loaded into RAM at once; the default is 20000. Larger chunks speed things up but also require more RAM. In distributed mode, this is the number of documents passed to the workers over the network, so the network transmission speed has to be factored in when choosing the chunk size. For the following experiments, a chunk size of 1000 documents was used.
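To illustrate why the chunk size bounds memory use, here is a minimal sketch of chunked iteration over a streamed corpus. It is not Gensim's own code: ''iter_chunks'' and ''fake_corpus'' are hypothetical names invented for this example, and only the default of 20000 and the experiment value of 1000 come from the text above. (In later Gensim releases the corresponding constructor argument is, as far as I know, called ''chunksize''.)

```python
from itertools import islice


def iter_chunks(documents, chunk_size=20000):
    """Yield lists of at most chunk_size documents from any iterable.

    Only one chunk is held in memory at a time, so peak memory depends
    on chunk_size rather than on the total corpus size.
    """
    it = iter(documents)
    while True:
        chunk = list(islice(it, chunk_size))
        if not chunk:
            return
        yield chunk


def fake_corpus(n_docs):
    """Hypothetical stand-in for a streamed corpus of sparse vectors."""
    for i in range(n_docs):
        yield [(i % 5, 1.0)]


# 2500 documents with the chunk size used in the experiments (1000).
chunk_sizes = [len(c) for c in iter_chunks(fake_corpus(2500), chunk_size=1000)]
print(chunk_sizes)
```

In distributed mode each such chunk would be what travels over the network to a worker, which is why a smaller chunk size can be preferable there even though it means more round trips.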
  
 +== Loading the Corpus ==
 +
 +Loading the corpus and transforming it into sparse vectors takes almost exactly 23 minutes on Quickie.
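As a rough picture of what this loading step involves, the following sketch parses text in the List-of-Words layout described above (first line: number of documents, then one document per line) and turns each document into a sparse bag-of-words vector. It uses only the standard library and is an assumption-laden illustration, not Gensim's implementation; ''read_low_corpus'' and ''to_sparse_vectors'' are hypothetical helper names, and the two sample documents are made up.

```python
from collections import Counter


def read_low_corpus(lines):
    """Parse List-of-Words text: a document count, then one document per line."""
    lines = iter(lines)
    n_docs = int(next(lines).strip())
    docs = [line.split() for line in lines if line.strip()]
    if len(docs) != n_docs:
        raise ValueError(f"header says {n_docs} documents, found {len(docs)}")
    return docs


def to_sparse_vectors(docs):
    """Map words to integer ids and return (word2id, list of sparse vectors).

    Each vector is a sorted list of (word_id, count) pairs; words that do
    not occur in a document are simply absent from its vector.
    """
    word2id = {}
    vectors = []
    for words in docs:
        counts = Counter(word2id.setdefault(w, len(word2id)) for w in words)
        vectors.append(sorted(counts.items()))
    return word2id, vectors


# Made-up two-document sample in the List-of-Words layout.
sample = "2\nhuman machine interface\nmachine learning machine\n"
docs = read_low_corpus(sample.splitlines())
word2id, vectors = to_sparse_vectors(docs)
print(vectors)
```

Unlike this sketch, a real loader would stream the file instead of keeping every document in a list, which matters for a corpus large enough to take minutes to load.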
 +
 +== Single Mode ==
 +
 +== Distributed Mode ==
 +
 +Please refer to the [[AdvancedGensimUsage|advanced usage]] page for details on how to set up Gensim in distributed mode. To test the distributed mode of the algorithm, twelve dual-core boxes (2.54 GHz, 4 GB RAM each) were used as workers, with one worker per core, totaling 24 workers. LI