software:rewsspacepackage [2010/11/16 14:30]
eapontep [STATE]
software:rewsspacepackage [2010/11/21 13:04]
eapontep [Testing]
  * To install the package, go to a target directory. The authors recommend the following command:<file bash>svn checkout http://airhead-research.googlecode.com/svn/trunk/sspace sspace-read-only</file>
  * A new directory, sspace-read-only, should have been created. Go to that directory and run the command<file bash>ant</file>. Ant is an Apache project used to build Java software. It automatically detects the file build.xml and builds the package from it. I explained [[rewSemVector|here]] how to install Ant.
  * If you want to use the .jar files directly, you may also want to run the command
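Before running the checkout and build steps above, it may help to confirm that the required tools are installed. The following is a minimal bash sketch. The function name `missing_tools` is hypothetical and not part of S-Space, and the tool list assumes that the steps need `svn`, `ant` and a JDK (`java`, `javac`) on the PATH.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of S-Space): prints every tool from the
# argument list that cannot be found on the PATH, one per line.
missing_tools() {
  local tool
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "$tool"
    fi
  done
}

# Assumed prerequisites for the svn checkout and the ant build.
missing=$(missing_tools svn ant java javac)
if [ -n "$missing" ]; then
  # Unquoted on purpose so the names are printed on a single line.
  echo "Install these first:" $missing >&2
fi
```

If nothing is printed, all four commands were found.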
  
==== Technical Issues ====
  
--- //[[eapontep@uos.de|Eduardo Aponte]] 2010/11/16 10:38//

=== First Trial ===
  * The trials I am performing now use only the .jar files produced by the build and run the programs from the command line, i.e., without modifying any class.
  * I began with a very simple trial: LSA on the whole corpus without threads. As expected, after 20 minutes the process terminated with a memory error.
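The memory error above may be related to the JVM's maximum heap size. An untested assumption worth checking is to start the jar with a larger heap through the standard JVM option `-Xmx`. The following bash sketch wraps that call; the function `run_with_heap` and the `DRY_RUN` variable are hypothetical, and the heap value must fit the physical memory of the machine.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper (not part of S-Space): runs an S-Space .jar with an
# explicit maximum JVM heap size (-Xmx is a standard JVM option).
# Usage: run_with_heap HEAP JAR [ARGS...]
# Set DRY_RUN=1 to print the command instead of executing it.
run_with_heap() {
  local heap="$1" jar="$2"
  shift 2
  local cmd=(java "-Xmx${heap}" -jar "$jar" "$@")
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}

# Example (not run automatically; 8g is only an illustrative value):
# run_with_heap 8g sspace-read-only/bin/lsa.jar -dwp500_articles_hw.latin1.txt.gz -v -n100 results/firstTry.sspace
```

Whether a larger heap is enough for the whole corpus, or whether the SVD step needs memory outside the JVM, would still have to be tested.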

=== Second Trial ===

  * I ran the following command on the corpus:<file bash>java -jar /net/data/CL/projects/wordspace/software_tests/sPackage/sspace-read-only/bin/lsa.jar -dwp500_articles_hw.latin1.txt.gz  -X200 -t10 -v -n100 results/firstTry.sspace</file> This command should read 200 documents from the corpus (the first 200 lines), use 10 threads (it is not clear to me how the work is split among them), and run SVD (the default reduction) with 100 dimensions. I did not monitor memory usage, but I enabled verbose terminal output (-v).
    * The documents themselves were processed quickly:<file bash>FINE: Processed all 200 documents in 0.271 total seconds</file>
    * However, as expected, the process stalled during the SVD step:<file bash>Nov 21, 2010 12:44:10 PM edu.ucla.sspace.lsa.LatentSemanticAnalysis processSpace
INFO: reducing to 100 dimensions
Nov 21, 2010 12:44:10 PM edu.ucla.sspace.matrix.MatrixIO matlabToSvdlibcSparseBinary
INFO: Converting from Matlab double values to SVDLIBC float values; possible loss of precision</file>
    * Interestingly, this class stores the document vectors internally as MATLAB matrices, which the log shows being converted to the SVDLIBC format before the SVD. This matters because the same format is also used by the SVD implementations in scipy and divisi.
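According to the notes above, `-X200` limits the run to the first 200 lines (documents) of the corpus. To inspect exactly those documents, or to run other tools on the same subset, one option is to cut them out of the gzipped corpus into a separate file. The following is a minimal bash sketch, assuming one document per line as stated above; the function `head_docs` and the output file name are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: copies the first N lines (= documents, assuming one
# document per line) of a gzip-compressed corpus into a new gzip file.
# Usage: head_docs INPUT.gz N OUTPUT.gz
head_docs() {
  local in="$1" n="$2" out="$3"
  # head stops after N lines; on a large corpus gzip -dc may then exit on
  # SIGPIPE, which is harmless because the sample is already complete.
  gzip -dc "$in" | head -n "$n" | gzip -c > "$out"
}

# Example (not run automatically):
# head_docs wp500_articles_hw.latin1.txt.gz 200 wp500_first200.txt.gz
```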