software:rewsspacepackage, section [Testing]: revision 2010/11/21 13:04 compared with revision 2010/12/07 12:33 (both by eapontep)
Nov 21, 2010 12:44:10 PM edu.ucla.sspace.matrix.MatrixIO matlabToSvdlibcSparseBinary
INFO: Converting from Matlab double values to SVDLIBC float values; possible loss of precision</file>

Interestingly, this class uses MATLAB matrices as the internal representation of the document vectors. This is important because the same format is also used by the SVD implementations in SciPy and in Divisi.
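
To make the format concrete, here is a minimal sketch that writes a matrix in MATLAB's sparse text triplet layout. It assumes the layout accepted by MATLAB's spconvert (one "row column value" line per nonzero entry, with 1-based indices); whether the S-Space files match this layout exactly is an assumption, and the class MatlabSparseText is hypothetical, not part of the S-Space API.

```java
// Hypothetical helper, not part of the S-Space API.
// Writes a dense matrix as MATLAB sparse text triplets:
// one "row column value" line per nonzero entry, 1-based indices
// (the layout MATLAB's spconvert accepts). Zero entries are skipped.
final class MatlabSparseText {

    private MatlabSparseText() {}

    static String toTriplets(double[][] matrix) {
        StringBuilder sb = new StringBuilder();
        for (int r = 0; r < matrix.length; r++) {
            for (int c = 0; c < matrix[r].length; c++) {
                double v = matrix[r][c];
                if (v != 0.0) {
                    // MATLAB indices start at 1, Java indices at 0
                    sb.append(r + 1).append(' ')
                      .append(c + 1).append(' ')
                      .append(v).append('\n');
                }
            }
        }
        return sb.toString();
    }
}
```

For the 2x2 matrix {{1.5, 0}, {0, 2.0}}, toTriplets returns the two lines "1 1 1.5" and "2 2 2.0"; zero cells are omitted, which keeps the format compact for a mostly empty term-document matrix.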

=== Trials with LSA ===

I performed a number of trials with LSA. These trials were intended to measure the time and memory consumed by the different SVD algorithms compatible with the LSA implementation. The algorithms that can be selected from the command line are:

  * SVDLIBC
  * Matlab
  * GNU Octave
  * JAMA
  * COLT

I did not do an extensive review of the implementations; instead, I started testing each algorithm directly. As previous results showed, the default algorithm (SVDLIBC) produced strange results: the angular distance between vectors was extremely low among close neighbors. Two possible causes were identified. The first is that the number of dimensions was too low relative to the number of documents; in that case, SVD would collapse the distances and produce an extremely dense vector space. The second is a bug in the implementation. My supposition is that, since the implementation requires a conversion pipeline between the internal format of LSA and SVDLIBC, a loss of precision caused the problem. If that were the case, selecting Matlab for the SVD should solve it, because the Matlab format and the internal format of LSA are identical.
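
The precision hypothesis can be illustrated with a small sketch. It computes the angle between two vectors as the arc cosine of their cosine similarity, once in double precision and once after casting every component to float, loosely mirroring the double-to-float conversion reported by MatrixIO above. The class AngularPrecision is hypothetical and the setup is a simplification: in the real pipeline the conversion is applied to the matrix handed to SVDLIBC, not to the final similarity computation, so this only shows that the mechanism is plausible.

```java
// Illustration only: hypothetical class, not S-Space code.
// Computes the angle between two vectors as acos(cosine similarity),
// once in double precision and once with float components and sums.
final class AngularPrecision {

    private AngularPrecision() {}

    // Angle in radians, computed entirely in double precision.
    static double angleDouble(double[] a, double[] b) {
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        double cos = dot / (Math.sqrt(normA) * Math.sqrt(normB));
        // Clamp to [-1, 1] to guard against rounding outside acos's domain
        return Math.acos(Math.max(-1.0, Math.min(1.0, cos)));
    }

    // Same computation, but each component is cast to float first
    // and all intermediate sums are stored as float.
    static double angleFloat(double[] a, double[] b) {
        float dot = 0f, normA = 0f, normB = 0f;
        for (int i = 0; i < a.length; i++) {
            float x = (float) a[i];
            float y = (float) b[i];
            dot += x * y;
            normA += x * x;
            normB += y * y;
        }
        float cos = dot / ((float) Math.sqrt(normA) * (float) Math.sqrt(normB));
        return Math.acos(Math.max(-1.0f, Math.min(1.0f, cos)));
    }
}
```

For a = (1, 0) and b = (cos 1e-4, sin 1e-4), angleDouble returns approximately 1.0E-4 radians, while angleFloat returns 0.0: the cosine 1 - 5e-9 cannot be represented as a float and rounds to exactly 1.0f, so two distinct neighbors collapse onto the same direction.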

I performed a test with 30000 documents and 200 dimensions using SVDLIBC, MATLAB, GNU Octave and COLT. The results were partly disappointing: with the exception of MATLAB, all the algorithms ran out of memory (specifically, the pipeline between LSA and the SVD algorithm ran out of heap memory).

  * SVDLIBC: ran out of memory after 6459 seconds.
  * Matlab: produced a 450 MB .sspace file after 5083 seconds.
  * GNU Octave: ran out of memory after 7624 seconds.

{{:software:stats.png|Statistics 1.}}