purl.org/peter.turney

Tools for Latent Relational Analysis

If you would like to replicate the Latent Relational Analysis algorithm (or improve upon it), here are some resources that you may find useful. Note that none of these resources are my work. See the paper for a detailed description of the LRA algorithm.

Corpus: LRA requires a big corpus
  • GOV2
  • 426 GB of text available for purchase
  • 25,205,179 documents
  • GOV2 is the work of Charlie Clarke, Ian Soboroff, Nick Craswell, et al.

Search Engine: LRA is designed for a passage retrieval search engine
  • Wumpus
  • passage retrieval and document retrieval
  • uses powerful GCL (generalized concordance lists) query language
  • exact frequency counts
  • Wumpus is the work of Stefan Büttcher, Charlie Clarke, et al.
  • Wumpus is based on the earlier MultiText system
  • MultiText is the work of Egidio Terra, Charlie Clarke, et al.

SVD: LRA uses Singular Value Decomposition
  • SVDLIBC
  • efficient implementation of SVD
  • supports sparse matrix formats
  • SVDLIBC is the work of Doug Rohde
  • SVDLIBC is based on the earlier SVDPACKC library
  • SVDPACKC is the work of Michael Berry et al.

Thesaurus: LRA uses Dependency-based Word Similarity
  • online demo
  • download
  • Dependency-based Word Similarity is the work of Dekang Lin

Perl: LRA was implemented in Perl
  • Perl was used to integrate the above parts
  • Berkeley DB for Perl was used to store patterns and count frequencies
  • Perl Data Language was used for matrix operations
  • Perl is the work of Larry Wall et al.
  • Perl Data Language is the work of Karl Glazebrook et al.
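
The pattern-counting role that Berkeley DB played can be sketched with any disk-backed key-value store. The example below is a rough illustration only, not the original Perl code: it uses Python's standard `dbm.dumb` module as a stand-in for Berkeley DB, and the function names (`count_patterns`, `read_counts`) and the sample pattern strings are hypothetical.

```python
import dbm.dumb
import os
import tempfile


def count_patterns(patterns, db_path):
    """Increment an on-disk counter for every pattern string seen.

    Assumption: each pattern is a plain string; the real LRA pattern
    format is described in the paper, not here.
    """
    with dbm.dumb.open(db_path, "c") as db:  # "c": create if missing
        for pattern in patterns:
            key = pattern.encode("utf-8")
            current = int(db.get(key, b"0"))  # stored values are bytes
            db[key] = str(current + 1).encode("ascii")


def read_counts(db_path):
    """Load all stored counts back into an ordinary dictionary."""
    with dbm.dumb.open(db_path, "r") as db:
        return {key.decode("utf-8"): int(db[key]) for key in db.keys()}


if __name__ == "__main__":
    # Hypothetical patterns, written to a throwaway directory.
    path = os.path.join(tempfile.mkdtemp(), "patterns")
    count_patterns(["X of Y", "X for Y", "X of Y"], path)
    for pattern, freq in sorted(read_counts(path).items()):
        print(pattern, freq)
```

Keeping the counts on disk rather than in memory matters when the number of distinct patterns grows with a corpus the size of GOV2; a production run would likely want a faster store than `dbm.dumb`.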

Noun-Modifier Data: LRA was evaluated with noun-modifier data
  • the dataset is available for downloading
  • the dataset is described here
  • the noun-modifier dataset is the work of Vivi Nastase and Stan Szpakowicz
  • more noun compound datasets are available

SAT Questions: LRA was evaluated with multiple-choice SAT analogy questions
  • contact me for a copy
  • the SAT questions were collected by Michael Littman

Updated: February 9, 2007.