purl.org/peter.turney
Tools for Latent Relational Analysis
If you would like to replicate the Latent Relational Analysis algorithm (or improve upon it), here are some resources that you may find useful. Note that none of these resources are my work. See the paper for a detailed description of the LRA algorithm.
- Corpus: LRA requires a big corpus
  - GOV2
    - 426 GB of text, available for purchase
    - 25,205,179 documents
    - GOV2 is the work of Charlie Clarke, Ian Soboroff, Nick Craswell, et al.
- Search Engine: LRA is designed for a passage retrieval search engine
  - Wumpus
    - passage retrieval and document retrieval
    - uses the powerful GCL (generalized concordance lists) query language
    - exact frequency counts
    - Wumpus is the work of Stefan Büttcher, Charlie Clarke, et al.
    - Wumpus is based on the earlier MultiText system
    - MultiText is the work of Egidio Terra, Charlie Clarke, et al.
- SVD: LRA uses Singular Value Decomposition
  - SVDLIBC
    - efficient implementation of SVD
    - supports sparse matrix formats
    - SVDLIBC is the work of Doug Rohde
    - SVDLIBC is based on the earlier SVDPACKC library
    - SVDPACKC is the work of Michael Berry et al.
- Thesaurus: LRA uses Dependency-based Word Similarity
  - online demo
  - download
  - Dependency-based Word Similarity is the work of Dekang Lin
- Perl: LRA was implemented in Perl
  - Perl was used to integrate the above parts
  - Berkeley DB for Perl was used to store patterns and count frequencies
  - Perl Data Language was used for matrix operations
  - Perl is the work of Larry Wall et al.
  - Perl Data Language is the work of Karl Glazebrook et al.
- Noun-Modifier Data: LRA was evaluated with noun-modifier data
  - the dataset is available for downloading
  - the dataset is described here
  - the noun-modifier dataset is the work of Vivi Nastase and Stan Szpakowicz
  - more noun compound datasets are available
- SAT Questions: LRA was evaluated with multiple-choice SAT analogy questions
  - contact me for a copy
  - the SAT questions were collected by Michael Littman
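
As a rough illustration of how the pieces listed above fit together, the following Python sketch counts how often word pairs occur with patterns, places the counts in a pair-by-pattern matrix, and compares the rows after a truncated SVD. It is a toy sketch under stated assumptions, not the LRA algorithm; see the paper for the actual steps. The observations, the pattern strings, and the helper names (`truncated_svd`, `cosine`, `relational_similarity`) are made up for illustration. An in-memory `Counter` stands in for Berkeley DB, and NumPy's dense `numpy.linalg.svd` stands in for SVDLIBC and Perl Data Language.

```python
from collections import Counter

import numpy as np

# Hypothetical toy observations: each entry is one occurrence of a
# word pair together with a pattern. Real counts would come from
# querying a large corpus through a search engine.
observations = [
    (("mason", "stone"), "X works with Y"),
    (("mason", "stone"), "X works with Y"),
    (("mason", "stone"), "X cuts Y"),
    (("carpenter", "wood"), "X works with Y"),
    (("carpenter", "wood"), "X cuts Y"),
    (("bird", "nest"), "X lives in Y"),
]

# In-memory stand-in for a key-value store of pattern frequencies.
counts = Counter(observations)

# Fix an ordering for the rows (word pairs) and columns (patterns).
pairs = sorted({pair for pair, _ in counts})
patterns = sorted({pattern for _, pattern in counts})

# Dense pair-by-pattern frequency matrix (assumption: small enough
# to hold in memory; a real matrix would be large and sparse).
matrix = np.zeros((len(pairs), len(patterns)))
for (pair, pattern), freq in counts.items():
    matrix[pairs.index(pair), patterns.index(pattern)] = freq


def truncated_svd(x, k):
    """Return the k largest singular triplets of x as (U_k, s_k, Vt_k)."""
    u, s, vt = np.linalg.svd(x, full_matrices=False)
    return u[:, :k], s[:k], vt[:k, :]


def cosine(a, b):
    """Cosine of the angle between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


# Keep the two largest singular values; each row of `rows` is a
# word pair represented in the reduced space.
u, s, vt = truncated_svd(matrix, k=2)
rows = u * s


def relational_similarity(p, q):
    """Cosine between the reduced-space rows of word pairs p and q."""
    return cosine(rows[pairs.index(p)], rows[pairs.index(q)])


print(relational_similarity(("mason", "stone"), ("carpenter", "wood")))
print(relational_similarity(("mason", "stone"), ("bird", "nest")))
```

In this toy data, mason:stone and carpenter:wood share patterns while bird:nest shares none with them, so the first cosine should be much larger than the second. Weighting of the counts and the choice of `k` are left out here.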
Updated: February 9, 2007.