purl.org/peter.turney

Analogies and Relational Similarity - Applications

Definition of Relational Similarity
When two words have a high degree of attributional similarity, we call them synonyms. When two pairs of words have a high degree of relational similarity, we say that their relations are analogous. For example, the word pair mason:stone is analogous to the pair carpenter:wood.
Recognizing Word Analogies
Given a measure of relational similarity, it is possible to automatically recognize word analogies, and thus solve multiple-choice word analogy problems.
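As a minimal sketch of how this could work, assume each word pair has already been mapped to a vector describing the relation between its two words. The vectors below are made up for illustration, and the names cosine and solve_analogy are hypothetical rather than taken from any published system. A multiple-choice question is then answered by picking the choice whose vector is most similar to the stem's vector:

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def solve_analogy(stem, choices, vectors):
    """Return the choice pair whose relation vector is closest to the stem's."""
    return max(choices, key=lambda pair: cosine(vectors[stem], vectors[pair]))

# Toy relation vectors (assumed values, for illustration only).
vectors = {
    ("mason", "stone"): [5, 1, 0],
    ("carpenter", "wood"): [4, 1, 0],
    ("teacher", "chalk"): [1, 3, 1],
    ("doctor", "hospital"): [0, 1, 5],
}
stem = ("mason", "stone")
choices = [("teacher", "chalk"), ("carpenter", "wood"), ("doctor", "hospital")]
best = solve_analogy(stem, choices, vectors)
```

With these toy vectors, best is the pair carpenter:wood. In practice the hard part is building the relation vectors, which this sketch takes as given.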
Classifying Semantic Relations
A measure of relational similarity can be used to classify noun-modifier pairs. The problem is to classify a noun-modifier pair, such as "laser printer", according to the semantic relation between the head noun (printer) and the modifier (laser). Classifying semantic relations in noun-modifier pairs can be viewed as a supervised learning problem. If labeled training data is available (noun-modifier pairs that have been manually assigned various classes, such as "instrument" and "cause"), then an unknown pair can be classified according to the label of its nearest neighbours in the training set. A measure of relational similarity can be used to identify the nearest neighbours. There are many interesting relations, such as antonymy, that do not occur in noun-modifier pairs, but noun-modifier pairs are an interesting application, since they are very common in English (WordNet 2.0 contains more than 26,000 noun-modifier pairs).
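The nearest-neighbour approach described above might be sketched as follows. This is an illustration under assumptions, not the method of any particular paper: the relation vectors and labels are invented, cosine similarity stands in for the relational similarity measure, and classify_pair is a hypothetical name.

```python
import math
from collections import Counter

def cosine(u, v):
    """Cosine similarity, used here as a stand-in relational similarity."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def classify_pair(unknown_vec, training, k=3):
    """Label an unknown pair by majority vote among its k most similar
    training pairs. `training` is a list of (vector, label) tuples."""
    ranked = sorted(training,
                    key=lambda item: cosine(unknown_vec, item[0]),
                    reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Invented training data: relation vectors with manually assigned classes.
training = [
    ([5, 0, 1], "instrument"),
    ([4, 1, 0], "instrument"),
    ([0, 5, 1], "cause"),
    ([1, 4, 0], "cause"),
]
label = classify_pair([5, 1, 1], training, k=3)
```

Here the unknown pair is labeled "instrument", because two of its three nearest neighbours carry that label.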
Machine Translation
Noun-modifier pairs are difficult to translate. Machine translation cannot rely primarily on manually constructed translation dictionaries for translating noun-modifier pairs, since such dictionaries are necessarily very incomplete. It should be easier to automatically translate noun-modifier pairs when they are first classified by their semantic relations. Consider the pair "electron microscope". Is the semantic relation purpose (a microscope for viewing electrons), instrument (a microscope that uses electrons), or material (a microscope made of electrons)? The answer to this question should facilitate translation of the individual words, "microscope" and "electron", and may also help to determine how the individual words are to be combined in the target language (what order to put them in, what suffixes to add, what prepositions to add).
Word Sense Disambiguation
Noun-modifier pairs are almost always monosemous. The implicit semantic relation between the two words in the pair narrowly constrains the possible senses of the words. The intended sense of a word is determined by its semantic relations with the other words in the surrounding text. If we can identify the semantic relations between the given word and its context, then we can disambiguate the given word. Consider the noun-modifier pair "plant food". In isolation, "plant" could refer to an industrial plant or a living organism. Once we have determined that the implicit semantic relation in "plant food" is beneficiary (the plant benefits from the food), as opposed to, say, location at (the food is located at the plant), the sense of "plant" is constrained to "living organism".
Information Extraction
The standard information extraction task is to identify a specific type of information, such as the name of a person or a company, and extract that information from a given document. With a measure of relational similarity, we can take this one step further, and identify the relations between the extracted terms. For example, we can automatically recognize that the relation between the extracted person's name and the extracted company's name is that the person is the CEO of the company.
Automatic Thesaurus Generation
A thesaurus, such as WordNet, links words by relations of synonymy ("big" and "large"), hyponymy ("oak" and "tree"), antonymy ("black" and "white"), and meronymy ("wheel" and "car"). A measure of relational similarity can be used as a component in a system for automatically generating a thesaurus. Given examples of any semantic relation, the measure of relational similarity can be used to extend those examples to new cases.
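One way to picture extending examples to new cases is to rank candidate word pairs by their average similarity to a handful of seed pairs and keep those above a threshold. The sketch below assumes a relational similarity function is supplied by the caller. The scores, the threshold, and the names extend_relation and toy_similarity are all made up for illustration.

```python
def extend_relation(seeds, candidates, similarity, threshold=0.5):
    """Return candidate pairs whose mean similarity to the seed pairs
    is at least `threshold`, ordered from most to least similar."""
    scored = []
    for cand in candidates:
        score = sum(similarity(cand, seed) for seed in seeds) / len(seeds)
        if score >= threshold:
            scored.append((score, cand))
    scored.sort(reverse=True)
    return [cand for _, cand in scored]

# Invented similarity scores standing in for a real measure.
toy_scores = {
    (("finger", "hand"), ("wheel", "car")): 0.9,
    (("finger", "hand"), ("page", "book")): 0.8,
    (("oak", "tree"), ("wheel", "car")): 0.2,
    (("oak", "tree"), ("page", "book")): 0.3,
}

def toy_similarity(pair_a, pair_b):
    """Look up a made-up score; unknown combinations score 0."""
    return toy_scores.get((pair_a, pair_b), 0.0)

seeds = [("wheel", "car"), ("page", "book")]        # examples of meronymy
candidates = [("oak", "tree"), ("finger", "hand")]
new_meronyms = extend_relation(seeds, candidates, toy_similarity)
```

With these scores only finger:hand is added to the meronymy examples, while oak:tree (a hyponymy pair) falls below the threshold.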
Information Retrieval
Current search engines are based on attributional similarity; the similarity of a query to a document depends on correspondence between the attributes of the query and the attributes of the documents. Typically the correspondence is exact matching of words or root words. Latent Semantic Indexing (LSI) allows more flexible matching, but it is still based on attributional similarity. If we could reliably classify semantic relations, then we could ask new kinds of search queries:
  • find all documents about things that have been enabled by the Canadian government
  • find all documents about things that have a causal relation with cancer
  • find all documents about things that have an instrument relation with printing
Existing search engines cannot recognize the implicit instrument relation in "laser printer", so the query "instrument and printing" will miss many relevant documents. A measure of relational similarity could be used as a component in a supervised learning system that learns to identify semantic relations between words in documents. These semantic relations could then be added to the index of a conventional (attributional) search engine. Alternatively, a search engine could compare a query to a document using a similarity measure that takes into account both relational similarity and attributional similarity. A query might be phrased as a word analogy problem:
  • find all documents about things that are to printers as lasers are to printers
  • find all documents about things that are to dogs as catnip is to cats
  • find all documents about things that are to Windows as grep is to Unix
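A similarity measure that takes both kinds of similarity into account could, in its simplest form, be a weighted average of the two scores. The sketch below assumes both scores are already computed and lie between 0 and 1. The linear combination, the weight parameter, and the names combined_score and rank_documents are illustrative assumptions, not a description of any existing search engine.

```python
def combined_score(attributional, relational, weight=0.5):
    """Weighted average of the two similarities.
    weight=0 uses only attributional similarity, weight=1 only relational."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be between 0 and 1")
    return (1 - weight) * attributional + weight * relational

def rank_documents(scores, weight=0.5):
    """Order document ids by combined score, best first.
    `scores` maps a document id to (attributional, relational)."""
    return sorted(scores,
                  key=lambda doc: combined_score(*scores[doc], weight=weight),
                  reverse=True)

# Invented per-document similarity scores for a single query.
scores = {
    "doc_a": (0.9, 0.1),
    "doc_b": (0.4, 0.8),
    "doc_c": (0.2, 0.3),
}
ranking = rank_documents(scores, weight=0.5)
```

With equal weighting, doc_b ranks above doc_a even though doc_a matches the query words more closely. Setting weight to 0 recovers a purely attributional ranking.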
Processing Metaphorical Text
Metaphorical language is very common in our daily life; so common that we are usually unaware of it. Even technical dialogue, such as computer users asking for help, is often metaphorical:
  • How can I kill a process?
  • How can I get into the LISP interpreter?
  • Tell me how to get out of Emacs.
Human-computer dialogue systems are currently limited to very simple, literal language. We believe that the task of mapping metaphorical language to more literal language can be approached as a kind of word analogy problem:
  • kill is to an organism as stop is to a process
  • get into is to a container as start is to the LISP interpreter
  • get out of is to a container as stop is to the Emacs editor
A measure of relational similarity can be used to solve these kinds of word analogy problems, and thus facilitate computer processing of metaphorical text.
Identifying Semantic Roles
A semantic frame for an event such as judgement contains semantic roles such as judge, evaluee, and reason, whereas an event such as statement contains roles such as speaker, addressee, and message. The task of identifying semantic roles is to label the parts of a sentence according to their semantic roles. A measure of relational similarity can help to identify semantic roles.
Analogy-Making
Structure Mapping Theory (SMT), and its implementation in the Structure Mapping Engine (SME), is the most influential work on modeling of analogy-making. The goal of computational modeling of analogy-making is to understand how people form complex, structured analogies. SME takes representations of a source domain and a target domain, and produces an analogical mapping between the source and target. The domains are given structured propositional representations, using predicate logic. These descriptions include attributes, relations, and higher-order relations (expressing relations between relations). The analogical mapping connects source domain relations to target domain relations. Each individual connection in an analogical mapping implies that the connected relations are similar; thus, SMT requires a measure of relational similarity in order to form mappings. Early versions of SME only mapped identical relations, but later versions of SME allowed similar, non-identical relations to match. However, the focus of research in analogy-making has been on the mapping process as a whole, rather than on measuring the similarity between any two particular relations. As a result, the similarity measures used in SME at the level of individual connections are somewhat rudimentary. A more sophisticated measure of relational similarity, such as Latent Relational Analysis (LRA), may enhance the performance of SME. Conversely, LRA focuses on the similarity between particular relations and ignores systematic mapping between sets of relations, so LRA may also be enhanced by integration with SME.

Updated: February 3, 2007.