purl.org/peter.turney

Synonyms and Attributional Similarity - Applications

Definition of Attributional Similarity
The attributional similarity of two words is the degree of similarity in their meanings. When two words have a very high degree of semantic similarity, we call them synonyms.
Recognizing Synonyms
A measure of attributional similarity can be used to recognize synonyms, although antonyms also have a high degree of attributional similarity. (Antonyms can be distinguished by their semantic orientation.)
Generating Synonyms
The words in a large corpus can be ranked in order of their attributional similarity to a given target word. The top ranked words are likely to be synonyms of the target word. This can be used for automatic thesaurus generation.
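The ranking step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: cosine similarity over co-occurrence vectors stands in for whatever similarity measure is actually used, and the vectors in toy_vectors are made-up counts, not corpus data.

```python
import math


def cosine(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_by_similarity(target, vectors):
    """Return all other words, sorted from most to least similar to target."""
    target_vec = vectors[target]
    scored = [
        (cosine(target_vec, vec), word)
        for word, vec in vectors.items()
        if word != target
    ]
    scored.sort(reverse=True)
    return [word for _, word in scored]


# Hypothetical co-occurrence counts with the context words (drive, road, eat).
toy_vectors = {
    "car": [9, 8, 0],
    "automobile": [8, 9, 0],
    "truck": [6, 7, 1],
    "banana": [0, 1, 9],
}

ranking = rank_by_similarity("car", toy_vectors)
print(ranking)
```

With these toy counts, "automobile" is ranked first and "banana" last. A thesaurus entry for "car" would be built from the top few words of such a ranking.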
Determining Semantic Orientation
Some words have negative associations ("immature") and others have positive associations ("wise"). Suppose we wish to automatically determine whether a given word or phrase is positive or negative. One approach is to measure the similarity between the given word and a word that is known to be positive ("excellent") and a word that is known to be negative ("poor"). We can assign a numerical rating to the given word, based on whether it is more similar to the positive word or the negative word. This can be extended from words and phrases to whole documents. For example, we can determine whether a document, such as a movie review, is positive or negative by calculating the average semantic orientation of the adjectives and adverbs in the document.
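A minimal sketch of this idea follows. It assumes that the numerical rating is the similarity to the positive word minus the similarity to the negative word, which is one plausible choice rather than the only one. The similarity scores in TOY_SIMILARITY are invented for illustration; a real system would estimate them from a corpus.

```python
# Hypothetical similarity scores in [0, 1], keyed by (word, reference word).
TOY_SIMILARITY = {
    ("wise", "excellent"): 0.7,
    ("wise", "poor"): 0.2,
    ("immature", "excellent"): 0.1,
    ("immature", "poor"): 0.6,
    ("dull", "excellent"): 0.1,
    ("dull", "poor"): 0.5,
}


def similarity(word, reference):
    """Look up a toy similarity score; unknown pairs count as 0."""
    return TOY_SIMILARITY.get((word, reference), 0.0)


def semantic_orientation(word, positive="excellent", negative="poor"):
    """Positive result: closer to the positive word. Negative: closer to the negative word."""
    return similarity(word, positive) - similarity(word, negative)


def document_orientation(words):
    """Average orientation of the given words (e.g. a review's adjectives and adverbs)."""
    if not words:
        return 0.0
    return sum(semantic_orientation(w) for w in words) / len(words)


review_words = ["immature", "dull", "wise"]
score = document_orientation(review_words)
print("positive" if score > 0 else "negative")
```

Here the two negative words outweigh the one positive word, so the toy review is classified as negative.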
Lexical Cohesion
In a cohesive document, we expect many of the words to be semantically similar to each other. Automatically generated summaries of a document often lack cohesiveness. A measure of semantic similarity can be used to identify outliers in an automatically generated summary. We can improve the quality of the summary by removing these outliers, to make a more cohesive summary.
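One way to make outlier removal concrete is sketched below. It assumes that each item in the summary (a word here, though it could equally be a sentence) has a vector representation, that cosine similarity is the measure, and that an item is an outlier when its mean similarity to the other items falls below a threshold. The vectors and the threshold of 0.5 are made up for the example.

```python
import math


def cosine(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def cohesion_scores(vectors):
    """Mean similarity of each item to every other item."""
    scores = {}
    for item, vec in vectors.items():
        others = [cosine(vec, v) for other, v in vectors.items() if other != item]
        scores[item] = sum(others) / len(others) if others else 0.0
    return scores


def remove_outliers(vectors, threshold):
    """Keep only the items whose mean similarity to the rest meets the threshold."""
    scores = cohesion_scores(vectors)
    return [item for item in vectors if scores[item] >= threshold]


# Hypothetical two-dimensional vectors; "invoice" points away from the rest.
toy_summary = {
    "river": [9, 1],
    "stream": [8, 2],
    "water": [7, 3],
    "invoice": [1, 9],
}

kept = remove_outliers(toy_summary, threshold=0.5)
print(kept)
```

With these toy vectors, "invoice" has the lowest mean similarity to the other items and is the only one dropped.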
Word Sense Disambiguation
Consider the ambiguous word "bank". Suppose we encode in a computer the knowledge that "bank" could be interpreted as "river bank" or "financial bank". Given the phrase "bank account", the computer can measure the semantic similarity between "account" and "financial" and compare it to the semantic similarity between "account" and "river". Since "account" is more similar to "financial", the computer can infer that "bank" in the phrase "bank account" probably refers to "financial bank". Thus a good measure of attributional similarity can facilitate word sense disambiguation.
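The "bank account" example can be written out as a short sketch. The sense inventory SENSES and the scores in TOY_SIMILARITY are hypothetical stand-ins for the encoded knowledge and the similarity measure described above.

```python
# Hypothetical similarity scores between a context word and a sense cue word.
TOY_SIMILARITY = {
    ("account", "financial"): 0.8,
    ("account", "river"): 0.1,
    ("water", "financial"): 0.1,
    ("water", "river"): 0.9,
}

# Each ambiguous word maps to the cue words that label its senses.
SENSES = {"bank": ["river", "financial"]}


def similarity(a, b):
    """Symmetric lookup in the toy table; unknown pairs count as 0."""
    return TOY_SIMILARITY.get((a, b), TOY_SIMILARITY.get((b, a), 0.0))


def disambiguate(ambiguous_word, context_word):
    """Pick the sense whose cue word is most similar to the context word."""
    return max(SENSES[ambiguous_word], key=lambda cue: similarity(context_word, cue))


print(disambiguate("bank", "account"))
print(disambiguate("bank", "water"))
```

The first call selects the "financial" sense and the second selects the "river" sense, because in each case that cue word has the higher toy similarity to the context word.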
Information Retrieval
Given a query, a search engine produces a ranked list of matching documents. The list is sorted by the degree of attributional similarity between the words in the query and the words in the documents. Typically the similarity measure is simply exact matching, but more sophisticated approaches use stemming (so that "ski" will match "skiing") or a thesaurus (so that "car" will match "automobile"). Latent Semantic Indexing (LSI) uses a statistical measure of attributional similarity to improve search engine performance.
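The difference between exact matching and thesaurus-based matching can be illustrated with a small sketch. It assumes a deliberately simple scoring rule (the number of query terms that match a document) and a hand-written, hypothetical thesaurus; it does not implement stemming or LSI.

```python
def expand(term, thesaurus):
    """Return the term together with any synonyms listed for it."""
    return {term} | thesaurus.get(term, set())


def score(query_terms, doc_terms, thesaurus=None):
    """Count the query terms that match the document, directly or via a synonym."""
    thesaurus = thesaurus or {}
    doc_set = set(doc_terms)
    return sum(1 for term in query_terms if expand(term, thesaurus) & doc_set)


def rank(query, docs, thesaurus=None):
    """Return names of matching documents, best score first (ties broken by name)."""
    query_terms = query.lower().split()
    scored = [
        (score(query_terms, text.lower().split(), thesaurus), name)
        for name, text in docs.items()
    ]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for s, name in scored if s > 0]


# Hypothetical document collection and one-entry thesaurus.
docs = {
    "d1": "used automobile for sale",
    "d2": "car for sale",
    "d3": "ski resort guide",
}
thesaurus = {"car": {"automobile"}}

exact = rank("car", docs)
expanded = rank("car", docs, thesaurus)
print(exact, expanded)
```

With exact matching, the query "car" retrieves only d2. With the thesaurus, d1 is retrieved as well, because "automobile" is treated as a match for "car".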
Grading Student Essays
Automatic grading of student essays typically involves the use of a statistical measure of attributional similarity, such as Latent Semantic Analysis (LSA).

Updated: February 3, 2007.