purl.org/peter.turney
Keyphrase Extraction - Applications
- Definition of Keyphrase Extraction
- Many journals ask their authors to provide a list of key words for
their articles. We call these keyphrases, rather than key words, because they
are often phrases of two or more words, rather than single words. We define a
keyphrase list as a short list of phrases (typically five to fifteen
phrases) that capture the main topics discussed in a given document.
We define automatic keyphrase extraction as the automatic selection of important, topical
phrases from within the body of a document. Automatic keyphrase extraction is a special case of
the more general task of automatic keyphrase generation, in which the generated phrases do not
necessarily appear in the body of the given document.
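The distinction between extraction and generation can be made concrete with a small sketch. The code below is only an illustration under simple assumptions, not the extraction method described elsewhere on this site: it treats every run of one to three words as a candidate phrase and ranks candidates by raw frequency. The function name `extract_keyphrases`, its parameters, and the tiny `STOP_WORDS` list are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list, for illustration only.
STOP_WORDS = {"a", "an", "and", "are", "as", "by", "for", "from", "in",
              "is", "it", "of", "on", "or", "that", "the", "to", "with"}


def extract_keyphrases(text, max_words=3, top_n=5):
    """Return up to top_n phrases that occur in text, ranked by frequency."""
    counts = Counter()
    # Split at punctuation so that no candidate spans a clause boundary.
    for fragment in re.split(r"[^\w\s'-]+", text.lower()):
        words = re.findall(r"[a-z][a-z'-]*", fragment)
        for n in range(1, max_words + 1):
            for i in range(len(words) - n + 1):
                gram = words[i:i + n]
                # Skip candidates that begin or end with a stop word.
                if gram[0] in STOP_WORDS or gram[-1] in STOP_WORDS:
                    continue
                counts[" ".join(gram)] += 1
    # Rank by frequency, then prefer longer phrases, then alphabetical order.
    ranked = sorted(counts.items(),
                    key=lambda kv: (-kv[1], -len(kv[0].split()), kv[0]))
    return [phrase for phrase, _ in ranked[:top_n]]
```

Because every candidate is a contiguous run of words taken from the document, this is extraction in the sense defined above; a generation system would also be free to output phrases that never occur in the text.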
- Keyphrases for Metadata
- Many researchers believe that metadata is essential for addressing the problems of
document management. Metadata is information about a document or set of documents, such as its author, title, and subject.
There are several standards for document metadata, including the Dublin Core Metadata Element
Set (championed by the US Online Computer Library Center), the MARC (Machine-Readable
Cataloging) format (maintained by the US Library of Congress), the GILS (Government Information
Locator Service) standard (from the US Office of Social and Economic Data Analysis),
and the CSDGM (Content Standard for Digital Geospatial Metadata) standard (from the US
Federal Geographic Data Committee). All of these standards include a field for keyphrases
(although they have different names for this field).
- Keyphrases for Highlighting
- When we skim a document, we scan for keyphrases to quickly determine its topic.
Highlighting is the practice of emphasizing keyphrases and key passages (e.g., sentences
or paragraphs) by underlining the key text, using a special font, or marking the key text with a
special colour. The purpose of highlighting is to facilitate skimming. Automatic keyphrase
extraction can be used for highlighting and also to enable text-to-speech software to
provide audio skimming capability.
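As a rough sketch of highlighting, the function below wraps each occurrence of a keyphrase in a pair of markers. The name `highlight`, its parameters, and the choice of HTML `<mark>` tags as the default markers are assumptions made for this example; the keyphrases are assumed to come from some extractor and to begin and end with a letter or digit.

```python
import re


def highlight(text, keyphrases, start="<mark>", end="</mark>"):
    """Wrap every occurrence of a keyphrase in text with start/end markers."""
    if not keyphrases:
        return text
    # Try longer phrases first so that "keyphrase extraction" is marked as a
    # whole rather than only its sub-phrase "extraction".
    ordered = sorted(keyphrases, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(p) for p in ordered) + r")\b",
        re.IGNORECASE,
    )
    return pattern.sub(lambda m: start + m.group(0) + end, text)
```

The sketch does not escape HTML in the input text, so real use on web pages would need that step first. Passing different `start` and `end` strings (for example, markup that a text-to-speech system understands) would adapt the same idea to audio skimming.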
- Keyphrases for Indexing
- An alphabetical list of keyphrases, taken from a collection of documents or from the parts of a single
long document (for example, the chapters of a book), can serve as an index.
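A minimal sketch of such an index, assuming keyphrases have already been extracted for each document or chapter. The function name `build_index` and the shape of its input (a dictionary from section name to a list of keyphrases) are hypothetical.

```python
def build_index(keyphrases_by_section):
    """Return an alphabetical list of (keyphrase, sections) pairs."""
    index = {}
    for section, phrases in keyphrases_by_section.items():
        for phrase in phrases:
            # Fold case so "Metadata" and "metadata" share one index entry.
            index.setdefault(phrase.lower(), set()).add(section)
    return [(phrase, sorted(index[phrase])) for phrase in sorted(index)]
```

Each entry pairs a keyphrase with the sections in which it was extracted, which is the information a reader expects from a back-of-the-book index.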
- Keyphrases for Interactive Query Refinement
- Using a search engine is often an iterative process. The user enters a query, examines the resulting
hit list, modifies the query, then tries again. Most search engines do not have any special features
that support the iterative aspect of searching. One approach to interactive query refinement
is to take the user's query, fetch the first round of documents, and extract keyphrases from
them. The system then displays those documents to the user, along with suggested
refinements that combine the original query with the
extracted keyphrases.
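The refinement step can be sketched as follows, assuming the search engine and the keyphrase extractor already exist and that each first-round document comes with its list of keyphrases. The function name `suggest_refinements` and the ranking rule (prefer phrases shared by more of the retrieved documents) are assumptions for illustration.

```python
from collections import Counter


def suggest_refinements(query, keyphrases_per_doc, max_suggestions=3):
    """Combine the query with the keyphrases most common in the hit list."""
    query_norm = query.lower().strip()
    doc_counts = Counter()
    for phrases in keyphrases_per_doc:
        # Count each phrase at most once per document.
        for phrase in {p.lower() for p in phrases}:
            if phrase != query_norm:
                doc_counts[phrase] += 1
    # Rank by number of documents, breaking ties alphabetically.
    ranked = sorted(doc_counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [f"{query} {phrase}" for phrase, _ in ranked[:max_suggestions]]
```

For the query "jaguar", a hit list whose documents carry keyphrases such as "big cat" and "sports car" would yield suggestions like "jaguar big cat", letting the user pick the intended sense in one step.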
- Keyphrases for Web Log Analysis
- Web site managers often want to know what visitors to their site are seeking. Most web servers
have log files that record information about visitors, including the Internet address of the client
machine, the file that was requested by the client, and the date and time of the request. There are
several commercial products that analyze these logs for web site managers. Typically these tools
will give a summary of general traffic patterns and produce an ordered list of the most popular
files on the web site. A web log analysis program can use keyphrases to provide a deeper view
of traffic. Instead of producing an ordered list of the most popular files on the web site, a
log analysis tool can produce a list of the most popular keyphrases on the site. This
can give web site managers insight into which topics on their web site are most popular.
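A sketch of the keyphrase-based report, under two assumptions: the log lines contain a quoted request such as `"GET /a.html HTTP/1.0"` (as in the widely used Common Log Format), and keyphrases have already been extracted for each file on the site. The names `REQUEST_RE`, `popular_keyphrases`, and `keyphrases_by_file` are hypothetical.

```python
import re
from collections import Counter

# Matches the quoted request in a log line and captures the requested file.
REQUEST_RE = re.compile(r'"[A-Z]+ (\S+)')


def popular_keyphrases(log_lines, keyphrases_by_file, top_n=10):
    """Rank keyphrases by how often the files that contain them are requested."""
    counts = Counter()
    for line in log_lines:
        match = REQUEST_RE.search(line)
        if match is None:
            continue  # Skip lines without a recognizable request.
        # Credit every keyphrase of the requested file with one hit.
        for phrase in keyphrases_by_file.get(match.group(1), []):
            counts[phrase] += 1
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]
```

A keyphrase shared by several moderately popular files can outrank any single file, which is the kind of topic-level pattern that a per-file report hides.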
Updated: February 3, 2007.