A Glossary of Information Retrieval Terminology

February 14, 2005

The author's views are entirely their own (excluding the unlikely event of hypnosis) and may not always reflect the views of Moz.

By: Rand Fishkin

Often, when reading through complex forum threads, research papers, or even blogs by some of the more advanced SEOs in the industry, I get lost in the terminology, and an entire paragraph or document can be lost to my ignorance. Luckily, there are great resources to help, like the Modern Information Retrieval Glossary from UC Berkeley.

I've picked out some of the more important terms to know:

Clustering - the grouping of documents that satisfy a set of common properties. The aim is to bring together documents that are related to one another. Clustering can be used, for instance, to expand a user query with new, related index terms.
E measure - an information retrieval performance measure that combines recall and precision; it is related to, but distinct from, their harmonic mean.
Generalized vector space model - a generalization of the classic vector model that relaxes the classic model's assumption that index terms are mutually independent (pairwise orthogonal).
Information retrieval (IR) - the part of computer science that studies the retrieval of information (not data) from a collection of written documents. The retrieved documents aim to satisfy a user's information need, which is usually expressed in natural language.
Latent semantic indexing - an algebraic model of document retrieval based on a singular value decomposition (SVD) of the term-document matrix, which maps documents and queries into a lower-dimensional space of latent concepts.
Probabilistic model - a classic model of document retrieval based on a probabilistic interpretation of document relevance (to a given user query).
Stemming - a technique for reducing words to a common root form (the stem), so that morphological variants of a word can match the same index term.
TREC collection - a reference collection which contains over a million documents and which has been used extensively in the TREC conferences. The TREC collection has been organized by NIST and is becoming a standard for comparing IR models and algorithms.
Zipf's Law - an empirical rule that describes the frequency of words in a text. It states that the i-th most frequent word appears as many times as the most frequent one divided by i^θ, for some θ ≤ 1.
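The clustering entry's closing remark, using clusters to expand a query, can be sketched in Python. This is a minimal illustration under stated assumptions, not the glossary's method: documents are represented as sets of index terms, similarity is Jaccard overlap, clusters are formed by single-link grouping at a similarity threshold, and the names `jaccard`, `single_link_clusters`, and `expand_query` are hypothetical.

```python
from itertools import combinations


def jaccard(a: set[str], b: set[str]) -> float:
    """Overlap of two term sets: |A & B| / |A | B| (0.0 for two empty sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def single_link_clusters(docs: list[set[str]], threshold: float) -> list[list[int]]:
    """Single-link clustering: documents joined by any chain of pairs whose
    similarity is >= threshold land in the same cluster (via union-find)."""
    parent = list(range(len(docs)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    for i, j in combinations(range(len(docs)), 2):
        if jaccard(docs[i], docs[j]) >= threshold:
            parent[find(i)] = find(j)

    groups: dict[int, list[int]] = {}
    for i in range(len(docs)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())


def expand_query(query: set[str], docs: list[set[str]], clusters: list[list[int]]) -> set[str]:
    """Add every term from any cluster containing a document that matches the query."""
    expanded = set(query)
    for cluster in clusters:
        if any(query & docs[i] for i in cluster):
            for i in cluster:
                expanded |= docs[i]
    return expanded


docs = [
    {"car", "engine", "fuel"},
    {"automobile", "engine", "fuel"},
    {"flower", "petal", "garden"},
    {"rose", "petal", "garden"},
]
clusters = single_link_clusters(docs, threshold=0.4)
print(clusters)  # [[0, 1], [2, 3]]
print(sorted(expand_query({"car"}, docs, clusters)))  # ['automobile', 'car', 'engine', 'fuel']
```

The query "car" picks up "automobile" because the two documents share enough terms to fall into one cluster. Adding every term from a matching cluster is deliberately crude; a real system would weight or filter the added terms.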
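To make the E measure entry concrete, here is a small Python sketch using van Rijsbergen's formulation, E = 1 - (1 + b^2) / (b^2/R + 1/P), where P is precision, R is recall, and b is a user-chosen weight on their relative importance. The function names are hypothetical. With b = 1, E works out to exactly one minus the harmonic mean of precision and recall, which is the sense in which the two measures are related but distinct.

```python
def e_measure(precision: float, recall: float, b: float = 1.0) -> float:
    """Van Rijsbergen's E measure: 1 - (1 + b^2) / (b^2 / R + 1 / P).

    Lower is better: 0.0 means perfect precision and recall.
    """
    if precision <= 0.0 or recall <= 0.0:
        # The reciprocals blow up at zero; E tends to 1.0 (the worst value).
        return 1.0
    return 1.0 - (1.0 + b * b) / (b * b / recall + 1.0 / precision)


def harmonic_mean(precision: float, recall: float) -> float:
    """F measure with equal weights: 2PR / (P + R)."""
    if precision + recall == 0.0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


p, r = 0.5, 0.25
print(e_measure(p, r))            # b = 1
print(1.0 - harmonic_mean(p, r))  # same value: with b = 1, E = 1 - F
print(e_measure(p, r, b=2.0))     # a different weighting of recall vs. precision
```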
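The latent semantic indexing entry can be illustrated with a toy term-document matrix and NumPy's `numpy.linalg.svd`. This sketch assumes raw term counts with no weighting, keeps k = 2 latent dimensions, and folds the query into the reduced space with q_k = S_k^-1 U_k^T q; the three-document collection is made up for illustration.

```python
import numpy as np

# Toy term-document matrix A (rows = terms, columns = documents), raw counts.
# d0 = "car engine", d1 = "automobile engine", d2 = "flower petal"
terms = ["car", "automobile", "engine", "flower", "petal"]
A = np.array(
    [
        [1, 0, 0],  # car
        [0, 1, 0],  # automobile
        [1, 1, 0],  # engine
        [0, 0, 1],  # flower
        [0, 0, 1],  # petal
    ],
    dtype=float,
)

k = 2  # number of latent dimensions to keep
U, s, Vt = np.linalg.svd(A, full_matrices=False)
U_k, s_k, Vt_k = U[:, :k], s[:k], Vt[:k, :]

# Each row of doc_vecs is one document in the k-dimensional latent space.
doc_vecs = Vt_k.T


def fold_in(q: np.ndarray) -> np.ndarray:
    """Map a term-count query vector into latent space: S_k^-1 U_k^T q."""
    return (U_k.T @ q) / s_k


def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


# The one-word query "car", expressed over the same term list as A.
query = np.array([1.0 if t == "car" else 0.0 for t in terms])
q_k = fold_in(query)
scores = [cosine(q_k, d) for d in doc_vecs]
print(scores)
```

The document "automobile engine" never contains "car", yet it scores about as high as "car engine": both documents share "engine" and land on the same latent dimension, while "flower petal" scores near zero. Comparing the folded-in query against rows of V_k keeps queries and documents in the same coordinates; other LSI variants scale the document vectors by the singular values instead.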
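For the probabilistic model, a common starting point is the Binary Independence Model. The Python sketch below assumes no relevance feedback is available, so it fixes P(term | relevant) at 0.5 and estimates P(term | non-relevant) as n_i / N, where n_i is the number of documents containing the term and N is the collection size. The resulting log-odds weight, with 0.5 added to each count to avoid log(0), is the Robertson/Sparck Jones weight log((N - n_i + 0.5) / (n_i + 0.5)). Documents and queries are sets of terms, and the function names and sample documents are hypothetical.

```python
import math


def rsj_weights(docs: list[set[str]]) -> dict[str, float]:
    """Binary Independence Model term weights with no relevance information."""
    n_docs = len(docs)
    doc_freq: dict[str, int] = {}
    for doc in docs:
        for term in doc:
            doc_freq[term] = doc_freq.get(term, 0) + 1
    # With P(term | relevant) = 0.5 and P(term | non-relevant) = n_i / N, the
    # log-odds weight is log((N - n_i) / n_i); 0.5 smoothing keeps it finite.
    return {
        term: math.log((n_docs - n_i + 0.5) / (n_i + 0.5))
        for term, n_i in doc_freq.items()
    }


def score(doc: set[str], query: set[str], weights: dict[str, float]) -> float:
    """Sum the weights of query terms that also occur in the document."""
    return sum(weights.get(term, 0.0) for term in query & doc)


docs = [
    {"probabilistic", "relevance", "model"},
    {"vector", "cosine", "model"},
    {"boolean", "operators"},
    {"stemming", "suffix"},
]
weights = rsj_weights(docs)
query = {"probabilistic", "model"}
ranking = sorted(range(len(docs)), key=lambda i: score(docs[i], query, weights), reverse=True)
print(ranking)
```

Terms that appear in many documents get weights near or below zero (here "model", found in half the collection, gets exactly 0), so rare matching terms such as "probabilistic" drive the ranking.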
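Real stemmers such as the Porter algorithm apply ordered rule sets with conditions on the stem that remains. The Python sketch below is a much cruder longest-suffix stripper, included only to show the idea; its suffix list and minimum stem length are hypothetical choices, not taken from any published stemmer.

```python
# Hypothetical, deliberately tiny suffix list, ordered longest first so that
# "connections" loses "ions" rather than just "s".
SUFFIXES = ("ations", "ation", "ions", "ion", "ing", "ed", "es", "s")


def crude_stem(word: str, min_stem: int = 3) -> str:
    """Strip the first (longest) matching suffix, keeping at least min_stem letters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    return word


words = ["connect", "connected", "connecting", "connection", "connections"]
print([crude_stem(w) for w in words])  # ['connect', 'connect', 'connect', 'connect', 'connect']
print(crude_stem("sing"))              # 'sing': stripping "ing" would leave too short a stem
```

A stripper this simple already fails on pairs like "retrieve" and "retrieves", which end up as different stems; handling such cases is why published stemmers carry many more rules and conditions.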
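Zipf's law can be written as f(i) = f(1) / i^θ, where f(i) is the count of the i-th most frequent word. Taking logarithms gives log f(i) = log f(1) - θ log i, a straight line with slope -θ, so θ can be estimated from observed counts with an ordinary least-squares fit on a log-log scale. The Python sketch below does that; the function names are hypothetical, and least squares is just one simple way to fit the exponent.

```python
import math
from collections import Counter


def zipf_expected(top_count: float, rank: int, theta: float = 1.0) -> float:
    """Expected count of the rank-th most frequent word: f(1) / rank**theta."""
    return top_count / rank ** theta


def fit_theta(counts: list[float]) -> float:
    """Estimate theta from counts sorted in descending order (rank 1 first).

    Fits log(count) = a - theta * log(rank) by ordinary least squares.
    """
    if len(counts) < 2:
        raise ValueError("need at least two ranks to fit a slope")
    xs = [math.log(rank) for rank in range(1, len(counts) + 1)]
    ys = [math.log(c) for c in counts]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return -cov / var


# Counts that follow Zipf's law exactly with theta = 1 recover theta = 1.
ideal = [zipf_expected(1000, r) for r in range(1, 11)]
print(fit_theta(ideal))

# On real text, rank the words by frequency first.
text = "the cat sat on the mat and the dog sat on the rug"
counts = sorted(Counter(text.split()).values(), reverse=True)
print(counts, fit_theta(counts))
```

A sample as small as the one-line text says nothing reliable about θ; the fit only becomes meaningful on a large corpus.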
