Query Expansion Using Term Distribution and Term Association

(1303.0667)
Published Mar 4, 2013 in cs.IR

Abstract

Good term selection is an important issue for any automatic query expansion (AQE) technique. AQE techniques that select expansion terms from the target corpus usually do so in one of two ways. Distribution-based term selection compares the distribution of a term in the (pseudo-)relevant documents with its distribution in the whole corpus or with a random distribution. Two well-known distribution-based methods are based on Kullback-Leibler Divergence (KLD) and Bose-Einstein statistics (Bo1). Association-based term selection, on the other hand, uses information about how a candidate term co-occurs with the original query terms. Local Context Analysis (LCA) and the Relevance-based Language Model (RM3) are examples of association-based methods. Our goal in this study is to investigate how these two classes of methods may be combined to improve retrieval effectiveness. We propose the following combination-based approach. Candidate expansion terms are first obtained using a distribution-based method. This set is then refined based on the strength of the association of each term with the original query terms. We test our methods on 11 TREC collections. The proposed combinations generally yield better results than each individual method, as well as other state-of-the-art AQE approaches. En route to our primary goal, we also propose some modifications to LCA and Bo1 that lead to improved performance.
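
The two-stage idea in the abstract (distribution-based candidate selection, then association-based refinement) can be illustrated with a minimal Python sketch. It is not the paper's implementation. The first stage uses the commonly cited KLD term score, p(t|R) * log(p(t|R) / p(t|C)). The second stage uses a simple document-level co-occurrence ratio as a stand-in for an association measure; it is an assumption made for illustration and is neither LCA nor RM3. The function names (`kld_scores`, `association`, `expand`) and the parameters `n_candidates` and `n_final` are hypothetical, and documents are assumed to be pre-tokenized lists of terms.

```python
import math
from collections import Counter


def kld_scores(feedback_docs, corpus_docs):
    """Score each feedback-document term by p(t|R) * log(p(t|R) / p(t|C))."""
    fb_counts = Counter(t for doc in feedback_docs for t in doc)
    corpus_counts = Counter(t for doc in corpus_docs for t in doc)
    fb_total = sum(fb_counts.values())
    corpus_total = sum(corpus_counts.values())
    scores = {}
    for term, freq in fb_counts.items():
        p_rel = freq / fb_total
        # Assumption: feedback documents are drawn from the corpus,
        # so p_col is normally non-zero; terms with p_col == 0 are skipped.
        p_col = corpus_counts[term] / corpus_total
        if p_col > 0:
            scores[term] = p_rel * math.log(p_rel / p_col)
    return scores


def association(term, query_terms, feedback_docs):
    """Illustrative association score (an assumption, not LCA or RM3):
    among feedback documents containing `term`, the fraction that also
    contain a query term, averaged over all query terms."""
    docs_with_term = [set(doc) for doc in feedback_docs if term in doc]
    if not docs_with_term or not query_terms:
        return 0.0
    total = 0.0
    for q in query_terms:
        total += sum(1 for doc in docs_with_term if q in doc) / len(docs_with_term)
    return total / len(query_terms)


def expand(query_terms, feedback_docs, corpus_docs, n_candidates=10, n_final=3):
    """Stage 1: take the top KLD-scored candidates.
    Stage 2: re-rank those candidates by association with the query."""
    scores = kld_scores(feedback_docs, corpus_docs)
    candidates = sorted(
        (t for t in scores if t not in query_terms),
        key=lambda t: scores[t],
        reverse=True,
    )[:n_candidates]
    # sorted() is stable, so ties in association keep their KLD order.
    reranked = sorted(
        candidates,
        key=lambda t: association(t, query_terms, feedback_docs),
        reverse=True,
    )
    return reranked[:n_final]
```

In this sketch `feedback_docs` would be the top-ranked documents from an initial retrieval run, and the terms returned by `expand` would be appended to the original query. How expansion terms are weighted in the final query is left out, since the abstract does not specify it.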
