Emergent Mind

Abstract

This paper addresses the problem of classifying web documents using domain ontology. Our goal is to provide a method for improving the classification of medical documents by exploiting the MeSH thesaurus (Medical Subject Headings) which will allow us to generate a new representation based on concepts. This approach was tested with two well-known data mining algorithms C4.5 and KNN, and a comparison was made with the usual representation using stems. The enrichment of vectors using the concepts and the hyperonyms drawn from the domain ontology has significantly boosted their representation, something essential for good classification. The results of our experiments on the benchmark biomedical collection Ohsumed confirm the importance of the approach by a very significant improvement in the performance of the ontology-based classification compared to the classical representation (Stems) by 30%.

We're not able to analyze this paper right now due to high demand.

Please check back later (sorry!).

Generate a summary of this paper on our Pro plan:

We ran into a problem analyzing this paper.

Newsletter

Get summaries of trending comp sci papers delivered straight to your inbox:

Unsubscribe anytime.