Emergent Mind

Abstract

Text detection and segmentation is an important prerequisite for many content based image analysis tasks. The paper proposes a novel text extraction and character segmentation algorithm using Maximally Stable Extremal Regions as basic letter candidates. These regions are then subjected to thresholding and thereafter various connected components are determined to identify separate characters. The algorithm is tested along a set of various JPEG, PNG and BMP images over four different character sets; English, Russian, Hindi and Urdu. The algorithm gives good results for English and Russian character set; however character segmentation in Urdu and Hindi language is not much accurate. The algorithm is simple, efficient, involves no overhead as required in training and gives good results for even low quality images. The paper also proposes various challenges in text extraction and segmentation for multilingual inputs.

We're not able to analyze this paper right now due to high demand.

Please check back later (sorry!).

Generate a summary of this paper on our Pro plan:

We ran into a problem analyzing this paper.

Newsletter

Get summaries of trending comp sci papers delivered straight to your inbox:

Unsubscribe anytime.