CSME 2006/12
Volume 4, No.3 : 297-309
DOI:10.6703/IJASE.2006.4(3).297  
Enhanced Constrained Run-Length Algorithm for Complex Layout Document Processing

Aung-Ping Pun a
a Department of Information Management, Tainan University, No. 1 Tainan Road, LuZAu, Taoyuan County, 33857, Taiwan, R.O.C.


Abstract: The Constrained Run-Length Algorithm (CRLA) is a well-known technique for page segmentation. The algorithm is very efficient at partitioning documents with Manhattan layouts but is not suited to pages with complex layouts, e.g. irregular graphics embedded in a text paragraph. Its main drawback is that it uses only local information during the smearing stage, which may lead to erroneous linkage of text and graphics. This paper presents a solution to this problem by incorporating global information into the CRLA process. The enhanced CRLA can be applied successfully to non-Manhattan page layouts. It can also extract text surrounded by a box. Neither case can be handled by the original CRLA.

Keywords:  constrained run-length algorithm; page segmentation; document processing.
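
For readers unfamiliar with the smearing stage mentioned in the abstract, the following is a minimal sketch of run-length smearing as it is commonly described, not the enhanced method of this paper. It assumes a binary image stored as lists of 0 (white) and 1 (black), fills only white runs bounded by black pixels on both sides, and combines the horizontal and vertical passes with a logical AND. The function names `smear_row` and `smear_image` and the threshold parameters are hypothetical and chosen for illustration.

```python
def smear_row(row, threshold):
    """Fill white (0) runs of length <= threshold lying between two black (1) pixels."""
    out = list(row)
    last_black = None  # index of the most recent black pixel seen so far
    for i, pixel in enumerate(row):
        if pixel == 1:
            if last_black is not None:
                gap = i - last_black - 1  # length of the white run just closed
                if 0 < gap <= threshold:
                    for j in range(last_black + 1, i):
                        out[j] = 1
            last_black = i
    return out


def smear_image(image, h_threshold, v_threshold):
    """Smear rows and columns independently, then AND the two results."""
    horizontal = [smear_row(r, h_threshold) for r in image]
    # Transpose so that columns can be smeared with the same row routine.
    columns = [list(c) for c in zip(*image)]
    vertical_t = [smear_row(c, v_threshold) for c in columns]
    vertical = [list(r) for r in zip(*vertical_t)]
    return [[h & v for h, v in zip(hr, vr)]
            for hr, vr in zip(horizontal, vertical)]


if __name__ == "__main__":
    row = [1, 0, 0, 1, 0, 0, 0, 0, 1, 0]
    print(smear_row(row, 2))
```

Each gap is judged only by its own length, which is the purely local decision the abstract identifies as the drawback: a short gap between a text line and an adjacent graphic is filled just like a gap between two words.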

© 2006 CSME, ISSN 0257-9731




