Table Structure Understanding and Its Performance Evaluation

Yalin Wang, Ihsin T. Phillips, and Robert M. Haralick


Abstract

This paper presents a table structure understanding algorithm designed using optimization methods. The algorithm is probability based: its probabilities are estimated from geometric measurements made on the various entities in a large training set. The methodology includes a global parameter optimization scheme, a novel automatic table ground truth generation system, and a table structure understanding performance evaluation protocol. On a document data set containing 518 table entities and 10,934 cell entities, the algorithm achieved an accuracy rate of 96.76% at the cell level and 98.32% at the table level.
