Research Scholar
Dataset:
In my research, I have used the publicly available ICDAR competition datasets for layout analysis and for table detection. The competition details are listed below:
I also generated my own layout analysis dataset (700 document pages in total) covering six Indic scripts (Meetei Mayek, Telugu, Odia, Tamil, Malayalam and Kannada) as well as English-language documents, using Aletheia, PRImA's ground-truth generation tool.
Ground-truth generation tool:
Aletheia is an advanced system for accurate yet cost-effective analysis, recognition and annotation of scanned documents. It aids the user with a number of automated and semi-automated tools, which were developed and fine-tuned based on feedback.