A document is anything that conveys information. Traditionally, documents meant paper-based written media or historical palm-leaf and papyrus inscriptions. Today, documents are more diverse and increasingly electronic, often existing entirely in digital form. Their contents are no longer only text but also include photographs, drawings, tables and plots. Even video is sometimes considered a document.

Document Analysis and Recognition (DAR) is a specialised field concerned with designing and developing algorithms and techniques that use computers to process and extract information from documents. Documents are typically input to computers as scanned images. For a long time, DAR was associated with Optical Character Recognition (OCR), the task of extracting textual data in digital form from scanned images of documents. Today, DAR has moved beyond OCR, and researchers are exploring higher levels of abstraction as well as the inter-relationships between textual and non-textual components.

In this lab, we investigate different aspects of documents, with a special focus on Indian documents, which are rich in multilingual and culturally strong content. Indian documents pose a number of challenges due to the complexity of the scripts, the quality of the underlying media, the printing processes, and so on. In particular, the analysis and recognition of handwritten documents is still in its infancy.

OCR Lab@SCIS has been one of the pioneers in DAR research in India and is recognised throughout the Indian research community for its work.

Research


  • Deep Image Priors for Binarisation
  • Table Question Answering Systems
  • Visual Question Answering Systems
  • Zero-shot Learning for Handwritten Character Recognition
  • Telugu OCR System
  • Saara: A Machine Translation System for Kannada
  • Software Tools for Forensic Analysis of Documents

Funded Projects:


  • Resource Centre for Indian Language Technology Solutions (Telugu) (2001) - ₹98 L
  • Development of Software Tools for Analysing Additions, Deletions and Alterations in Documents (2002) - ₹20 L
  • Development of Robust Document Analysis and Recognition System for Printed Indian Scripts - Phase I (2007 - 2010) - ₹36 L
  • Development of Robust Document Analysis and Recognition System for Printed Indian Scripts - Phase II (2011 - 2015) - ₹78 L
The list is incomplete pending updates to this webpage.
Did you know that the OCR Lab@SCIS developed the first complete OCR System for Telugu?

RESEARCH NEWS
  • Salman, K. H. won the best poster award at the India-AI Impact Summit 2026 Pre-Summit Events at SCIS.
  • Padma V. presented a paper titled "Zero-Shot approach to Tamil OCR" at SPELLL 2025 ...

People


Faculty:
  • Prof. Atul Negi
  • Prof. Chakravarthy Bhagvati
Earlier Faculty:
  • Prof. K. Narayana Murthy
Research Scholars:
  • K. Junaciya
  • M. Madhuri Lata
  • V. Padma
  • K. H. Salman
  • K. Rakesh
  • K. Deepthi (2024)
  • Melinda Laiphangbam (2023)
  • Patrick Niyishaka (2022)