UHIndicPCwS (University of Hyderabad Indic Printed Character with Style) is a printed character dataset that contains characters from six scripts: Tamil, Telugu, Kannada, Malayalam, Gujarati, and Odia. It is distributed as the pickle file UHIndicPCwS.pkl.
The file is a Python pickle containing a dictionary with six keys, one for each script for which data is available: {Malayalam:[X, Y], Telugu:[X, Y], Odia:[X, Y], Gujarati:[X, Y], Tamil:[X, Y], Kannada:[X, Y]}. Each dictionary value holds the samples of that script and their class labels:
- X is a list of numpy arrays, each representing one character image.
- Y is a list of class labels, one for each image in X. A class label has the format scriptname_classnumber (e.g. Malayalam_1).
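As a rough illustration of this layout, the sketch below walks a dictionary of the same shape and counts the samples per script. The helper name `summarize_layout` and the toy data are hypothetical and not part of the dataset; nested lists stand in for the numpy arrays so the example runs without the pickle file.

```python
def summarize_layout(data):
    """Return {script: sample_count}, checking that each value is an [X, Y] pair."""
    counts = {}
    for script, pair in data.items():
        if len(pair) != 2:
            raise ValueError(f"{script}: expected [X, Y], got {len(pair)} items")
        images, labels = pair
        if len(images) != len(labels):
            raise ValueError(
                f"{script}: {len(images)} images but {len(labels)} labels"
            )
        counts[script] = len(images)
    return counts


# Toy stand-in with the same layout (assumption: nested lists replace numpy arrays).
toy_data = {
    "Tamil": [[[[0, 1], [1, 0]], [[1, 1], [0, 0]]], ["Tamil_1", "Tamil_2"]],
    "Odia": [[[[0, 0], [1, 1]]], ["Odia_1"]],
}

layout_counts = summarize_layout(toy_data)
print(layout_counts)
```

The same function should also work on the dictionary loaded from UHIndicPCwS.pkl, since it relies only on `len()` of the two lists.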
The following Python code snippet can be used to retrieve the images and class labels of a particular script.
import pickle
import numpy as np  # numpy must be installed so the image arrays can be unpickled

# Replace with the key of the script whose data should be retrieved
script_of_interest = "Tamil"
# Path to the pickle file
file_name = './UHIndicPCwS.pkl'

with open(file_name, "rb") as pickle_file:
    lang_data = pickle.load(pickle_file)

# Each dictionary value is [X, Y]; look the script up directly by its key
script_data = lang_data[script_of_interest]
image_data = script_data[0]       # images as a list of numpy arrays
classlabel_data = script_data[1]  # class labels of the images
print("No of images:", len(image_data))
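Once the labels are loaded, it can be useful to split them into script name and class number. The sketch below shows one way to do this, assuming the class number after the final underscore is always an integer, as in the Malayalam_1 example. The helper `parse_label` and the sample labels are hypothetical and only illustrate the label format.

```python
from collections import Counter


def parse_label(label):
    """Split 'scriptname_classnumber' into (scriptname, classnumber)."""
    # rpartition splits on the LAST underscore only.
    script, _, number = label.rpartition("_")
    return script, int(number)  # assumption: the class number is an integer


# Hypothetical labels in the documented format; with the real data,
# use classlabel_data from the snippet above instead.
sample_labels = ["Malayalam_1", "Malayalam_2", "Malayalam_1"]

class_numbers = [parse_label(label)[1] for label in sample_labels]
class_counts = Counter(class_numbers)  # number of samples per class
print(class_counts)
```

Counting samples per class this way gives a quick check of how balanced the classes of a script are before training.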