Evaluation of Word and Character Recognizers



Evaluation and Sample Results

The objective is to recognize words and characters as accurately as possible at the zero reject point. A recognizer may return up to 5 candidates for each classification, but only the first-ranked result will be used for ordering in the league table, though the accuracies for other ranks will also be calculated. We will also compute ROC curves, which show how the error rate varies with the reject rate.
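
As a rough illustration of the ranking measure, the following Java sketch computes accuracy at rank k with no rejections: an image counts as correct at rank k if its true tag appears among the first k candidates. The class and method names are hypothetical and are not part of the competition software, and the use of exact, case-sensitive string comparison is an assumption.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical scoring helper; not the official evaluation code. */
class RankAccuracy {

    /**
     * Fraction of images whose true tag appears within the first k candidates.
     * An image with no candidates counts as an error, since nothing is rejected.
     */
    static double accuracyAtRank(List<String> truth, List<List<String>> candidates, int k) {
        if (truth.isEmpty()) {
            return 0.0;
        }
        int correct = 0;
        for (int i = 0; i < truth.size(); i++) {
            List<String> ranked = candidates.get(i);
            int limit = Math.min(k, ranked.size());
            // Exact, case-sensitive match against the top `limit` candidates (assumed).
            if (ranked.subList(0, limit).contains(truth.get(i))) {
                correct++;
            }
        }
        return (double) correct / truth.size();
    }

    public static void main(String[] args) {
        List<String> truth = Arrays.asList("22", "WASHINGTON");
        List<List<String>> candidates = Arrays.asList(
                Arrays.asList("hello", "22"),   // correct only at rank 2
                Arrays.<String>asList());       // recognizer produced no output
        System.out.println("rank 1: " + accuracyAtRank(truth, candidates, 1));
        System.out.println("rank 5: " + accuracyAtRank(truth, candidates, 5));
    }
}
```

Under this sketch, only the rank 1 figure would correspond to the value used for ordering the league table.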

Note that generally speaking, commercial applications of OCR are more concerned with the reject rate for a specified accuracy. We've adopted a simpler measure here since reading text in scenes is still an immature technology, and the zero reject accuracy is the simplest figure to quote.

View some sample results for word recognition (a human recognizer who gets bored part way through the test!).

Uploading your Recognizer

To upload your recognizer, package all the files needed to run it in a zip file.

Then register and upload the zip file here.

Note: Windows executables or Java code are vastly preferred; Linux executables by special arrangement only please!

Running a Recognizer

We'll run your recognizer (called myrec) by invoking it from the command line as follows:

> myrec dictionary.xml dataset.xml output.xml



XML File Formats

dictionary.xml

This is a simple list of words at present, and using xml for this might seem like overkill.

However, we're using xml for the other file formats, so we use it here too for consistency. The following example shows a dictionary with just two words in it: Hello and World. Dictionaries are case-sensitive, so Hello and hello would be two separate entries.

<?xml version="1.0" encoding="UTF-8"?>
<Dictionary>
  <Word s="Hello" />
  <Word s="World" />
</Dictionary>
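
For recognizers written in Java, a dictionary in this format could be read with the standard JDK DOM parser. The following is a minimal sketch; the class name DictionaryReader and its methods are hypothetical and not part of any supplied code.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical helper for reading dictionary.xml; not part of the competition code. */
class DictionaryReader {

    /** Returns the s attribute of every Word element, in document order. */
    static List<String> readWords(InputStream in) {
        try {
            NodeList nodes = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(in)
                    .getElementsByTagName("Word");
            List<String> words = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                // Case is preserved, so "Hello" and "hello" stay distinct entries.
                words.add(((Element) nodes.item(i)).getAttribute("s"));
            }
            return words;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse dictionary XML", e);
        }
    }

    /** Convenience overload for XML held in a string. */
    static List<String> readWords(String xml) {
        return readWords(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<Dictionary><Word s=\"Hello\" /><Word s=\"World\" /></Dictionary>";
        System.out.println(readWords(xml));
    }
}
```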

The CharDic.xml and WordDic.xml files provide complete coverage dictionaries for the character and word recognition sample sets.

Note that for the Word Recognition problem, the dictionary will be produced by taking the union of a word list from the WordNet corpus and the set of all words used in the robust reading trialTrain, trialTest and test sets.

dataset.xml

The dataset file has the following format.

The first example is for a dataset of words:

<?xml version="1.0" encoding="UTF-8"?>
<imagelist name="Icdar 2003 Word Sample">
  <image file="word/1/1.jpg" tag="22" />
  <image file="word/1/2.jpg" tag="WASHINGTON" />
  <image file="word/1/3.jpg" tag="postgrad" />
  <image file="word/1/4.jpg" tag="study" />
  ...
</imagelist>


Note that for the dataset of characters, the format is identical; the only difference being that the tags are limited to be strings of one character:


<?xml version="1.0" encoding="UTF-8"?>
<imagelist>
  <image file="char/1/1.jpg" tag="2" />
  <image file="char/1/2.jpg" tag="2" />
  <image file="char/1/3.jpg" tag="W" />
  <image file="char/1/4.jpg" tag="A" />
  <image file="char/1/5.jpg" tag="S" />
  ...
</imagelist>


Note that these datasets are training sets – the test sets for the competition will have the tags removed!


output.xml

The output.xml file lists the outputs of your recognizer for each input image.

Your recognizer is allowed five guesses – these must be listed in best first order, with each one given a confidence score.

If your recognizer produces no output for a particular image, still include the Result element with its file attribute, but leave it empty.

Here is part of a sample output file:

<?xml version="1.0" encoding="UTF-8"?>
<Results>
  <Result file="word/1/1.jpg">
    <Response word="hello" p="1.0" />
    <Response word="Science" p="0.8" />
    <Response word="Computer" p="0.4" />
  </Result>
  <Result file="word/1/2.jpg"/>
  ...
</Results>


Note that the recognizer failed to produce an output for “word/1/2.jpg”, so we get a Result element containing no Response elements.
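
A file in this format could be produced along the following lines. This is only a sketch: the OutputXmlWriter and Guess names are hypothetical, the caller is assumed to supply at most five guesses per image already sorted best first, and the attribute escaping is a precaution rather than a stated requirement.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical writer for output.xml; not part of the competition code. */
class OutputXmlWriter {

    /** One recognition hypothesis: the text and its confidence. */
    static final class Guess {
        final String word;
        final double p;
        Guess(String word, double p) { this.word = word; this.p = p; }
    }

    /** Escapes characters that are not allowed inside a quoted XML attribute. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    /** Renders results keyed by image file; an empty list yields an empty Result. */
    static String toXml(Map<String, List<Guess>> results) {
        StringBuilder sb = new StringBuilder();
        sb.append("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<Results>\n");
        for (Map.Entry<String, List<Guess>> e : results.entrySet()) {
            String file = escape(e.getKey());
            if (e.getValue().isEmpty()) {
                sb.append("  <Result file=\"").append(file).append("\"/>\n");
                continue;
            }
            sb.append("  <Result file=\"").append(file).append("\">\n");
            for (Guess g : e.getValue()) {
                sb.append("    <Response word=\"").append(escape(g.word))
                  .append("\" p=\"").append(g.p).append("\" />\n");
            }
            sb.append("  </Result>\n");
        }
        return sb.append("</Results>\n").toString();
    }

    public static void main(String[] args) {
        Map<String, List<Guess>> results = new LinkedHashMap<>();
        results.put("word/1/1.jpg", Arrays.asList(
                new Guess("hello", 1.0), new Guess("Science", 0.8), new Guess("Computer", 0.4)));
        results.put("word/1/2.jpg", new ArrayList<Guess>());
        System.out.print(toXml(results));
    }
}
```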


Plain Text Formats

In case anyone has trouble reading or writing XML, we offer the alternative of using the following plain-text file formats.

The filenames are the same as before, except they now have .txt instead of .xml extensions.

Your recognizer will be invoked as follows:

> myrec dictionary.txt dataset.txt output.txt


The plain text file formats are as follows:


dictionary.txt

This is simply a list of words in ASCII, with a separate word on each line (for character recognition, each word is a single character).

e.g.:

aardvark
and
zygot


A digit recognition example:


0
1
2
3
4
5
6
7
8
9


dataset.txt

The dataset is now just a list of relative paths to each image to be recognized. For example:

char/1/1.jpg
char/1/2.jpg
char/1/3.jpg
char/1/4.jpg
char/1/5.jpg


output.txt

The output file should now consist of the image file that was recognized, followed by the responses for that image.

Each response is given as a pair of values: the text, followed by the confidence. The first line shows that for image “word/1/1.jpg” the recognizer has produced three word hypotheses: hello, Science and Computer, with confidences of 1.0, 0.8 and 0.4 respectively.


word/1/1.jpg hello 1.0 Science 0.8 Computer 0.4
word/1/2.jpg
  ...


Note that the recognizer failed to produce an output for “word/1/2.jpg” - so we get the file name, but nothing else on that line.
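
One way to build a line of output.txt is sketched below. The OutputTxtWriter and Guess names are hypothetical. The sketch sorts guesses by descending confidence and keeps at most five, and it assumes that neither the file path nor the recognized text contains whitespace, since the format is space-separated.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical formatter for output.txt lines; not part of the competition code. */
class OutputTxtWriter {

    /** One recognition hypothesis: the text and its confidence. */
    static final class Guess {
        final String word;
        final double p;
        Guess(String word, double p) { this.word = word; this.p = p; }
    }

    /** Formats "file word1 p1 word2 p2 ..." with the best guess first, five at most. */
    static String formatLine(String file, List<Guess> guesses) {
        List<Guess> sorted = new ArrayList<>(guesses);
        sorted.sort((a, b) -> Double.compare(b.p, a.p)); // highest confidence first
        StringBuilder sb = new StringBuilder(file);
        for (Guess g : sorted.subList(0, Math.min(5, sorted.size()))) {
            sb.append(' ').append(g.word).append(' ').append(g.p);
        }
        return sb.toString(); // just the file name when there are no guesses
    }

    public static void main(String[] args) {
        System.out.println(formatLine("word/1/1.jpg", Arrays.asList(
                new Guess("Science", 0.8), new Guess("hello", 1.0), new Guess("Computer", 0.4))));
        System.out.println(formatLine("word/1/2.jpg", new ArrayList<Guess>()));
    }
}
```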





Java Interface

This allows you to supply a recognizer that takes a set of encoded image bytes and a dictionary and returns a set of recognition hypotheses (Responses), thus freeing you from any I/O considerations. There is also an alternative interface that works with a decoded image, but this is restricted to grey-level images.

Here are the interface and class definitions (package declarations omitted):

public interface GreyWordRecognizer {

    /** sets the dictionary of all possible words or chars
     * that this recognizer may return.
     */
    public void setDictionary(String[] words);

    /** recognize the grey level image - offering up to
     *  nCandidates as alternative hypotheses - each one rated
     *  by a measure of confidence
     */
    public Response[] recognize(byte[][] greyImage, int nCandidates);

}



public interface EncodedWordRecognizer {

    /** sets the dictionary of all possible words or chars
     * that this recognizer may return.
     */
    public void setDictionary(String[] words);

    /** recognize the encodedImage - offering up to
     *  nCandidates as alternative hypotheses - each one rated
     *  by a measure of confidence
     */
    public Response[] recognize(byte[] encodedImage, int nCandidates);

}



/**
 * Use instances of this class to return word or character
 * recognition hypotheses.
 */

public class Response {
    String word;  // the word or character - string of length one for the character
    double p;     // the confidence or likelihood of this response

    public Response(String word, double p) {
        this.word = word;
        this.p = p;
    }
}
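
To illustrate how the two interfaces relate, the sketch below decodes encoded image bytes into a grey-level array using the standard javax.imageio API. The GreyDecoder class is hypothetical. The row-major layout (greyImage[y][x]) and the luminance weights are assumptions, since the layout of the byte[][] argument is not specified here. Note that Java bytes are signed, so grey values should be read back with `& 0xFF`.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import javax.imageio.ImageIO;

/** Hypothetical bridge from encoded image bytes to a grey-level array. */
class GreyDecoder {

    /** Decodes the image and returns grey[y][x]; read values with (b & 0xFF). */
    static byte[][] toGrey(byte[] encodedImage) {
        BufferedImage img;
        try {
            img = ImageIO.read(new ByteArrayInputStream(encodedImage));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        if (img == null) {
            throw new IllegalArgumentException("Unsupported image format");
        }
        byte[][] grey = new byte[img.getHeight()][img.getWidth()];
        for (int y = 0; y < img.getHeight(); y++) {
            for (int x = 0; x < img.getWidth(); x++) {
                int rgb = img.getRGB(x, y);
                int r = (rgb >> 16) & 0xFF, g = (rgb >> 8) & 0xFF, b = rgb & 0xFF;
                // Integer approximation of the common 0.299/0.587/0.114 luminance weights.
                grey[y][x] = (byte) ((299 * r + 587 * g + 114 * b) / 1000);
            }
        }
        return grey;
    }

    /** Builds a tiny 2x1 PNG (white pixel, then black pixel) for demonstration. */
    static byte[] samplePng() {
        BufferedImage img = new BufferedImage(2, 1, BufferedImage.TYPE_INT_RGB);
        img.setRGB(0, 0, 0xFFFFFF);
        img.setRGB(1, 0, 0x000000);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            ImageIO.write(img, "png", out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[][] grey = toGrey(samplePng());
        System.out.println((grey[0][0] & 0xFF) + " " + (grey[0][1] & 0xFF));
    }
}
```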



Source Code

The source code to run a sample recognizer will be provided shortly.



Sample Results

Some sample results rendered in HTML can be found here.



Ultra Simple Interface

If all the above options seem overcomplex, there is an alternative.

The simplest interface we can offer is one where your recognizer (word or character) reads an image file (JPEG format), and prints the word or character in that file to standard output.

Here we assume that the word and character recognizers are working with set alphabets / dictionaries.

For the character recognizers, we'll fix this to be the set of upper and lowercase characters plus digits, i.e. the full alphanumeric set (regexp: [0-9a-zA-Z]).

For word recognizers we'll fix the dictionary to be all the words used in this trial, plus a large set of words from the WordNet corpus.
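
A character recognizer could check its own output against the fixed alphabet before printing it. The following small sketch applies the regexp quoted above; the CharAlphabet class name is hypothetical.

```java
import java.util.regex.Pattern;

/** Hypothetical check that an output is exactly one alphanumeric character. */
class CharAlphabet {

    // The alphabet given above: digits plus lower and upper case letters.
    private static final Pattern ALPHANUMERIC = Pattern.compile("[0-9a-zA-Z]");

    /** True only for a single character drawn from [0-9a-zA-Z]. */
    static boolean isValidOutput(String s) {
        return s != null && ALPHANUMERIC.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValidOutput("0"));   // a digit
        System.out.println(isValidOutput("ab"));  // two characters
        System.out.println(isValidOutput("?"));   // outside the alphabet
    }
}
```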

Character Recognition Example

Given the following image in file datasets/sample/char/6/571.jpg


A successful recognizer would operate as follows:

> myrec datasets/sample/char/6/571.jpg
0



Word Recognition Example

Given the following image in file datasets/sample/word/1/28.jpg


A successful recognizer would operate as follows:

> myrec datasets/sample/word/1/28.jpg
european