- We accept no responsibility for any damages or losses incurred by
participants.
- Participation in this competition is entirely at the risk of the
participant.
- We reserve the right to update the rules and problem specification in
order to clarify any ambiguities, close any loopholes or correct any errors that come to
light.
- We reserve the right to update the evaluation software and re-run any
test results in order to fix any bugs or loopholes in the system.
The spirit of the competition is to further OCR research through
blind testing of OCR algorithms on some reasonably challenging data. In addition to
measuring accuracy, we also measure many other features of an algorithm's performance.
The competition is particularly timely given the increasing use of
mobile computing platforms which require high-performance pattern recognition algorithms
in order to provide better user interfaces. When choosing algorithms for such
platforms it is vital to consider the computing resources required by an algorithm, and
not just its accuracy (see below).
- The submission process to be followed is described on the Trainable OCR page.
- The prize for the winner is 100 UK Pounds. The recipient
is responsible for paying any taxes required by the law of the land in which they reside.
- Members (staff/students) of the University of Essex and
employees of the Post Office (UK) are welcome to enter, but are not eligible for the cash
prize.
- The competition is open from Nov 8th 00:00 GMT.
- The competition closes Dec 6th 00:00 GMT. In the event
that there is no winner by this date, we may extend the competition period at our
discretion.
- The winner of the competition will be the owner of the e-mail address
whose entry, in the view of the competition organisers, best meets
the winning criteria.
- The decision of the competition organisers is final.
- Competitors may make multiple submissions, but in the event that the
evaluation queue gets busy, authors of multiple submissions may be given a lower priority
on the queue, or in extreme cases have jobs deleted from the queue.
- The PO Digits dataset (the set on which the accuracy is measured) has
the SHA-1 message digest
C4 16 04 73 A0 E4 DA 66 69 AA 48 8A D7 D0 FF 77 EE DB 20 5E
- In order to qualify for the prize, in addition to meeting all the
criteria listed below, the algorithm must also demonstrate test set accuracy that is
statistically significantly (see test details below) better than the baseline algorithm,
called sml.NearestNeighbour.
Entries are ranked on the following criteria, applied in order. The winner
is the algorithm that has, on average:
- the highest test accuracy on the PO Digits dataset (*see below),
- the fastest recognition time,
- the fastest train time,
- the fastest load time,
- the fastest save time,
- the earliest submission.
These criteria are defined on the TrainableOCR
problem definition page.
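As an illustration only, the ordering above could be applied to a pair of entries roughly as follows. This is a sketch, not the evaluation software: the Entry class, its field names and the accuracyDiffSignificant flag are hypothetical, and it assumes that the timing criteria are compared directly on their averages.

```java
public class WinnerOrder {

    /** Hypothetical summary of one entry's averaged measurements. */
    public static final class Entry {
        final double accuracy;      // mean test accuracy, higher is better
        final double recognitionMs; // mean recognition time, lower is better
        final double trainMs;       // mean train time, lower is better
        final double loadMs;        // mean load time, lower is better
        final double saveMs;        // mean save time, lower is better
        final long submittedAt;     // submission time, earlier is better

        public Entry(double accuracy, double recognitionMs, double trainMs,
                     double loadMs, double saveMs, long submittedAt) {
            this.accuracy = accuracy;
            this.recognitionMs = recognitionMs;
            this.trainMs = trainMs;
            this.loadMs = loadMs;
            this.saveMs = saveMs;
            this.submittedAt = submittedAt;
        }
    }

    /**
     * Returns a negative value if a ranks ahead of b, a positive value if b
     * ranks ahead of a, and 0 if the two are tied on every criterion.
     * Accuracy decides only when the caller reports the difference as
     * statistically significant; otherwise the later criteria are used.
     */
    public static int compare(Entry a, Entry b, boolean accuracyDiffSignificant) {
        if (accuracyDiffSignificant && a.accuracy != b.accuracy) {
            return Double.compare(b.accuracy, a.accuracy); // higher accuracy first
        }
        int c = Double.compare(a.recognitionMs, b.recognitionMs);
        if (c != 0) return c;
        c = Double.compare(a.trainMs, b.trainMs);
        if (c != 0) return c;
        c = Double.compare(a.loadMs, b.loadMs);
        if (c != 0) return c;
        c = Double.compare(a.saveMs, b.saveMs);
        if (c != 0) return c;
        return Long.compare(a.submittedAt, b.submittedAt);
    }

    public static void main(String[] args) {
        // Made-up figures: a is slightly more accurate, b recognises faster.
        Entry a = new Entry(0.96, 20.0, 1000.0, 50.0, 50.0, 2L);
        Entry b = new Entry(0.95, 10.0, 1000.0, 50.0, 50.0, 1L);
        System.out.println(compare(a, b, true) < 0 ? "a ranks first" : "b ranks first");
        System.out.println(compare(a, b, false) < 0 ? "a ranks first" : "b ranks first");
    }
}
```

With these made-up figures, a ranks first when the accuracy difference is significant, and b ranks first when it is not, because b has the faster recognition time.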
*Highest test accuracy: The
average will be computed over a number of training set / test set pairs. The
difference between the mean accuracies must be significant according to an unpaired
two-tailed t-test at the 95% confidence level (a significance level of 5%). If the
accuracy difference does not pass this test, the decision passes to the next criterion
in the above list.
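The significance check can be sketched as below. This is a minimal illustration under stated assumptions, not the organisers' test code: the class and method names are hypothetical, the accuracy figures are made up, and the equal-variance (pooled) form of the unpaired t-test is assumed because the rules do not say which variant is used. The critical value is supplied by the caller from a t-table rather than computed.

```java
public class AccuracyTTest {

    /** Arithmetic mean of the samples. */
    public static double mean(double[] xs) {
        double sum = 0.0;
        for (double x : xs) sum += x;
        return sum / xs.length;
    }

    /** Unbiased sample variance (divides by n - 1). */
    public static double variance(double[] xs) {
        double m = mean(xs);
        double ss = 0.0;
        for (double x : xs) ss += (x - m) * (x - m);
        return ss / (xs.length - 1);
    }

    /** Pooled-variance t statistic for two independent samples. */
    public static double tStatistic(double[] a, double[] b) {
        int na = a.length;
        int nb = b.length;
        double pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b))
                / (na + nb - 2);
        return (mean(a) - mean(b)) / Math.sqrt(pooled * (1.0 / na + 1.0 / nb));
    }

    /** Two-tailed decision: significant when |t| exceeds the critical value. */
    public static boolean isSignificant(double[] a, double[] b, double criticalValue) {
        return Math.abs(tStatistic(a, b)) > criticalValue;
    }

    public static void main(String[] args) {
        // Made-up accuracies over five training set / test set pairs.
        double[] entry = {0.95, 0.96, 0.94, 0.95, 0.96};
        double[] baseline = {0.90, 0.91, 0.89, 0.90, 0.92};
        // 2.306 is the two-tailed 5% critical value for 5 + 5 - 2 = 8
        // degrees of freedom, taken from a standard t-table.
        System.out.println("t = " + tStatistic(entry, baseline));
        System.out.println("significant = " + isSignificant(entry, baseline, 2.306));
    }
}
```

The degrees of freedom, and therefore the critical value, depend on how many training set / test set pairs are actually used, which the rules leave unspecified.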
The SHA-1 (Secure Hash Algorithm) message digest for the dataset
is given above. After the competition, we will publish the dataset together
with the code used to generate the message digest.
Since there is no known practical way of editing a piece of text so that it
produces a desired message digest, a published dataset that matches the digest shows
that we did not tamper with the dataset during the competition. (We borrowed this idea
from a similar method used in the Abbadingo Grammatical Inference contest.)
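Once the dataset is published, anyone can recompute the digest and compare it with the value given above. The following is a minimal sketch using the standard java.security.MessageDigest class; it is not the organisers' own digest code, and the class name DatasetDigest and the command-line file argument are assumptions made for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class DatasetDigest {

    /** SHA-1 digest of the bytes as space-separated upper-case hex pairs. */
    public static String sha1Hex(byte[] data) {
        MessageDigest md;
        try {
            md = MessageDigest.getInstance("SHA-1");
        } catch (NoSuchAlgorithmException e) {
            // SHA-1 is a standard algorithm that every Java platform must provide.
            throw new IllegalStateException(e);
        }
        byte[] digest = md.digest(data);
        StringBuilder sb = new StringBuilder();
        for (byte b : digest) {
            if (sb.length() > 0) sb.append(' ');
            // Mask to 0..255 so negative bytes print as two hex digits.
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        // With a file path argument, hash that file; otherwise hash "abc".
        byte[] data = args.length > 0
                ? Files.readAllBytes(Paths.get(args[0]))
                : "abc".getBytes(StandardCharsets.US_ASCII);
        System.out.println(sha1Hex(data));
    }
}
```

Run without arguments, the program hashes the three-character string "abc" and prints A9 99 3E 36 47 06 81 6A BA 3E 25 71 78 50 C2 6C 9C D0 D8 9D, the well-known SHA-1 test vector. Run with the path of the published dataset file, the output can be compared directly with the digest listed in the rules.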
The algorithms must satisfy the following hard constraints:
- (Code size (class files) + Trained size) < 50,000 bytes,
- Train time (for 500 characters) < 2 minutes,
- Recognition time (per character) < 250 milliseconds,
- Load time (time to load a pre-trained classifier) < 10 seconds,
- Save time (time to save a trained classifier) < 60 seconds.
- Each evaluation job (training plus testing) will be allocated three
minutes of CPU time. Jobs that exceed this budget will be terminated.
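Entrants may find it useful to check their own measurements against these limits before submitting. The sketch below is an illustration only: the class name, method name and parameters are hypothetical, it is not part of the evaluation software, and timings measured on an entrant's machine will differ from those on the evaluation machine. It does not model the three-minute CPU budget per job.

```java
public class HardConstraints {

    static final long MAX_TOTAL_BYTES = 50_000;     // code size + trained size
    static final long MAX_TRAIN_MS = 2 * 60 * 1000; // for 500 characters
    static final double MAX_RECOGNITION_MS = 250.0; // per character
    static final long MAX_LOAD_MS = 10 * 1000;      // load a pre-trained classifier
    static final long MAX_SAVE_MS = 60 * 1000;      // save a trained classifier

    /** True only if every measurement is strictly below its limit. */
    public static boolean satisfies(long codeBytes, long trainedBytes,
                                    long trainMs, double recognitionMsPerChar,
                                    long loadMs, long saveMs) {
        return codeBytes + trainedBytes < MAX_TOTAL_BYTES
                && trainMs < MAX_TRAIN_MS
                && recognitionMsPerChar < MAX_RECOGNITION_MS
                && loadMs < MAX_LOAD_MS
                && saveMs < MAX_SAVE_MS;
    }

    public static void main(String[] args) {
        // Made-up measurements for a small classifier.
        System.out.println(satisfies(12_000, 30_000, 45_000, 3.5, 200, 150));
        // Exactly 50,000 bytes fails, because the limit is a strict inequality.
        System.out.println(satisfies(20_000, 30_000, 45_000, 3.5, 200, 150));
    }
}
```

The first call prints true and the second prints false, since all limits in the list are strict inequalities.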
All submissions must be in Java. Submit either the source code directly
or an uploaded archive of Java class files. See the upload page for more details.
We will aim to keep any source code or class files submitted
private. However, no site is completely hack-proof, and we accept no responsibility
in the case of other parties obtaining your submissions. We shall not use your code
for any other purpose than that of measuring its performance on various datasets unless
given your explicit permission to do otherwise. We do not guarantee that all the
datasets will consist of OCR data. We may test the algorithms on other kinds of
images for fun!