=================================
The standard IAPR_TC5 partitions
=================================

Contents:

-------------------------------------
README
-------------------------------------
This file.

-------------------------------------
generate.sh
-------------------------------------
Script to generate the partitions. The standard partitions are already
provided.

-------------------------------------
nfold.c
-------------------------------------
Usage: nfold data partitions [seed]

Program to split the data into n partitions according to the class
prior probabilities.

--------------------------------------
Datasets
--------------------------------------
Each directory contains a different task. Each directory holds the
original data set file, named "data", usually taken from the UCI or
Statlog corpora, and the standard partitions generated by the
generate.sh script.

The format of the data file is simple: each row is one feature vector
sample, with the class label at the end of the row. The number of rows
is the number of samples of the task. No additional information about
the number of classes, the number of samples, or the dimension of the
feature vectors is needed.

This data file is the original data file, but with each categorical
feature replaced by r binary features, where r is the range of the
categorical feature (the number of values it can take).

From this data file, 5 partitions are obtained for each of 10 different
random seeds: 123, 234, 345, 456, 567, 678, 789, 890, 901 and 120.

The shell script that generates the partitions is generate.sh. The
script requires the name of the task directory as its parameter. The
files obtained are:

tra_XX_YY
tst_XX_YY

where XX is the index of the 5-partition, from 0 to 9 (one per seed),
and YY is the partition within it, from 0 to 4.

To obtain the error rate, train with tra_XX_YY and estimate the error
with the corresponding tst_XX_YY.
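
The protocol described in this file can be sketched in Python as shown
here. This is only an illustration, not part of the distributed tools.
It assumes that the fields in each row are separated by whitespace, that
all features are numeric, and that the tra_XX_YY and tst_XX_YY files use
the same row format as "data"; none of this is stated explicitly above.
The names load_samples, partition_names, average_error and fit are
hypothetical.

```python
from pathlib import Path
from typing import Callable, List, Tuple

Sample = Tuple[List[float], str]
Classifier = Callable[[List[float]], str]


def load_samples(path) -> List[Sample]:
    """Read one sample per row: feature values, then the class label last.

    Assumption: fields are whitespace-separated and features are numeric.
    """
    samples = []
    with open(path) as handle:
        for line in handle:
            fields = line.split()
            if not fields:
                continue  # skip blank rows
            *features, label = fields
            samples.append(([float(value) for value in features], label))
    return samples


def partition_names(n_partitionings: int = 10, n_folds: int = 5):
    """List the (training, test) file names, tra_0_0/tst_0_0 first."""
    return [
        (f"tra_{xx}_{yy}", f"tst_{xx}_{yy}")
        for xx in range(n_partitionings)
        for yy in range(n_folds)
    ]


def average_error(task_dir, fit: Callable[[List[Sample]], Classifier]) -> float:
    """Train and test on every file pair and average the error rates.

    `fit` is a user-supplied function (hypothetical here) that takes the
    training samples and returns a function mapping features to a label.
    Test files are assumed to be non-empty.
    """
    errors = []
    for tra, tst in partition_names():
        train = load_samples(Path(task_dir) / tra)
        test = load_samples(Path(task_dir) / tst)
        predict = fit(train)
        wrong = sum(1 for features, label in test if predict(features) != label)
        errors.append(wrong / len(test))
    return sum(errors) / len(errors)
```

Any classifier can be plugged in through fit; the function only fixes
which files are paired and how the per-pair error rates are combined.
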
The complete experiment is:

Training: tra_0_0    Test: tst_0_0
Training: tra_0_1    Test: tst_0_1
Training: tra_0_2    Test: tst_0_2
Training: tra_0_3    Test: tst_0_3
Training: tra_0_4    Test: tst_0_4
Training: tra_1_0    Test: tst_1_0
Training: tra_1_1    Test: tst_1_1
Training: tra_1_2    Test: tst_1_2
Training: tra_1_3    Test: tst_1_3
Training: tra_1_4    Test: tst_1_4
.
.
.
Training: tra_9_4    Test: tst_9_4

This gives a total of 50 different error estimates. The estimated error
is the average over these 50 experiments.

Author: Roberto Paredes
rparedes@dsic.upv.es