Tutorial: OpenCV haartraining (Rapid Object Detection With A Cascade of Boosted Classifiers Based on Haar-like Features)

Objective

OpenCV ships with an impressive face detection demo. It also ships the programs (and functions) that were used to train the classifiers for that face detector, collectively called HaarTraining, so we can train our own object classifiers with them.

However, I could not reproduce exactly how the OpenCV developers trained their face detector, because they did not publish some details, such as which images and parameters they used for training. The objective of this report is to provide a step-by-step procedure for people who want to follow the same path.

My working environment is Visual Studio + cygwin on Windows XP, or Linux. Cygwin is required because I use several UNIX commands. If you are an engineer or a scientist, I am sure you will keep using cygwin (or rather, the UNIX commands) for other work beyond this haartraining.

FYI: I recommend working on something else in parallel with haartraining, because training takes many days (possibly a week). My typical cycle was: 1. run haartraining on Friday, 2. forget about it completely, 3. look at the results the next Friday, 4. run another haartraining (loop).

figure_3.gif
A picture from the OpenCV website

History

  • 10/16/2008 - Additional experimental results.
  • 08/28/2008 - Revised entirely.
  • 06/05/2007 - opencv-1.0.0
  • 03/12/2006 - First Edition (opencv-0.9.7)

Tag: SciSoftware ComputerVision FaceDetection OpenCV

Data Preparation

FYI: There are lists of databases at Face Recognition Homepage - Databases and at Computer Vision Test Images.

Positive (Face) Images

We need to collect positive images that contain only objects of interest, e.g., faces.

Kuranov et al. [3] mention that they used 5000 positive frontal face patterns, derived from 1000 original faces. I describe how to increase the number of samples in a later chapter.

Previously, I downloaded and used The UMIST Face Database (Dead Link) because it provided cropped face images. The UMIST Face Database has video-like image sequences that go from side views to frontal views. I thought that training with such images would produce a face detector that is robust to facial pose. However, the resulting face detector did not work well. I was probably too optimistic. That was back in 2006.

I obtained a cropped frontal face database based on the CMU PIE Database, and I use it too. This dataset has large illumination variations, so it may lead to the same poor result as the UMIST Face Database, which had large pose variations.
#Sorry, it looks like redistribution of the PIE database (or of modifications of it) is not allowed. I made only a generated (distorted and downsized) .vec file available in the Download section. The PIE database is free (send a request e-mail), but it does not originally include the cropped faces.

MIT CBCL Face Data is another choice. It has 2,429 frontal faces with little variation in illumination and pose, so it should be good for haartraining. However, the original images are small, only 19 x 19, so we cannot run experiments to determine a good sample size.

The OpenCV developers probably used the FERET database. It looks like the FERET database became available for download over the internet on Jan. 31, 2008(?).

Negative (Background) Images

We need to collect negative images that do not contain the objects of interest, e.g., faces, to train the haarcascade classifier.

Kuranov et al. [3] state that they used 3000 negative images.

Fortunately, I found http://face.urtho.net/ (Negatives sets, Set 1 - Various negatives), which has about 3500 images (Dead Link). However, this collection was made for eye detection, and some pictures include faces. Therefore, I deleted all suspicious images that appeared to include faces. About 2900 images remained, and I added 100 more. That number should be enough.

The collection is available in the Download section (but it may take forever to download).

Natural Test (Face in Background) Images

We can synthesize testing image sets using the createsamples utility, but having a natural testing image dataset is still good.

There is the CMU-MIT Frontal Face Test Set, which the OpenCV developers used for their experiments. This dataset comes with a ground truth text that gives the locations of the eyes, noses, and lip centers and tips. However, it does not give face locations as rectangular regions, which is what the haartraining utilities require by default.

I created a simple script to compute face regions from the given ground truth information. The computation works as follows:

1. Get the vertical margin as nose height - mouth height
   The lower boundary is placed the margin below the mouth
   The upper boundary is placed the margin above the eyes
2. Get the horizontal margin as left mouth tip - right mouth tip
   The right boundary is placed the margin to the right of the right eye
   The left boundary is placed the margin to the left of the left eye

This was not perfect, but looked okay.
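
The computation above could be sketched in Python roughly as follows. This is only an illustration, not the script used in this tutorial: the function name face_rect, the argument layout ((x, y) tuples in image coordinates with y growing downward), the use of absolute differences for the margins, and taking "left" and "right" in image coordinates are all assumptions.

```python
def face_rect(left_eye, right_eye, nose, mouth_left, mouth_right, mouth_center):
    """Estimate a face rectangle (x, y, width, height) from landmark points.

    Every argument is an (x, y) tuple in image coordinates (y grows downward).
    All names and conventions here are hypothetical.
    """
    # Step 1: vertical margin = distance between nose and mouth heights.
    v_margin = abs(mouth_center[1] - nose[1])
    top = min(left_eye[1], right_eye[1]) - v_margin   # margin above the eyes
    bottom = mouth_center[1] + v_margin               # margin below the mouth

    # Step 2: horizontal margin = distance between the two mouth tips.
    h_margin = abs(mouth_left[0] - mouth_right[0])
    left = min(left_eye[0], right_eye[0]) - h_margin  # margin left of the eyes
    right = max(left_eye[0], right_eye[0]) + h_margin # margin right of the eyes

    return left, top, right - left, bottom - top
```

The returned tuple has the same (x, y, width, height) order as the description file format used later in this tutorial. Clamping the rectangle to the image borders is left out of this sketch.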

The generated ground truth text and the image dataset are available in the Download section; you may download only the ground truth text. By the way, I converted the images from GIF to PNG because OpenCV does not support GIF. The mogrify (ImageMagick) command is useful for this kind of image type conversion:

$ mogrify -format png *.gif

How to Crop Images Manually Fast

To collect positive images, you may have to crop a lot of images by hand.

I created a multi-platform tool, imageclipper, to help with this. It is useful not only for haartraining but also for other computer vision and machine learning research. It has the following features:

  • You can open images in a directory sequentially
  • You can open a video file too, frame by frame
  • Clipping and moving to the next image can be done by one button (SPACE)
  • You will select a region to clip by dragging left mouse button
  • You can move or resize your selected region by dragging right mouse button
  • Your selected region is shown on the next image too.

Create Samples (Reference)

We can create training samples and testing samples with the createsamples utility. In this section, I describe the functions of the createsamples utility, because the Tutorial [1] did not explain them clearly enough for me (but please also see the Tutorial [1] for further options).

Below is the list of options. Note that the utility has four main functions, and the meaning of some options changes depending on the function, which is confusing.

Usage: ./createsamples
  [-info <description_file_name>]
  [-img <image_file_name>]
  [-vec <vec_file_name>]
  [-bg <background_file_name>]
  [-num <number_of_samples = 1000>]
  [-bgcolor <background_color = 0>]
  [-inv] [-randinv] [-bgthresh <background_color_threshold = 80>]
  [-maxidev <max_intensity_deviation = 40>]
  [-maxxangle <max_x_rotation_angle = 1.100000>]
  [-maxyangle <max_y_rotation_angle = 1.100000>]
  [-maxzangle <max_z_rotation_angle = 0.500000>]
  [-show [<scale = 4.000000>]]
  [-w <sample_width = 24>]
  [-h <sample_height = 24>]

1. Create training samples from one image

The 1st function of the createsamples utility creates training samples from one image by applying distortions. This function (cvhaartraining.cpp#cvCreateTrainingSamples) is launched when the options -img, -bg, and -vec are specified.

  • -img <one_positive_image>
  • -bg <collection_file_of_negatives>
  • -vec <name_of_the_output_file_containing_the_generated_samples>

For example,

$ createsamples -img face.png -num 10 -bg negatives.dat -vec samples.vec -maxxangle 0.6 -maxyangle 0 -maxzangle 0.3 -maxidev 100 -bgcolor 0 -bgthresh 0 -w 20 -h 20

This generates <num> samples from one <one_positive_image> by applying distortions. Be careful: only the first <num> negative images in the <collection_file_of_negatives> are used.

The format of the <collection_file_of_negatives> is as follows:

[filename]
[filename]
[filename]
...

such as

img/img1.jpg
img/img2.jpg

Let me call this file format the collection file format.

How to create a collection file

This format can easily be created with the find command as

$ cd [your working directory]
$ find [image dir] -name '*.[image ext]' > [description file]

such as

$ find ../../data/negatives/ -name '*.jpg' > negatives.dat
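
If the find command is not available, the same kind of collection file could be produced with a few lines of Python. This is a minimal sketch under assumptions: the function name write_collection_file and its arguments are hypothetical, and it only mimics the `find <dir> -name '<pattern>'` usage shown above (one matching path per line).

```python
import fnmatch
import os


def write_collection_file(image_dir, pattern, output_path):
    """Write every file under image_dir whose name matches pattern,
    one path per line, and return the number of paths written."""
    paths = []
    for root, _dirs, files in os.walk(image_dir):
        for name in sorted(files):
            if fnmatch.fnmatch(name, pattern):
                paths.append(os.path.join(root, name))
    with open(output_path, "w") as out:
        for path in paths:
            out.write(path + "\n")
    return len(paths)
```

For instance, write_collection_file("../../data/negatives", "*.jpg", "negatives.dat") would correspond to the find example above, although the order of the lines may differ from what find prints.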

2. Create training samples from some images

The 2nd function creates training samples from some images without applying distortions. This function (cvhaartraining.cpp#cvCreateTestSamples) is launched when the options -info and -vec are specified.

  • -info <description_file_of_samples>
  • -vec <name_of_the_output_file_containing_the_generated_samples>

For example,

$ createsamples -info samples.dat -vec samples.vec -w 20 -h 20

This generates samples without applying distortions. You may think of this function as a file format conversion function.

The format of the <description_file_of_samples> is as follows:

[filename] [# of objects] [[x y width height] [... 2nd object] ...]
[filename] [# of objects] [[x y width height] [... 2nd object] ...]
[filename] [# of objects] [[x y width height] [... 2nd object] ...]
...

where (x,y) is the upper-left corner of the object, and the origin (0,0) is the upper-left corner of the image. For example:

img/img1.jpg 1 140 100 45 45
img/img2.jpg 2 100 200 50 50 50 30 25 25
img/img3.jpg 1 0 0 20 20

Let me call this format the description file format, as opposed to the collection file format, although the manual [1] does not differentiate them.
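
To make the description file format concrete, here is a small Python sketch that parses one line of it. The function name parse_description_line is hypothetical, and the sketch assumes that file names contain no spaces and that all coordinates are integers.

```python
def parse_description_line(line):
    """Parse '<filename> <count> x y w h [x y w h ...]' into
    (filename, [(x, y, width, height), ...])."""
    fields = line.split()
    filename = fields[0]
    count = int(fields[1])
    values = [int(v) for v in fields[2:]]
    # Each object contributes exactly four numbers.
    if len(values) != 4 * count:
        raise ValueError(
            "expected %d numbers, got %d" % (4 * count, len(values)))
    rects = [tuple(values[i:i + 4]) for i in range(0, len(values), 4)]
    return filename, rects
```

Applied to the second example line above, this returns the file name img/img2.jpg together with the two rectangles (100, 200, 50, 50) and (50, 30, 25, 25).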

This function crops the specified regions, resizes them, and converts them into the .vec format, but (let me say it again) it does not generate many samples from one image (one cropped image) by applying distortions. Therefore, use this 2nd function only when you already have a sufficient number of natural images and their ground truth (5000 or 7000 in total would be required).

Note that in this case the -num option only restricts the number of samples to generate; it does not increase the number of samples by applying distortions.

How to create a description file

Here I describe how to create a description file when already-cropped image files are available, because some people asked about it on the OpenCV forum. Note that the steps of my tutorial do not require this.

In such a situation, you can use the find command and the identify command (cygwin should have the identify (ImageMagick) command) to create a description file:

$ cd <your working directory>
$ find <dir> -name '*.<ext>' -exec identify -format '%i 1 0 0 %w %h' \{\} \; > <description_file>

such as

$ find ../../data/umist_cropped -name '*.pgm' -exec identify -format '%i 1 0 0 %w %h' \{\} \; > samplesdescription.dat

If all images have the same size, it becomes simpler and faster,

$ find <dir> -name '*.<ext>' -exec echo \{\} 1 0 0 <width> <height> \; > <description_file>

such as

$ find ../../data/umist_cropped -name '*.pgm' -exec echo \{\} 1 0 0 20 20 \; > samplesdescription.dat

How can you crop images automatically? If you could do that, you would not need haartraining. You would already have an object detector (^-^

3. Create test samples

The 3rd function creates test samples and their ground truth from a single image by applying distortions. This function (cvsamples.cpp#cvCreateTrainingSamplesFromInfo) is triggered when the options -img, -bg, and -info are specified.

  • -img <one_positive_image>
  • -bg <collection_file_of_negatives>
  • -info <generated_description_file_for_the_generated_test_images>

In this case, -w and -h determine the minimal size of the positives to be embedded in the test images.

$ createsamples -img face.png -num 10 -bg negatives.dat -info test.dat -maxxangle 0.6 -maxyangle 0 -maxzangle 0.3 -maxidev 100 -bgcolor 0 -bgthresh 0

Be careful that only the first <num> negative images in the <collection_file_of_negatives> are used.

This generates tons of jpg files such as

0001_0351_0227_0115_0115.jpg

The output image filename format is <number>_<x>_<y>_<width>_<height>.jpg, where x, y, width and height are the coordinates of the bounding rectangle of the placed object.

It also generates the <generated_description_file_for_the_generated_test_images> in the description file format (the same format as the <description_file_of_samples> of the 2nd function).
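
Because the ground truth is also encoded in each file name, it can be recovered without the description file. The following Python sketch shows one way to do that; the function name parse_test_image_name is hypothetical, and the pattern assumes exactly the <number>_<x>_<y>_<width>_<height>.jpg naming described above.

```python
import os
import re

# Five underscore-separated integer fields followed by ".jpg".
_NAME_PATTERN = re.compile(r"^(\d+)_(\d+)_(\d+)_(\d+)_(\d+)\.jpg$")


def parse_test_image_name(path):
    """Return (number, (x, y, width, height)) decoded from a file name."""
    match = _NAME_PATTERN.match(os.path.basename(path))
    if match is None:
        raise ValueError("unexpected file name: %s" % path)
    number, x, y, width, height = (int(g) for g in match.groups())
    return number, (x, y, width, height)
```

For the example file name above, this gives sample number 1 and the rectangle (351, 227, 115, 115).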

4. Show images

The 4th function shows the images within a vec file. This function (cvsamples.cpp#cvShowVecSamples) is triggered when only the -vec option is specified (no -info, -img, -bg). For example,

$ createsamples -vec samples.vec -w 20 -h 20

EXTRA: random seed

The createsamples software applies the same sequence of distortions to each image. We may want to apply a different sequence of distortions to each image because, otherwise, the resulting detector may work only for those specific distortions.

This can be done by modifying createsamples slightly as:

Add the following at the top

#include<time.h>

Add the following in the main function

srand(time(NULL));

The modified source code is available at svn:createsamples.cpp

Create Samples

Create Training Samples

Kuranov et al. [3] mention that they used 5000 positive frontal face patterns and 3000 negatives for training, and that the 5000 positive patterns were derived from 1000 original faces.

However, you may have noticed that none of the 4 functions of the createsamples utility can generate 5000 positive images from 1000 images in one go. We have to use the 1st function of createsamples to generate 5 (or some) positives from 1 image, repeat the procedure 1000 (or some) times, and finally merge the generated output vec files. *1

I wrote a program, mergevec.cpp, to merge vec files. I also wrote a script, createtrainsamples.pl, to repeat the procedure 1000 (or some) times. I set the default to 7000 instead of 5000 because the Tutorial [1] states that "the reasonable number of positive samples is 7000." Please modify the path to createsamples and its option parameters, which are written directly in the file.

The input format of createtrainsamples.pl is

$ perl createtrainsamples.pl <positives.dat> <negatives.dat> <vec_output_dir> [<totalnum = 7000>] [<createsample_command_options = "./createsamples -w 20 -h 20...">]

And, the input format of mergevec is

$ mergevec <collection_file_of_vecs> <output_vec_file_name>

A collection file (a file containing list of filenames) can be generated as

$ find [dir_name] -name '*.[ext]' > [collection_file_name]

Example)

$ cd HaarTraining/bin 
$ find ../../data/negatives/ -name '*.jpg' > negatives.dat
$ find ../../data/umist_cropped/ -name '*.pgm' > positives.dat

$ perl createtrainsamples.pl positives.dat negatives.dat samples 7000 "./createsamples -bgcolor 0 -bgthresh 0 -maxxangle 1.1 -maxyangle 1.1 -maxzangle 0.5 -maxidev 40 -w 20 -h 20"
$ find samples/ -name '*.vec' > samples.dat # to create a collection file for vec files
$ mergevec samples.dat samples.vec
$ # createsamples -vec samples.vec -show -w 20 -h 20 # Extra: If you want to see inside
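
The idea of repeating the 1st function once per positive image can be sketched in Python as below. This is not createtrainsamples.pl itself, and it may distribute the samples differently from that script: the function name plan_createsamples_runs, the even split of the total over the images, and the naming of the output vec files are all assumptions. The sketch only builds the command lines (using the -img, -bg, -vec and -num options described earlier) and does not execute them.

```python
import os


def plan_createsamples_runs(positives, total, negatives_file, vec_dir,
                            base_command):
    """Return a list of (num, command) pairs, one per positive image,
    so that the per-image sample counts add up to total."""
    if not positives:
        return []
    # Split total as evenly as possible; the first 'extra' images get one more.
    base, extra = divmod(total, len(positives))
    runs = []
    for index, image in enumerate(positives):
        num = base + (1 if index < extra else 0)
        if num == 0:
            continue
        vec = os.path.join(vec_dir, os.path.basename(image) + ".vec")
        command = "%s -img %s -bg %s -vec %s -num %d" % (
            base_command, image, negatives_file, vec, num)
        runs.append((num, command))
    return runs
```

With 1000 positive images and a total of 7000, each image would get 7 samples, and the resulting vec files would then be merged with mergevec as in the example above.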

Kuranov et al. [3] state that a sample size of 20x20 achieved the highest hit rate. Furthermore, they state that "For 18x18 four split nodes performed best, while for 20x20 two nodes were slightly better." Thus, -w 20 -h 20 would be good.

Create Testing Samples

Testing samples are images in which positives are embedded in negative background images and the locations of the positives are known. It is possible to create such testing images by hand. We can also use the 3rd function of createsamples to synthesize them. However, that function accepts only one positive image at a time, so a script that repeats the procedure helps. The script is available at svn:createtestsamples.pl. Please modify the path to createsamples and its option parameters directly in the file.

The input format of createtestsamples.pl is

$ perl createtestsamples.pl <positives.dat> <negatives.dat> <output_dir> [<totalnum = 1000>] [<createsample_command_options = "./createsamples -w 20 -h 20...">]

This generates lots of jpg files and info.dat in the <output_dir>. The jpg file name format is as <number>_<x>_<y>_<width>_<height>.jpg, where x, y, width and height are the coordinates of placed object bounding rectangle.

Example)

$ # cd HaarTraining/bin 
$ # find ../../data/negatives/ -name '*.jpg' > negatives.dat 
$ # find ../../data/umist_cropped/ -name '*.pgm' > positives.dat
$ perl createtestsamples.pl positives.dat negatives.dat tests 1000 "./createsamples -bgcolor 0 -bgthresh 0 -maxxangle 1.1 -maxyangle 1.1 -maxzangle 0.5 -maxidev 40"
$ find tests/ -name 'info.dat' -exec cat \{\} \; > tests.dat # merge info files

Training

Haar Training

Now, we train our own classifier using the haartraining utility. Here is the usage of the haartraining.

Usage: ./haartraining
  -data <dir_name>
  -vec <vec_file_name>
  -bg <background_file_name>
  [-npos <number_of_positive_samples = 2000>]
  [-nneg <number_of_negative_samples = 2000>]
  [-nstages <number_of_stages = 14>]
  [-nsplits <number_of_splits = 1>]
  [-mem <memory_in_MB = 200>]
  [-sym (default)] [-nonsym]
  [-minhitrate <min_hit_rate = 0.995000>]
  [-maxfalsealarm <max_false_alarm_rate = 0.500000>]
  [-weighttrimming <weight_trimming = 0.950000>]
  [-eqw]
  [-mode <BASIC (default) | CORE | ALL>]
  [-w <sample_width = 24>]
  [-h <sample_height = 24>]
  [-bt <DAB | RAB | LB | GAB (default)>]
  [-err <misclass (default) | gini | entropy>]
  [-maxtreesplits <max_number_of_splits_in_tree_cascade = 0>]
  [-minpos <min_number_of_positive_samples_per_cluster = 500>]

Kuranov et al. [3] state that a sample size of 20x20 achieved the highest hit rate. Furthermore, they state that "For 18x18 four split nodes performed best, while for 20x20 two nodes were slightly better. The difference between weak tree classifiers with 2, 3 or 4 split nodes is smaller than their superiority with respect to stumps."

Furthermore, there was a description as "20 stages were trained. Assuming that my test set is representative for the learning task, I can expect a false alarm rate about 0.5^{20} \approx 9.6e-07 and a hit rate about 0.999^{20} \approx 0.98."
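
The arithmetic behind this expectation is simply the per-stage rate raised to the number of stages, under the assumption that every stage exactly meets its per-stage target and that stages behave independently. A short Python sketch (the function name cascade_rates is hypothetical):

```python
def cascade_rates(min_hit_rate, max_false_alarm, stages):
    """Expected overall (hit rate, false alarm rate) of a cascade in which
    every stage exactly meets its per-stage targets."""
    return min_hit_rate ** stages, max_false_alarm ** stages


hit, false_alarm = cascade_rates(0.999, 0.5, 20)
print("%.4f %.2e" % (hit, false_alarm))  # prints "0.9802 9.54e-07"
```

For comparison, the default per-stage hit rate of 0.995 would give an overall hit rate of about 0.90 after 20 stages, which is why a higher -minhitrate is attractive for long cascades.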

Therefore, a sample size of 20x20 with nsplits = 2, nstages = 20, minhitrate = 0.999 (default: 0.995), maxfalsealarm = 0.5 (default: 0.5), and weighttrimming = 0.95 (default: 0.95) would be good, such as

$ haartraining -data haarcascade -vec samples.vec -bg negatives.dat -nstages 20 -nsplits 2 -minhitrate 0.999 -maxfalsealarm 0.5 -npos 7000 -nneg 3019 -w 20 -h 20 -nonsym -mem 512 -mode ALL

The "-nonsym" option is used when the object class does not have vertical (left-right) symmetry. If the object class has vertical symmetry, as frontal faces do, "-sym (default)" should be used. It speeds up processing because only half of the haar-like features (the centered ones plus either the left-side or the right-side ones) are used.

The "-mode ALL" option uses the Extended Set of Haar-like Features [2]. The default is BASIC, which uses only upright features, while ALL uses the full set of upright and 45 degree rotated features [1].

The "-mem 512" option is the available memory in MB for precalculation [1]. The default is 200MB, so increase it if more memory is available. We should not specify all of the system RAM because this number is only for precalculation, not for everything. The maximum value that can be specified would be 2GB, because a 32-bit CPU has a 4GB limit (2^32 ≈ 4GB), and the limit becomes 2GB on Windows (the kernel reserves 1GB and Windows does something more).

There are other options that [1] does not list such as

 [-bt <DAB | RAB | LB | GAB (default)>]
 [-err <misclass (default) | gini | entropy>]
 [-maxtreesplits <max_number_of_splits_in_tree_cascade = 0>]
 [-minpos <min_number_of_positive_samples_per_cluster = 500>]

Please see my modified version of haartraining document [5] for details.

#Even if you increase the number of stages, the training may finish at an intermediate stage when it has already passed your desired minimum hit rate or false alarm rate, because adding more stages will certainly decrease these rates further (0.99 up to the current stage * 0.99 for the next stage = 0.9801 up to the next stage). Or the training may finish because all samples were rejected. In that case, you must increase the number of training samples.

#You can use OpenMP (multi-processing) with compilers such as Intel C++ compiler and MS Visual Studio 2005 Professional Edition or better. See How to enable OpenMP section.

#One training took three days.

Generate a XML File

The haartraining utility generates an xml file when the process has completely finished (since OpenCV beta5).

If you want to convert the intermediate haartraining output directory tree into an xml file, there is a program at OpenCV/samples/c/convert_cascade.c (that is, in your installation directory). Compile it.

The input format is

$ convert_cascade --size="<sample_width>x<sample_height>" <haartraining_output_dir> <output_file>

Example)

$ convert_cascade --size="20x20" haarcascade haarcascade.xml

Testing

Performance Evaluation

We can evaluate the performance of the generated classifier using the performance utility. Here is the usage of the performance utility.

Usage: ./performance
  -data <classifier_directory_name>
  -info <collection_file_name>
  [-maxSizeDiff <max_size_difference = 1.500000>]
  [-maxPosDiff <max_position_difference = 0.300000>]
  [-sf <scale_factor = 1.200000>]
  [-ni]
  [-nos <number_of_stages = -1>]
  [-rs <roc_size = 40>]
  [-w <sample_width = 24>]
  [-h <sample_height = 24>]

Please see my modified version of haartraining document [5] for details of options.

I cite how the performance utility works here:

During detection, a sliding window was moved pixel by pixel over the picture at each scale. Starting with the original scale, the features were enlarged by 10% and 20%, respectively (i.e., representing a rescale factor of 1.1 and 1.2, respectively) until exceeding the size of the picture in at least one dimension. Often multiple faces are detect at near by location and scale at an actual face location. Therefore, multiple nearby detection results were merged. Receiver Operating Curves (ROCs) were constructed by varying the required number of detected faces per actual face before merging into a single detection result. During experimentation only one parameter was changed at a time. The best mode of a parameter found in an experiment was used for the subsequent experiments. [3]

Execute the performance utility as

$ performance -data haarcascade -w 20 -h 20 -info tests.dat -ni
or
$ performance -data haarcascade.xml -info tests.dat -ni

Be careful: you have to specify the size of the training samples when you specify the classifier directory, although the classifier xml file includes this information inside *2.

The -ni option suppresses the creation of detection result image files. By default, the performance utility creates detection result image files and stores them in directories whose names are the test image directories prefixed with 'det-'. If you want to use this function, you have to create the destination directories beforehand yourself. Execute the following command to create the destination directories:

$ cat tests.dat | perl -pe 's!^(.*)/.*$!det-$1!g' | xargs mkdir -p

where tests.dat is the description file for the testing images that you created in the createtestsamples.pl step. Now you can run the performance utility without the '-ni' option.
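
What the perl one-liner computes can also be written in Python, which may be easier to adapt. This is a sketch under assumptions: the function name detection_dirs is hypothetical, image paths are assumed to use '/' and contain no spaces, and lines whose image path has no directory part are skipped.

```python
import posixpath


def detection_dirs(info_lines):
    """Return the sorted 'det-' directories needed for the given
    description file lines ('<image path> <count> x y w h ...')."""
    dirs = set()
    for line in info_lines:
        fields = line.split()
        if not fields:
            continue  # skip empty lines
        parent = posixpath.dirname(fields[0])
        if parent:
            dirs.add("det-" + parent)
    return sorted(dirs)
```

Each returned name can then be passed to os.makedirs(name, exist_ok=True), which plays the role of mkdir -p in the command above.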

An output of the performance utility is as follows:

+================================+======+======+======+
|            File Name           | Hits |Missed| False|
+================================+======+======+======+
|tests/01/img01.bmp/0001_0153_005|     0|     1|     0|
+--------------------------------+------+------+------+
....
+--------------------------------+------+------+------+
|                           Total|   874|   554|    72|
+================================+======+======+======+
Number of stages: 15
Number of weak classifiers: 68
Total time: 115.000000
15
        874     72      0.612045        0.050420
        874     72      0.612045        0.050420
        360     2       0.252101        0.001401
        115     0       0.080532        0.000000
        26      0       0.018207        0.000000
        8       0       0.005602        0.000000
        4       0       0.002801        0.000000
        1       0       0.000700        0.000000
        ....

'Hits' shows the number of correct detections. 'Missed' shows the number of missed detections, or false negatives (the object is there, but the detector failed to detect it). 'False' shows the number of false alarms, or false positives (the object is not there, but the detector reported one).

The latter table is for ROC plot. Please see my modified version of haartraining document [5] for more.
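
Judging from the numbers in the example output, the last two columns of each ROC row are the first two columns divided by the total number of ground truth objects (Hits + Missed). This reading is inferred from the output above rather than taken from the documentation, so treat it as an assumption. A short Python check (the function name roc_row is hypothetical):

```python
def roc_row(hits, false_alarms, total_objects):
    """Return (hit rate, false alarm rate) relative to the number of
    ground truth objects, as the example output appears to do."""
    return hits / total_objects, false_alarms / total_objects


total_objects = 874 + 554  # Hits + Missed from the 'Total' row
hit_rate, false_rate = roc_row(874, 72, total_objects)
print("%f %f" % (hit_rate, false_rate))  # prints "0.612045 0.050420"
```

The same division reproduces the other rows, for example 360 / 1428 = 0.252101 and 2 / 1428 = 0.001401.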

Fun with a USB camera

Have fun with a USB camera or some image files using the facedetect utility.

$ facedetect --cascade=<xml_file> [filename(image or video)|camera_index]

I modified facedetect.c slightly because the facedetect utility did not behave the same way as the performance utility. I added options to change parameters on the command line. The source code is available in the Download section (or direct link facedetect.c). Now the usage is as follows:

Usage: facedetect  --cascade="<cascade_xml_path>" or -c <cascade_xml_path>
  [ -sf < scale_factor = 1.100000 > ]
  [ -mn < min_neighbors = 1 > ]
  [ -fl < flags = 0 > ]
  [ -ms < min_size = 0 0 > ]
  [ filename | camera_index = 0 ]
See also: cvHaarDetectObjects() about option parameters.

FYI: The original facedetect.c used min_neighbors = 2 although performance.cpp uses min_neighbors = 1. It affected face detection results considerably.

Experiments

PIE Experiment 1

The PIE dataset has only frontal faces with big illumination variations. The dataset used in PIE experiments looks as follows:

img01_01.png (1st), img01_10.png (10th), img01_21.png (21st)
  • List of Commands haarcascade_frontalface_pie1.sh.
    • I used -w 18 -h 20 because the original images were not square but rectangular, with a ratio of about 18:20. I applied only small distortions in this experiment.
    • The training took 3 days on an Intel Xeon 2GHz machine with 1GB of memory.
  • Performance evaluation with pie_test (synthesized tests) haarcascade_frontalface_pie1.performance_pie_tests.txt
    +================================+======+======+======+
    |            File Name           | Hits |Missed| False|
    +================================+======+======+======+
    |                           Total|   847|   581|    67|
    +================================+======+======+======+
    Number of stages: 16
    Number of weak classifiers: 113
    Total time: 123.000000
    16
            847     67      0.593137        0.046919
            847     67      0.593137        0.046919
            353     2       0.247199        0.001401
            110     0       0.077031        0.000000
            15      0       0.010504        0.000000
            1       0       0.000700        0.000000
    
  • Performance evaluation with cmu_tests (natural tests) haarcascade_frontalface_pie1.performance_cmu_tests.txt
    +================================+======+======+======+
    |            File Name           | Hits |Missed| False|
    +================================+======+======+======+
    |                           Total|    20|   491|     9|
    +================================+======+======+======+
    Number of stages: 16
    Number of weak classifiers: 113
    Total time: 5.830000
    16
    	20	9	0.039139	0.017613
    	20	9	0.039139	0.017613
    	2	0	0.003914	0.000000
    

PIE Experiment 2

PIE Experiment 3

PIE Experiment 4

PIE Experiment 5

PIE Experiment 6

UMIST Experiment 1

The UMIST is a multi-view face dataset.

1a000.png (0th frame), 1a021.png (21st frame), 1a033.png (33rd frame)

UMIST Experiment 2

CBCL Experiment 1

haarcascade_frontalface_alt2.xml

Discussion

The detectors I created outperformed the OpenCV default xml on synthesized test samples created from the training samples. This shows that the training itself was performed successfully. However, the detectors did not work well on general test samples. This might mean that they were over-trained, or over-fitted, to the specific training samples. I still do not know which parameters or training samples make detectors generalize well.

The false alarm rates of all of my generated detectors were pretty low compared with the OpenCV default detector. I do not know which parameters make the difference. I set the false alarm rate to 0.5, which makes sense theoretically. I don't know.

Training faces with varying illumination in one detector gave pretty poor results. The generated detector became sensitive to illumination rather than robust to it. It does not detect normally lit frontal faces. This makes sense because there were not many normally lit frontal faces in the training set. Training multi-view faces in one detector gave the same kind of result.

To construct a multi-view or illumination-varied face detector, we should train a separate detector for each face pose or illumination state, as in Fast Multi-view Face Detection. Viola and Jones extended their work to multi-view detection by training 12 separate detectors, one per face pose. To keep detection fast, they also built a pose estimator with a C4.5 decision tree that reuses the haar-like features, and cascaded the pose estimator and the face detectors (of course, this means that if pose estimation fails, face detection also fails).

Theory behind

The advantage of haar-like features is speed in the detection phase, not accuracy. We can of course construct another face detector that achieves better accuracy using, e.g., PCA or LDA, although it becomes slow in the detection phase. Use such features when you do not need speed. PCA does not require AdaBoost training, so the training phase would finish quickly. I am pretty sure such face detection methods already exist, although I did not search for them (I do not search because I am sure).

Download

The files are available at http://tutorial-haartraining.googlecode.com/svn/trunk/ (old repository)

Directory Tree

  • HaarTraining haartraining
    • src Source code; haartraining and my additional C++ source code are here.
    • src/Makefile Makefile for Linux, please read comments inside
    • bin Binaries for Windows are ready; my perl scripts are also here. This directory is meant to be the working directory.
    • make Visual Studio Project Files
  • data The collected Image Datasets
  • result Generated Files (vec and xml etc) and results

This is an svn repository, so you can download all files at once if you have an svn client (you should have one on cygwin or Linux). For example,

$ svn co http://tutorial-haartraining.googlecode.com/svn/trunk/ tutorial-haartraining

Sorry, but downloading (checking out) the image datasets may take forever.... I created a zip file once, but the Google Code repository did not allow me to upload such a big file (100MB). I recommend checking out only the HaarTraining directory first:

$ svn co http://tutorial-haartraining.googlecode.com/svn/trunk/HaarTraining/ HaarTraining

Here is the list of my additional utilities (I put them in the HaarTraining/src and HaarTraining/bin directories):

The following additional utilities can be obtained from OpenCV/samples/c in your OpenCV install directory (I also put them in HaarTraining/src directory).

How to enable OpenMP

I bundled Windows binaries in the Download section, but I did not enable OpenMP (multi-processing) support. Therefore, here I describe how to compile the haartraining utility with OpenMP using Visual Studio 2005 Professional Edition, based on my distribution files (the procedure should be the same for the original sources too, but I did not verify it).

The solution file is in HaarTraining\make\haartraining.sln. Open it.

Right click cvhaartraining project > Properties. You will see a picture as below.

Follow Configuration Properties > C/C++ > Language > Change 'OpenMP Support' to 'Yes (/openmp)' as the above picture shows. If you can not see it, probably your environment does not support OpenMP.

Build cvhaartraining only (Right click the project > Project Only > Rebuild only cvhaartraining) and repeat the same procedure (enable OpenMP) for the haartraining project. Now, haartraining.exe should work with OpenMP.

You may use Process Explorer to verify whether it is utilizing OpenMP or not.

Run Process Explorer > View > Show Lower Pane (Ctrl+L) > choose the 'haartraining.exe' process and look at the Lower Pane. If you see two threads rather than one, it is utilizing OpenMP.

References


*1 One option was to modify the code of the 2nd function so that it applies distortions and generates many images from one image, but I chose to write scripts that repeat the 1st function because the same approach can be applied to the creation of test samples too.
*2 The performance utility supports both a classifier directory and a haarcascade xml file; more precisely, the cvLoadHaarClassifierCascade() function supports both.