[OpenCV] detectMultiScale: output detection score

OpenCV provides a quite decent implementation of the Viola-Jones face detector.

A quick example looks like this (tested with OpenCV 2.4.5):

// File: main.cc
#include <opencv2/opencv.hpp>

using namespace cv;

int main(int argc, char **argv) {

    CascadeClassifier cascade;
    const float scale_factor(1.2f);
    const int min_neighbors(3);

    if (cascade.load("./lbpcascade_frontalface.xml")) {

        for (int i = 1; i < argc; i++) {

            Mat img = imread(argv[i], CV_LOAD_IMAGE_GRAYSCALE);
            equalizeHist(img, img);
            vector<Rect> objs;
            cascade.detectMultiScale(img, objs, scale_factor, min_neighbors);

            Mat img_color = imread(argv[i], CV_LOAD_IMAGE_COLOR);
            for (int n = 0; n < objs.size(); n++) {
                rectangle(img_color, objs[n], Scalar(255,0,0), 8);
            }
            imshow("VJ Face Detector", img_color);
            waitKey(0);
        }
    }

    return 0;
}

Compile it with:

g++ -std=c++0x main.cc -o main `pkg-config --cflags --libs opencv`

The detection results are as shown below:
[image: result]

For more serious users, it would be nice to have a detection score for each detected face.
OpenCV provides an overloaded function designed for this purpose, although it lacks detailed documentation:

vector<int> reject_levels;
vector<double> level_weights;
cascade.detectMultiScale(img, objs, reject_levels, level_weights, scale_factor, min_neighbors);

reject_levels and level_weights stay empty unless you set the last argument, outputRejectLevels, to true (the whole file):

// File: main.cc
#include <opencv2/opencv.hpp>
#include <string>

using namespace cv;

int main(int argc, char **argv) {

    CascadeClassifier cascade;
    const float scale_factor(1.2f);
    const int min_neighbors(3);

    if (cascade.load("./lbpcascade_frontalface.xml")) {

        for (int i = 1; i < argc; i++) {

            Mat img = imread(argv[i], CV_LOAD_IMAGE_GRAYSCALE);
            equalizeHist(img, img);
            vector<Rect> objs;
            vector<int> reject_levels;
            vector<double> level_weights;
            cascade.detectMultiScale(img, objs, reject_levels, level_weights, scale_factor, min_neighbors, 0, Size(), Size(), true);

            Mat img_color = imread(argv[i], CV_LOAD_IMAGE_COLOR);
            for (int n = 0; n < objs.size(); n++) {
                rectangle(img_color, objs[n], Scalar(255,0,0), 8);
                putText(img_color, std::to_string(level_weights[n]),
                        Point(objs[n].x, objs[n].y), 1, 1, Scalar(0,0,255));
            }
            imshow("VJ Face Detector", img_color);
            waitKey(0);
        }
    }

    return 0;
}

However, this will give you a large number of detected rectangles:
[image: result-org]

This is because OpenCV skips the step that filters out the small overlapping rectangles. I have no idea whether this is by design, but output like this is not helpful, at least in my own case.

So we need to make our own changes to OpenCV's source code.
There are different ways to design a detection score, such as this one from the FDDB FAQ (http://vis-www.cs.umass.edu/fddb/faq.html):
"In the OpenCV implementation, stage_sum is computed and compared against the i stage_threshold for each stage to accept/reject a candidate window. We define the detection score for a candidate window as K*stage_when_rejected + stage_sum_for_stage_when_rejected. If a window is accepted by the cascade, we just K*last_stage + stage_sum_for_last_stage. Choosing K as a large value e.g., 1000, we ensure that windows rejected at stage i have higher score than those rejected at stage i-1."

Actually, I found that a straightforward detection score works well in my own work. In the last stage of the face detector in OpenCV, detection rectangles are grouped into clusters to eliminate small overlapping rectangles while keeping the most promising ones. The number of final detected faces is at most the number of clusters. So we can simply use the number of rectangles grouped into a cluster as the detection score of the associated final rectangle, which may not be accurate but could work.

To make this change in OpenCV 2.4.5, edit the file modules/objdetect/src/cascadedetect.cpp (line 200) and rebuild the library:

// modules/objdetect/src/cascadedetect.cpp (line 200)
// int n1 = levelWeights ? rejectLevels[i] : rweights[i]; //< comment out this line
int n1 = rweights[i]; //< the change

We then modify the file main.cc accordingly:

// File: main.cc
#include <opencv2/opencv.hpp>
#include <string>

using namespace cv;

int main(int argc, char **argv) {

    CascadeClassifier cascade;
    const float scale_factor(1.2f);
    const int min_neighbors(3);

    if (cascade.load("./lbpcascade_frontalface.xml")) {

        for (int i = 1; i < argc; i++) {

            Mat img = imread(argv[i], CV_LOAD_IMAGE_GRAYSCALE);
            equalizeHist(img, img);
            vector<Rect> objs;
            vector<int> reject_levels;
            vector<double> level_weights;
            cascade.detectMultiScale(img, objs, reject_levels, level_weights, scale_factor, min_neighbors, 0, Size(), Size(), true);

            Mat img_color = imread(argv[i], CV_LOAD_IMAGE_COLOR);
            for (int n = 0; n < objs.size(); n++) {
                rectangle(img_color, objs[n], Scalar(255,0,0), 8);
                putText(img_color, std::to_string(reject_levels[n]),
                        Point(objs[n].x, objs[n].y), 1, 1, Scalar(0,0,255));
            }
            imshow("VJ Face Detector", img_color);
            waitKey(0);
        }
    }

    return 0;
}

And we can have the detection scores like this:
[image: result-final]

On 2-dimensional arrays in C++

I was asked about this today. In practice, I rarely use 2-dimensional arrays; instead I use a vector of vectors.

To allocate a 2-d array on the stack, declare a C-style array:

int d[2][3];

Then refer to an element like this:

d[i][j];

To allocate one dynamically, you can NOT write your code like this

int **wrong_d = new int[2][3];  // does not compile

since the d in int d[2][3]; does not decay to an int**. Instead it decays to the type

int (*)[3]

or, in your evil human words, a pointer to int[3].

It is a little tricky to declare a 2-d array dynamically.

int (*d)[3] = new int[2][3];

Or

int v1 = 2;
int (*d)[3] = new int[v1][3];

The number 3 here can NOT be replaced by a non-constant. My understanding is that this value is part of the type of d, and in a statically typed language like C/C++ the type must be known to the compiler.

We can check the size of variables to verify this interpretation.

#include <iostream>

using namespace std;

int main(int argc, char **argv)
{
    int v1 = 2;
    int (*d)[3] = new int[v1][3];
    cout << "sizeof(d): " << sizeof(d) << endl;
    cout << "sizeof(d[0]): " << sizeof(d[0]) << endl;
    cout << "sizeof(d[1]): " << sizeof(d[1]) << endl;
    cout << "sizeof(d[0][0]): " << sizeof(d[0][0]) << endl;
    delete[] d;  // release the array allocated with new[]
    return 0;
}

The output is:

$./test 
sizeof(d): 8        //< a single pointer of type int (*)[3] (64-bit platform)
sizeof(d[0]): 12    //< int[3], d[0][0] d[0][1] d[0][2]
sizeof(d[1]): 12    //< int[3], d[1][0] d[1][1] d[1][2]
sizeof(d[0][0]): 4  //< an integer

To save your life, I would recommend using vectors.

int v1 = 2;
int v2 = 3;
vector<vector<int> > d(v1, vector<int>(v2, 0));

[OpenCV]detectMultiScale

I ran into a problem when using OpenCV's ‘detectMultiScale’ interface. The rectangles it returns may not lie fully inside the original image. As a result, if these rectangles are applied directly to the original image to crop out the detected objects, your program may crash.

These are the interfaces

virtual void detectMultiScale( const Mat& image,
                               CV_OUT vector<Rect>& objects,
                               double scaleFactor=1.1,
                               int minNeighbors=3, int flags=0,
                               Size minSize=Size(),
                               Size maxSize=Size() );

and

virtual void detectMultiScale( const Mat& image,
                               CV_OUT vector<Rect>& objects,
                               vector<int>& rejectLevels,
                               vector<double>& levelWeights,
                               double scaleFactor=1.1,
                               int minNeighbors=3, int flags=0,
                               Size minSize=Size(),
                               Size maxSize=Size(),
                               bool outputRejectLevels=false );

I am copying what I wrote for a pull request on GitHub. This ‘may-be issue’ can be fixed easily by modifying one line in the source file

modules/objdetect/src/cascadedetect.cpp

Replace this one

Size processingRectSize( scaledImageSize.width - originalWindowSize.width + 1, scaledImageSize.height - originalWindowSize.height + 1 );

with this line

Size processingRectSize( scaledImageSize.width - originalWindowSize.width , scaledImageSize.height - originalWindowSize.height);

My explanation goes here; ignore the line numbers if they look wrong to you.
“Actually, in the code, the workflow is more complicated. In the file cascadedetect.cpp

This is the line building the final detected rectangle

995 rectangles->push_back(Rect(cvRound(x*scalingFactor), cvRound(y*scalingFactor), winSize.width, winSize.height));

the winSize is assigned here

969 Size winSize(cvRound(classifier->data.origWinSize.width * scalingFactor), cvRound(classifier->data.origWinSize.height * scalingFactor));

while the maximum values of x and y can be found here; they are related to processingRectSize.

971         int y1 = range.start * stripSize;
972         int y2 = min(range.end * stripSize, processingRectSize.height);
973         for( int y = y1; y < y2; y += yStep )
974         {
975             for( int x = 0; x < processingRectSize.width; x += yStep )

Say the original image size is O, the original window size is W, and the scaling factor is F. O and W are integers and F is a decimal number, usually larger than 1. For simplicity, the width and height are assumed to be the same.

If we calculate the right-most point of the detected rectangle, it should be as follows.

The maximum x is (cvRound(O/F) - W) and the current winSize is W*F, so following line 995 we get:

cvRound( (cvRound(O/F) - W) * F ) + W*F

This can be larger than O. Say O is 600, F is 4.177250 and W is 24: the number we get above is 601.254, which is larger than 600.”

Hope these help.

SSE2 Vector Operation by Vlfeat

If you are writing something involving math on vectors in C/C++, you may want to check out VLFeat (http://vlfeat.org).

It is designed as a library for computer vision, but it also brings you a wrapper for SSE2-accelerated vector computation.

Say your original code for computing the dot product of two vectors looks like this:

float productOfVectors(const float *vecA, const float *vecB, const int dimension) {
   float value = 0.0f;
   for (int i = 0; i < dimension; i++)
   {
        value += (vecA[i] * vecB[i]);
   }
   return value;
}

You can save significant time by adding VLFeat to your project and replacing it with this:

float productOfVectors(const float *vecA, const float *vecB, const int dimension) {
   float value = 0.0f;
   vl_eval_vector_comparison_on_all_pairs_f(&value, dimension, vecA, 1, vecB, 1, vl_get_vector_comparison_function_f(VlKernelL2));
   return value;
}

It's pretty easy, and it really works. It makes use of the SSE2 instructions provided by your CPU, which results in a non-trivial speedup when you are doing large-scale computation.

You can find more supported forms of calculation here; thanks to the developers for their good work.

Setup dropbox without X

Assume you are accessing your server remotely through SSH.

1. Download the package (here assuming you are using Ubuntu)

$wget https://www.dropbox.com/download?dl=packages/ubuntu/dropbox_1.4.0_amd64.deb

2. Install the package

$sudo dpkg -i *dropbox_1.4.0_amd64.deb

3. Start Dropbox

$dropbox start

4. You will get a URL for linking your computer. Copy and paste it into your web browser, then restart Dropbox. All set :]

$dropbox stop
$dropbox start

I tried to open the URL with w3m or lynx inside the terminal but failed.
It seems they don’t support JavaScript well.

fstream issue under 64-bits cygwin

This is a question I posted on StackOverflow: fstream issue under 64-bits cygwin. I was glad to get help from Nemo on StackOverflow; all credit goes to him.

The problem is that when I use the 64-bit g++ to compile the same piece of code, I get an unexpectedly different result.

The source code looks like this:

#include <iostream>
#include <fstream>
using namespace std;

int main()
{
    int rows = 200;
    int cols = 200;
    float data[rows*cols];
    for (int i = 0; i < rows; i++)
    {
        for (int j = 0; j < cols; j++)
        {
            data[i*cols+j] = i*cols+j;
        }
    }
    const char *file = "tmp.txt";
    ofstream fs(file);
    if (fs.is_open())
    {
        fs.write((char*)&rows, sizeof(int));
        cout << fs.tellp() << endl;
        fs.write((char*)&cols, sizeof(int));
        cout << fs.tellp() << endl;
        fs.write((char*)data, sizeof(float)*rows*cols);
        cout << fs.tellp() << endl;
        fs.close();
    }
    return 0;
}

I am writing two integers and a block of float values into a binary file. It prints out how many bytes it wrote.

The expected result is:

4
8
160008

All the actions were performed under Cygwin. When the code was compiled with g++.exe, the result was right.

But when I used x86_64-w64-mingw32-g++.exe (the only one that can generate a 64-bit binary), the result was weird.

4
8
160506

It is weird.

According to Nemo's answer, this is because under *nix an fstream behaves as if it were opened in binary mode by default. This also holds for the 32-bit g++ under Cygwin, but not for the 64-bit MinGW-w64 g++, which opens files in text mode unless told otherwise.

This leads to unexpected behavior: in text mode, fstream replaces special bytes in the binary data. Each newline byte (0x0A) is written as the Windows-style sequence 0x0D 0x0A, so the file ends up larger than expected.

To solve this problem, replace the line

ofstream fs(file);

with

ofstream fs(file,ios_base::binary);

We can alter the content of the float data array to see what will happen.

#include <iostream>
#include <fstream>
using namespace std;

int main()
{
    int rows = 200;
    int cols = 200;
    float data[rows*cols];
    for (int i = 0; i < rows; i++)
    {
        for (int j = 0; j < cols; j++)
        {
            data[i*cols+j] = 0;
        }
    }
    const char *file = "tmp.txt";
    ofstream fs(file);
    if (fs.is_open())
    {
        fs.write((char*)&rows, sizeof(int));
        cout << fs.tellp() << endl;
        fs.write((char*)&cols, sizeof(int));
        cout << fs.tellp() << endl;
        fs.write((char*)data, sizeof(float)*rows*cols);
        cout << fs.tellp() << endl;
        fs.close();
    }
    return 0;
}

The output is:

4
8
160208

It somehow supports the above explanation.

Use a different delimiter in sed

Think about this: you want to replace “XXX” with a path like “/path/to/YYY” in a file.
Your file looks like this:

XXX
AAA
XXX

Your bash script looks like this:

NEW_PATH="/path/to/YYY"
VAR="XXX"
sed -e '{s/$VAR/$NEW_PATH/g}' your_file

Well, it won’t work, since single quotes ‘ stop bash from expanding variables.

OK, let's try double quotes ”

NEW_PATH="/path/to/YYY"
VAR="XXX"
sed -e "{s/$VAR/$NEW_PATH/g}" your_file

This one won’t work either.
Dropping the single quotes is fine for sed. The problem is that
your NEW_PATH variable contains slashes, and the slash is the default delimiter of sed.

This simple stuff took me about an hour :[

The solution is simply to use a different delimiter in sed.

NEW_PATH="/path/to/YYY"
VAR="XXX"
sed -e "{s@$VAR@$NEW_PATH@g}" your_file

Sometimes you still want to use single quotes to protect special symbols, such as ‘(‘ ‘)’.
It is OK to mix quotes:

NEW_PATH="/path/to/YYY"
VAR="XXX"
sed -e '{s@\(.*\)@'"$NEW_PATH"'@g}' your_file

Project-Specific Vim Configuration

Usually, we keep our own vim configuration in the file ~/.vimrc.
What if we want one for a specific project?
For example, we are working on a project which needs to read in a tags file.
Well, here is a possible solution.

Put a line at the end of your ~/.vimrc file

source ./.project.vim

Then, for each project, put your project-specific stuff in the file “.project.vim”.

If you don’t want the error message saying that vim cannot find the file “.project.vim”,
wrap the line above in an if statement:

if filereadable("./.project.vim")
    source ./.project.vim
endif

:]

tail -n

A quite useful trick when you want to get rid of the first few lines of some output or file.

$tail -n +2

tail prints the input starting from the 2nd line.

When you use the “find” command to generate a file list under a certain folder, this makes it quite easy to eliminate the folder path at the beginning of the output.

Learned this from SO: How can I remove the first line of a text file using bash/sed script?

Draw ROC Curve

A fairly simple piece of Matlab script to draw the ROC curve from an array of scores and an array of labels.

function [Tps, Fps] = ROC(scores, labels)
 
%% Sort Labels and Scores by Scores
sl = [scores; labels];
[d1 d2] = sort(sl(1,:));
 
sorted_sl = sl(:,d2);
s_scores = sorted_sl(1,:);
s_labels = round(sorted_sl(2,:));
 
%% Constants
counts = histc(s_labels, unique(s_labels));
 
Tps = zeros(1, size(s_labels,2) + 1);
Fps = zeros(1,  size(s_labels,2) + 1);
 
negCount = counts(1);
posCount = counts(2);
 
%% Shift threshold to find the ROC
for thresIdx = 1:(size(s_scores,2)+1)
 
    % for each Threshold Index
    tpCount = 0;
    fpCount = 0;
 
    for i = [1:size(s_scores,2)]
 
        if (i >= thresIdx)           % We think it is positive
            if (s_labels(i) == 1)   % Right!
                tpCount = tpCount + 1;
            else                    % Wrong!
                fpCount = fpCount + 1;
            end
        end
 
    end
 
    Tps(thresIdx) = tpCount/posCount;
    Fps(thresIdx) = fpCount/negCount;
 
end
 
%% Draw the Curve

% Plot coordinates as used here: x = Tps, y = Fps
% (the conventional ROC plot puts Fps on x and Tps on y; swap the two if needed)
x = Tps;
y = Fps;

% Interpolation to draw a spline curve
count = 100;
dist = (x(1) - x(size(x,2)))/count;
xx = [x(1):-dist:x(size(x,2))];

% spline needs distinct x values, so we keep only the unique ones
[d1 d2] = unique(x);
uni_x = x(1,d2);
uni_y = y(1,d2);
yy = spline(uni_x,uni_y,xx);

% No value should exceed 1
yy = min(yy, 1);

plot(x,y,'x',xx,yy);

Hope it helps.


Some improvements were added.

For a sample input:

>> scores = rand(1,20)*100

scores =

  Columns 1 through 7

   43.8744   38.1558   76.5517   79.5200   18.6873   48.9764   44.5586

  Columns 8 through 14

   64.6313   70.9365   75.4687   27.6025   67.9703   65.5098   16.2612

  Columns 15 through 20

   11.8998   49.8364   95.9744   34.0386   58.5268   22.3812

>> labels = round(rand(1,20))

labels =

  Columns 1 through 12

     1     0     1     1     1     1     1     0     0     0     1     0

  Columns 13 through 20

     1     0     1     0     0     0     1     0

>> ROC(scores,labels);

Gives an output like:
[image: ROC]

Fork it on Github: DrawROC on Github