[Code] Mixture-of-Gaussian C++ 实现

一个拟合高斯混合模型(GMM)的C++实现: https://github.com/CVLearner/Mixture-of-Gaussians

只依赖于Eigen,但是因为Eigen可以用到MKL的LAPACK/BLAS实现,所以间接地可以利用MKL加速。相比于OpenCV的实现,一个好处是可以用到多核。所以目前速度尚可,但还有提升空间。感觉上Eigen并没有完全利用到MKL的速度。

[Bug] g++4.6 参数顺序

遇到一个bug, 看起来像是g++-4.6的问题。

问题是这样的。这个源文件用到了OpenCV:

//< file: test.cpp
#include 

int main (int argc, char** argv) {
    cv::Mat image;
    return 0;
}

用这样一行命令编译:

g++-4.6 `pkg-config --libs opencv`  -o test.bin test.cpp

遇到了错误:

/tmp/ccs2MlQz.o: In function `cv::Mat::~Mat()':
test.cpp:(.text._ZN2cv3MatD2Ev[_ZN2cv3MatD5Ev]+0x39): undefined reference to `cv::fastFree(void*)'
/tmp/ccs2MlQz.o: In function `cv::Mat::release()':
test.cpp:(.text._ZN2cv3Mat7releaseEv[cv::Mat::release()]+0x47): undefined reference to `cv::Mat::deallocate()'
collect2: ld returned 1 exit status

错误的原因应该是g++没有正确的链接到OpenCV的库。各种尝试之后发现只要调换一下参数的位置就可以正常编译 -_-!!
改用这样一行命令编译就没有问题了。

g++-4.6 test.cpp `pkg-config --libs opencv`  -o test.bin

具体原因不明,但是如果把g++-4.6换作g++-4.4就没有这个问题。
这行命令也可以正常编译:

g++-4.4 `pkg-config --libs opencv`  -o test.bin test.cpp 

这么看起来很有可能是g++-4.6的bug,或者是改进..?

On 2 dimensional array of C++

I was asked about this today. In practice, I rarely use 2-dimensional array, instead I use vector of vectors.

To allocate a 2-d array on the stack, a C-style array is

int d[2][3];

Then to refer to an element it is like

d[i][j];

To make a dynamical allocation, one can NOT write his code like this

int **wrong_d = new int[2][3];

since the d in int d[2][3]; is not a int**, instead it is of the type

int (*)[3]

or in your evil human words, it is a int[3] pointer.

It is a little tricky to declare a 2-d array dynamically.

int (*d)[3] = new int[2][3];

Or

int v1 = 2;
int (*d)[3] = new int[v1][3];

The number 3 here can NOT be replace by a non-constant. My understanding is that since this value is associated with the type of d, in a strong type language like C/C++ which should be known by the compiler.

We can check the size of variables to verify this interpretation.

 1 #include 
  2 
  3 using namespace std;
  4 
  5 int main(int argc, char **argv)
  6 {
  7     int v1 = 2;
  8     int (*d)[3] = new int[v1][3];
  9     cout << "sizeof(d): " << sizeof(d) << endl;
 10     cout << "sizeof(d[0]): " << sizeof(d[0]) << endl;
 11     cout << "sizeof(d[1]): " << sizeof(d[1]) << endl;
 12     cout << "sizeof(d[0][0]): " << sizeof(d[0][0]) << endl;
 13     return 0;
 14 }

The output is:

$./test 
sizeof(d): 8        //< 2 pointers, &d[0] and &d[1]
sizeof(d[0]): 12    //< int[3], d[0][0] d[0][1] d[0][2]
sizeof(d[1]): 12    //< int[3], d[1][0] d[1][1] d[1][2]
sizeof(d[0][0]): 4  //< an integer

To save your life, I would recommend to use vectors.

int v1 = 2;
int v2 = 3;
vector > d(v1, vector(v2, 0)); 

rand函数不可重入

写C代码的时候,srand(int seed) 和 rand() 是常用的伪随机数生成函数。
这两个函数的使用方法很简单,但是一个可能被忽略的细节是,rand() 依赖一个内部的、全局的状态变量。所以 rand() 是不可重入,也是不是线程安全的 (thread-safe) 。

如果多个线程同时调用 rand() 函数,那么无论你如何使用 srand(int seed) 都无法保证结果是可以重现的。每次运行程序,各个线程中 rand() 函数生成的伪随机数序列都和上次不同。

在调试的时候,不能重现的结果会是比较棘手的障碍。

幸好,我们可以用C++11 提供的伪随机数生成器 Pseudo-random number generation (这么翻译好机械-_-!)用法很容易在网络上找到,这里有一个最简单的例子。

#include 
//.....
{
std::default_random_engine gen(0);
int a_random_number = gen();
}

default_random_engine维护自己的内部状态,各个线程都用同样的参数初始化default_random_engine,就可以得到一致的伪随机数序列了。

[OpenCV]detectMultiScale

I met a problem when using the interface ‘detectMultiScale’ of OpenCV. The rectangles it gives out may not be fully inside the frame of the original image. As a result, if these rectangles are applied directly on the original image to crop out the detected objects, your programs crash.

These are the interfaces

virtual void detectMultiScale( const Mat& image,
                               CV_OUT vector& objects,
                               double scaleFactor=1.1,
                               int minNeighbors=3, int flags=0,
                               Size minSize=Size(),
                               Size maxSize=Size() );

and

virtual void detectMultiScale( const Mat& image,
                               CV_OUT vector& objects,
                               vector& rejectLevels,
                               vector& levelWeights,
                               double scaleFactor=1.1,
                               int minNeighbors=3, int flags=0,
                               Size minSize=Size(),
                               Size maxSize=Size(),
                               bool outputRejectLevels=false );

I am copying what I wrote for a pull request on Github, this ‘may-be issue’ can be fixed easily by modifying one line in the source file

modules/objdetect/src/cascadedetect.cpp

Replace this one

Size processingRectSize( scaledImageSize.width - originalWindowSize.width + 1, scaledImageSize.height - originalWindowSize.height + 1 );

with this line

Size processingRectSize( scaledImageSize.width - originalWindowSize.width , scaledImageSize.height - originalWindowSize.height);

My explanation goes here, ignore the line numbers if they look wrong to you.
“Actually, in the code, the workflow is more complicated. In the file cascadedetect.cpp

This is the line building the final detected rectangle

995 rectangles->push_back(Rect(cvRound(x*scalingFactor), cvRound(y*scalingFactor), winSize.width, winSize.height));

the winSize is assigned here

969 Size winSize(cvRound(classifier->data.origWinSize.width * scalingFactor), cvRound(classifier->data.origWinSize.height * scalingFactor));

while the maximum value of x and y can be find here, they are related to the processingRectSize.

971         int y1 = range.start * stripSize;
972         int y2 = min(range.end * stripSize, processingRectSize.height);
973         for( int y = y1; y < y2; y += yStep )
974         {
975             for( int x = 0; x < processingRectSize.width; x += yStep )

Say the original image size is O, the original window size is W, scaling factor is F. O and W is integer and F is a decimal usually larger than 1. The width and height are assumed to be the same for example.

If we calculate the right-most point of the detected rectangle, it should be:

the maximum x is: (cvRound(O/F) - W), current winSize is W*F, following the line 995 we get:

cvRound( (cvRound(O/F) - W) * F ) + W*F

This can be larger than O, say O is 600, F is 4.177250, W is 24, the number we can get above is 601.254 which is larger than 600."

Hope these help.

比较变量地址并不可靠

最近云风大牛又在黑我C++,可是在我学会之前,我还是要坚定不移地待在这贼船上。
嘿嘿 :]

“用比较地址的方法来判断两个变量(的引用)是不是同一个变量是不可靠的”,这个问题很简单,却也容易忽视。

现实的情况是这样的。在写Computer Vision实验的时候,因为程序要面临的计算量往往很大,对程序的性能的优化是十分重要的。
所以我就写出了这样的代码:

void foo_func(Foo& foo)
{
    //< 如果foo和上次传入的是同一个,就省掉一部分重复计算
    static Foo *p_foo = NULL;
    if (p_foo != &foo)
    {
        p_foo = &foo;
        //< 大量计算
    }

    //< 后续计算
}

在实验里,Foo这个类本身往往比较复杂,不可能做内容的逐一比较,同时这个类的实例在初始化之后内容就保持不变。写这个函数的时候没有觉得有什么问题,结果也是正确的。

但是,这么做显然是不可靠的。
这段代码就说明问题了:

#include 

int main(int argc, char **argv)
{
    using namespace std;
    for (int i = 0; i < 100; i++)
    {   
        int j;
        cout << "i: " << i << " &j: " << &j << endl;
    }   
    return 0;
}

在Mac下输出是这样的:

i: 0 &j: 0x7fff52decad0
i: 1 &j: 0x7fff52decad0
i: 2 &j: 0x7fff52decad0
i: 3 &j: 0x7fff52decad0
i: 4 &j: 0x7fff52decad0
i: 5 &j: 0x7fff52decad0
i: 6 &j: 0x7fff52decad0
i: 7 &j: 0x7fff52decad0
i: 8 &j: 0x7fff52decad0
i: 9 &j: 0x7fff52decad0
....

编译器在分配内存的时候,会去复用之前刚刚释放掉的空间,这个是很显然的道理。在上面这种循环里面,局部变量所在的那块空间正好一直被同一个变量复用,如果我每次给j初始化不同的值,它们的地址一致但却不是同一个变量,所以像这样比较变量地址并不可靠。

现在看来,一个可能的解决方法就是给Foo的每一个实例增加一个唯一的ID了。
有没有不用修改Foo就能达到快速比较两个Foo实例的方法呢?

dup, pipe 和 fork

(好久没更新了,呼…一大波死线刚刚结束…)

我几乎一直在用Bash,可是却少有接触到Unix系的系统编程,对系统调用还是知之甚少。这两天实验室里讨论了一个比较基础的问题: 在自己写的程序中,怎么样得到另一个可执行文件的输出?

比如我们有/bin/pwd这个可执行文件,我们可以在自己的程序中用

system("/bin/pwd");

或者是

execl("/bin/pwd",".",NULL);

调用它。

如果想要得到它的输出,应该怎么做比较好?

这个其实是作业题的范畴,涉及到了几个Unix的系统调用:

     #include 

     pid_t
     fork(void);
     #include 

     int
     pipe(int fildes[2]);
     #include 

     int
     dup(int fildes);

     int
     dup2(int fildes, int fildes2);

手册里有各个函数的详细解释。
在这个例子里,pipe用来生成一对文件描述子 (file descriptor,我觉得描述子这个翻译不是很好…),往第二个描述子里写内容,从第一个里面读内容。fork用来得到一个子进程,这里,我们会让子进程来执行pwd,并且把输出写到pipe的一端,然后父进程可以从另一端读入。

但是pwd默认输出到屏幕,也就是标准输出,我们需要使用dup2,把标准输出指向pipe的一端,这样就可以完成任务了。

所以,最终的代码是这样的

#include 
#include 
#include 
#include 

int main(int argc, char **argv)
{
    int fd[2];
    int pid;
    pipe(fd);
    int rpipe = fd[0];
    int wpipe = fd[1];
    pid = fork();
    if (pid == 0)
    {   
        /* 子进程关掉读的那端,只用写的一端 */
        close(rpipe); 

        /* 把标准输入指向pipe的写的一端 */
        dup2(wpipe, STDOUT_FILENO);

        /* 执行pwd */
        execl("/bin/pwd",".",NULL);

        /* 嗯 */
        exit(0);
    }   
    else
    {   
        /* 父进程关掉写的那端,只用读的一端 */
        close(wpipe);

        printf("begin parent process\n");
        char readbuffer[1024];

        /* 从读的这端读出pwd的输出 */
        int nbytes = read(rpipe, readbuffer, sizeof(readbuffer));

        printf("Received string: %s | %d\n", readbuffer, nbytes);
        printf("end parent process\n");

        wait(&pid);
    }   
    return 0;
}

所以最后的输出就是像这样的

begin parent process
Received string: /Users/XXX/XXXX
 | 16
end parent process

话说fork除了考试之外,似乎就没有用过了,代码还是写得太少了…

(Image from Github)

Linux Swap文件

想象一下,两个实验进程跑了两天,还有一天就跑完了,这个时候你发现如果再跑一会儿内存就要爆了…怎么办? (好惊险的感觉 XD)

好吧,其实用到的只是很基本的操作系统知识,不过还真难得用到一回。

程序面对的都是虚拟内存。64位的操作系统下,虚拟内存非常大,但是实际物理内存相对而言小得多。所以,操作系统对内存分页 (就是分成一块一块的,每一块儿叫做一页) 物理内存一旦满了,把暂时不需要的页写到硬盘里。过了一会儿程序又要访问被写到硬盘里的那部分内容,操作系统就在物理内存中选一个页 (怎么选很讲究的),把硬盘里的那个给换回来。程序不停的运行,操作系统就换来换去…

所以,上面我们遇到的情形就可以解决了。把其中一个进程挂起 (suspend),Linux下可以用

Ctrl+Z

,然后这部分内存就是暂时不用的了。这个时候用

$top

查看内存使用情况,可以看到一个CPU占用率为0的进程占用的内存越来越少,另一个越来越多。这样就行了,等一个进程跑完,再用

$fg

命令把挂起的进程调到前台就可以了。

但是等等。虚拟内存具体是在哪里呢?数据终究是写在内存/硬盘上的,Linux下被换到硬盘上的内存在Swap分区 (交换分区) 里。安装系统的时候需要格式化一个分区为Swap格式,就是这个分区。

$swapon -s

可以查看交换分区的大小。

糟糕!刚才那个被挂起的进程占用了24G的内存,但是现在看到我的交换分区只用12G,怎么办?一旦交换分区和内存都满了,会发生神马事情,我也没有体验过,估计应该是系统卡死或者卡而不死吧。

所以,应该赶紧增加交换分区的大小才是。可是如果你和我一样,很悲催的没有Root权限 (Root权限貌似是必须的…),而且也根本没有多余的分区可以挂载了,怎么办?

可以用Swap文件 (点题) ! 就是把一个文件用做swap分区,Linux下什么都是文件,分区应该也是吧。要增加系统可用的虚拟内存,当然这要求你硬盘剩余空间够大…

$dd if=/dev/zero of=~/swapfile bs=1024 count=41943040

会在HOME下创建一个40G的文件”~/swapfile”,命令要执行一会儿,需要写一段时间硬盘,执行完了会显示写硬盘的速度,可以用来做测速的。
然后告诉系统用这个文件做交换文件

mkswap ~/swapfile

就没问题了。
(实际上不行,还需要这个命令, –!)

sudo swapon ~/swapfile

其实最一开始那个情形下,如果两个进程继续跑下去,操作系统仍然会把一部分内容换出来的。只是大量换页操作会让程序执行的时间更长,而且如果交换空间不够大,系统最终仍然可能会被卡死。

感叹一下,一个实验要跑三天的同学伤不起啊..

libstdc++ 4.6 type_traits 的一个bug

如果你的编译环境和我一样,然后又在用C++11的时候,不小心直接或者间接用到了<chrono>这个头文件,应该就会遇到这个bug。

完整的错误信息如下:

|| In file included from /usr/lib/gcc/x86_64-linux-gnu/4.6/../../../../include/c++/4.6/thread:37:
/usr/include/c++/4.6/chrono|240 col 10| error: cannot cast from lvalue of type 'const long' to rvalue reference type 'rep' (aka 'long &&'); types are not compatible
||           : __r(static_cast(__rep)) { }
||                 ^~~~~~~~~~~~~~~~~~~~~~~
/usr/include/c++/4.6/chrono|128 col 13| note: in instantiation of function template specialization 'std::chrono::duration >::duration' requested here
||             return _ToDur(static_cast(__d.count()));
||                    ^
/usr/include/c++/4.6/chrono|182 col 9| note: in instantiation of function template specialization 'std::chrono::__duration_cast_impl<std::chrono::duration >, std::ratio, long &&, true, true>::__cast >' requested here
||         return __dc::__cast(__d);
||                ^
/usr/include/c++/4.6/chrono|247 col 10| note: in instantiation of function template specialization 'std::chrono::duration_cast<std::chrono::duration >, long, std::ratio >' requested here
||           : __r(duration_cast(__d).count()) { }
||                 ^
/usr/include/c++/4.6/chrono|466 col 9| note: in instantiation of function template specialization 'std::chrono::duration >::duration, void>' requested here
||         return __ct(__lhs).count() < __ct(__rhs).count();
||                ^
/usr/include/c++/4.6/chrono|667 col 7| note: in instantiation of function template specialization 'std::chrono::operator<, long, std::ratio >' requested here
||                     < system_clock::duration::zero(),
||                     ^
/usr/include/c++/4.6/chrono|128 col 13| error: call to implicitly-deleted copy constructor of 'std::chrono::duration >'
||             return _ToDur(static_cast(__d.count()));
||                    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/include/c++/4.6/chrono|182 col 9| note: in instantiation of function template specialization 'std::chrono::__duration_cast_impl<std::chrono::duration >, std::ratio, long &&, true, true>::__cast >' requested here
||         return __dc::__cast(__d);
||                ^
/usr/include/c++/4.6/chrono|247 col 10| note: in instantiation of function template specialization 'std::chrono::duration_cast<std::chrono::duration >, long, std::ratio >' requested here
||           : __r(duration_cast(__d).count()) { }
||                 ^
/usr/include/c++/4.6/chrono|466 col 9| note: in instantiation of function template specialization 'std::chrono::duration >::duration, void>' requested here
||         return __ct(__lhs).count() < __ct(__rhs).count();
||                ^
/usr/include/c++/4.6/chrono|667 col 7| note: in instantiation of function template specialization 'std::chrono::operator<, long, std::ratio >' requested here
||                     < system_clock::duration::zero(),
||                     ^
/usr/include/c++/4.6/chrono|233 col 12| note: explicitly defaulted function was implicitly deleted here
||         constexpr duration(const duration&) = default;
||                   ^
/usr/include/c++/4.6/chrono|349 col 6| note: copy constructor of 'duration >' is implicitly deleted because field '__r' is of rvalue reference type 'rep' (aka 'long &&')
||         rep __r;
||             ^
/usr/include/c++/4.6/chrono|182 col 9| error: call to implicitly-deleted copy constructor of 'typename enable_if<__is_duration<duration > >::value, duration > >::type' (aka 'std::chrono::duration >')
||         return __dc::__cast(__d);
||                ^~~~~~~~~~~~~~~~~
/usr/include/c++/4.6/chrono|247 col 10| note: in instantiation of function template specialization 'std::chrono::duration_cast<std::chrono::duration >, long, std::ratio >' requested here
||           : __r(duration_cast(__d).count()) { }
||                 ^
/usr/include/c++/4.6/chrono|466 col 9| note: in instantiation of function template specialization 'std::chrono::duration >::duration, void>' requested here
||         return __ct(__lhs).count() < __ct(__rhs).count();
||                ^
/usr/include/c++/4.6/chrono|667 col 7| note: in instantiation of function template specialization 'std::chrono::operator<, long, std::ratio >' requested here
||                     < system_clock::duration::zero(),
||                     ^
/usr/include/c++/4.6/chrono|233 col 12| note: explicitly defaulted function was implicitly deleted here
||         constexpr duration(const duration&) = default;
||                   ^
/usr/include/c++/4.6/chrono|349 col 6| note: copy constructor of 'duration >' is implicitly deleted because field '__r' is of rvalue reference type 'rep' (aka 'long &&')
||         rep __r;
||             ^
/usr/include/c++/4.6/chrono|255 col 11| error: rvalue reference to type 'long' cannot bind to lvalue of type 'long'
||         { return __r; }
||                  ^~~
/usr/include/c++/4.6/chrono|247 col 39| note: in instantiation of member function 'std::chrono::duration >::count' requested here
||           : __r(duration_cast(__d).count()) { }
||                                              ^
/usr/include/c++/4.6/chrono|466 col 9| note: in instantiation of function template specialization 'std::chrono::duration >::duration, void>' requested here
||         return __ct(__lhs).count() < __ct(__rhs).count();
||                ^
/usr/include/c++/4.6/chrono|667 col 7| note: in instantiation of function template specialization 'std::chrono::operator<, long, std::ratio >' requested here
||                     < system_clock::duration::zero(),
||                     ^
/usr/include/c++/4.6/chrono|666 col 21| error: static_assert expression is not an integral constant expression
||       static_assert(system_clock::duration::min()
||                     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/include/c++/4.6/chrono|666 col 21| note: undefined function 'operator<, long, std::ratio >' cannot be used in a constant expression
/usr/include/c++/4.6/chrono|460 col 7| note: declared here
||       operator& __lhs,
||       ^
/usr/include/c++/4.6/chrono|141 col 40| error: cannot cast from lvalue of type 'const intmax_t' (aka 'const long') to rvalue reference type 'long &&'; types are not compatible
||               static_cast(__d.count()) / static_cast(_CF::den)));
||                                               ^~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/include/c++/4.6/chrono|182 col 9| note: in instantiation of function template specialization 'std::chrono::__duration_cast_impl<std::chrono::duration >, std::ratio, long &&, true, false>::__cast >' requested here
||         return __dc::__cast(__d);
||                ^
/usr/include/c++/4.6/chrono|679 col 21| note: in instantiation of function template specialization 'std::chrono::duration_cast<std::chrono::duration >, long, std::ratio >' requested here
||         return std::time_t(duration_cast
||                            ^
/usr/include/c++/4.6/chrono|154 col 40| error: cannot cast from lvalue of type 'const intmax_t' (aka 'const long') to rvalue reference type 'long &&'; types are not compatible
||               static_cast(__d.count()) * static_cast(_CF::num)));
||                                               ^~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/include/c++/4.6/chrono|182 col 9| note: in instantiation of function template specialization 'std::chrono::__duration_cast_impl<std::chrono::duration >, std::ratio, long &&, false, true>::__cast >' requested here
||         return __dc::__cast(__d);
||                ^
/usr/include/c++/4.6/chrono|577 col 22| note: in instantiation of function template specialization 'std::chrono::duration_cast<std::chrono::duration >, long, std::ratio >' requested here
||         return __time_point(duration_cast(__t.time_since_epoch()));
||                             ^
/usr/include/c++/4.6/chrono|687 col 9| note: in instantiation of function template specialization 'std::chrono::time_point_cast<std::chrono::duration >, std::chrono::system_clock, std::chrono::duration > >' requested here
||         return time_point_cast
||                ^
|| 7 errors generated.

Google搜到了这个帖子 http://urfoex.blogspot.com/2012/06/c11-fixing-bug-to-use-chrono-with-clang.html
和我的环境略有不同,但是出错的地方是一样的。

我自己的环境是

Ubuntu 12.04, clang 3.1, libstdc++ 4.6

解决方法是改

/usr/include/c++/4.6/type_traits

这个文件的第1113行

{ typedef decltype(true ? declval<_Tp>() : declval<_Up>()) type; };

改成

{ typedef typename decay() : declval<_Up>())>::type type; }

出错的原因,大概是因为没有正确的做类型转换。decay这个函数我从没用过,看文档是C++11引入的函数,用做类型转换。对此不理解,不能在这里误导人,文档原文如下

Applies lvalue-to-rvalue, array-to-pointer, and function-to-pointer implicit conversions to the type T, removes cv-qualifiers, and defines the resulting type as the member typedef type. This is the type conversion applied to all function arguments when passed by value.

总之,把那行改掉再重新编译自己的工程就没有问题了。

如果你是和我一样没有root权限去改/usr/include下的文件,也可以这么做:
1. 把/usr/include/c++/4.6/type_traits拷贝到自己的目录下面,比如

$cp /usr/include/c++/4.6/type_traits $HOME/local/include/c++/4.6

2. 改自己目录下的版本,然后编译的时候把这个目录放到编译选项里,clang会优先使用这个目录下的版本

$clang++ blahblah.cpp -I$HOME/local/include/c++/4.6/

好吧,虽然没有完全搞清楚这个bug是怎么回事,但是解决掉一大堆报错的感觉挺好 :]