Set up Dropbox without X

Assume you are accessing your server remotely through SSH.

1. Download the package (this example assumes Ubuntu)

$wget "https://www.dropbox.com/download?dl=packages/ubuntu/dropbox_1.4.0_amd64.deb"

2. Install the package

$sudo dpkg -i *dropbox_1.4.0_amd64.deb

3. Start Dropbox

$dropbox start

4. You will get a URL for linking this computer to your account; copy and paste it into your web browser. Then restart Dropbox, and you are all set :]

$dropbox stop
$dropbox start

I tried opening the URL with w3m and lynx inside the terminal, but both failed.
It seems they do not handle JavaScript well.

Art & Music Festival

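Today I stumbled on yet another local festival.
Close to noon I headed out to the lab for a "date" with NP-completeness, since the complexity exam is next week.
At the 7th Street intersection I found the road closed, with rows of tents along both sides,
as if it were a farm produce fair.
Only when I walked over did I learn that today is Hoboken's Art & Music Festival, which seems to be held once a year.
So I simply strolled down the main street looking around. It was really crowded; the locals enjoy a lively scene too.
IMG_0157
Many of the tents were selling paintings
IMG_0163
There were also ones like this
IMG_0162
Or small trinkets
IMG_0159
And then there was food; it felt like walking down Hefang Street in Hangzhou…
IMG_0161
IMG_0158
There was stuff for kids too, and a lot of the clothing was made for children
IMG_0160
There were also several carts like this one, apparently for rent
IMG_0164
I came too early; the band had not started playing yet
IMG_0166

Then I bought a grilled meat skewer and quietly went off to the lab to study…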

gitignore

If a project managed with git has a fairly complicated file structure, you can create a .gitignore file

$cat .gitignore
*.o
tmp*

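to ignore certain files. The contents above, for example, ignore files ending in .o and paths starting with tmp.

Today I found out that an exclamation mark (!) can be used to whitelist files.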

$cat .gitignore
*.o
tmp*
!*.c

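This makes sure that .c files are not ignored by git.

Note that rules in .gitignore are applied with later lines taking precedence: a rule written further down overrides the effect of the ones before it.

For example, suppose the current directory contains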

$ls .
a.c a.o tmp.c

The .gitignore above will not ignore tmp.c: it matches tmp*, but the later !*.c rule re-includes it.
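
The last-match-wins rule can be sketched in a few lines of Python. This is only a simplified model, not how git itself is implemented: it assumes patterns are matched against bare file names with `fnmatch`, and it ignores directories, slashes and the rest of the gitignore syntax. The function name `is_ignored` is made up for this illustration.

```python
from fnmatch import fnmatch

def is_ignored(name, patterns):
    """Simplified model of .gitignore: the last matching pattern decides."""
    ignored = False
    for pattern in patterns:
        negated = pattern.startswith("!")
        if negated:
            pattern = pattern[1:]
        if fnmatch(name, pattern):
            # A later match overrides whatever an earlier pattern decided.
            ignored = not negated
    return ignored

patterns = ["*.o", "tmp*", "!*.c"]
for name in ["a.c", "a.o", "tmp.c"]:
    print(name, "ignored" if is_ignored(name, patterns) else "kept")
```

In this model only a.o ends up ignored. Moving `!*.c` to the top of the list would make tmp.c ignored again, because `tmp*` would then be the last rule to match it.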

TEDxUW Why you will fail to have a great career

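It is worth watching TED talks every now and then; they always expose you to interesting ideas and viewpoints.
TEDxUW is a local TED-style organization, and Larry Smith teaches at UW.
I do not agree with all of his points, but one of them struck me: passion and interest are not the same thing.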

Why you will fail to have a great career

I asked, do you have passion? You say, I have interest. … Passion and interest are not the same thing. Are you really going to go to your sweetie and say, "Marry me! You're interesting"? 囧…

Good Friday

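Today is Good Friday, a religious holiday.
Here is a passage copied from Wikipedia:
"Good Friday is the day on which Christians commemorate Jesus Christ's suffering on the cross; it is the Friday before Easter. According to the Bible, Jesus was nailed to the cross at around nine in the morning on the 14th of Nisan in the Jewish calendar, AD 33, and died at around three in the afternoon. Jesus instructed his disciples to commemorate only his death. (Luke 22:19, 20)"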

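As I left the lab, a procession happened to pass by, and I found it quite interesting.
At the front was a police car, followed by a van that presumably carried speakers and an amplifier, driving very slowly. A woman with a microphone walked behind it, singing; the melody sounded very religious, but I could not understand the lyrics at all or even tell what language it was. Behind her came about 50 men and women in robes of various colors, some holding books and others holding what looked like songbooks, singing along. In the middle, one person held up a large wooden board with what looked like a picture of Jesus, and another carried a cross on one shoulder.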

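I felt too timid at the time to take any photos.
I looked for similar photos online; what I saw seems to have been a very small procession.
A photo from the web:

Photo source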

Thinking about it, having no religious faith of my own means missing out on a lot of experiences.

Implementing PCA

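PCA stands for principal component analysis, a commonly used data processing technique.

Intuitively, PCA is a dimensionality reduction method. Suppose we have 1000 data points, each a 128-dimensional vector, stored as a 1000×128 array. After PCA we still have 1000 data points, but each one is a vector with fewer than 128 dimensions; for example, PCA can reduce the 128-dimensional data to 64 dimensions.
Among all linear projections onto that many dimensions, PCA is the one that loses the least information.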

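How exactly is "minimal loss" defined?
Take the 1000 points of 128 dimensions again: they are 1000 vectors in a 128-dimensional space. Look along any one dimension, that is, along one direction: if the vectors differ widely from one another along it, that direction is important.
Conversely, if every vector has nearly the same value along some direction, then ignoring that direction, that is, dropping that dimension of the data, has little effect on our analysis of the 1000 points. So "minimal loss" means discarding the directions with the smallest variation and keeping those with the largest.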
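
To make the idea concrete, here is a small Python sketch on made-up data. It is a simplification: it only compares the original coordinate axes, whereas PCA considers arbitrary directions in the space. The points and variable names are invented for illustration.

```python
from statistics import variance

# Four made-up 3-dimensional points; the third coordinate barely changes.
points = [
    (1.0, 10.0, 5.0),
    (2.0, 20.0, 5.1),
    (3.0, 30.0, 4.9),
    (4.0, 40.0, 5.0),
]

# Sample variance of each coordinate across all points.
variances = [variance(column) for column in zip(*points)]

# The coordinate with the smallest variance tells us the least about
# how the points differ from one another.
least_important = variances.index(min(variances))

# Drop that coordinate from every point.
reduced = [
    tuple(v for i, v in enumerate(p) if i != least_important)
    for p in points
]

print(variances)
print(least_important)
print(reduced)
```

Here the third coordinate is the one dropped, and the four points remain easy to tell apart using the first two coordinates alone.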

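So how is it actually done?

Two common methods are singular value decomposition (SVD) and eigendecomposition (EIG).
Both start by deriving a matrix M from the data. The number of eigenvalues of M (singular values, for SVD) matches the dimensionality of the data, and the larger a value is, the more important its corresponding direction, that is, the larger the variation along it.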
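
The two methods are connected by a short derivation. The notation is an assumption made for this note and does not appear in the code: $X$ is the count × in_dim data matrix, $\bar{X}$ is the matrix whose rows all equal the column means of $X$, and $n$ is the number of points.

```latex
\begin{aligned}
X_c &= \frac{X - \bar{X}}{\sqrt{n-1}}, \qquad C = X_c^{\top} X_c \\
X_c &= U S V^{\top} \;\Longrightarrow\; C = V S^{\top} U^{\top} U S V^{\top} = V \,(S^{\top} S)\, V^{\top}
\end{aligned}
```

So the columns of $V$ from the SVD of the centered, scaled data are eigenvectors of the covariance matrix $C$, and the eigenvalues of $C$ are the squared singular values. Both routes therefore yield the same bases, up to ordering and sign.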

SVD, MATLAB

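    % Center the data and scale it so that sub_input_data' * sub_input_data
    % equals the sample covariance matrix
    sub_input_data = (input_data - repmat(mean(input_data),count,1))/sqrt(count-1);
    % svd returns the singular values in S in descending order
    [U,S,V] = svd(sub_input_data);
    % First out_dim columns as PCA bases
    pcaV = V(:,1:out_dim);
    output_data = input_data * pcaV;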

EIG (eigendecomposition), MATLAB

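    % Center the data by subtracting the per-column mean
    mean_input_data = mean(input_data);
    sub_input_data = input_data - repmat(mean_input_data, count,1);
    % Sample covariance matrix (in_dim x in_dim)
    cov_mat = sub_input_data' * sub_input_data ./ (count - 1);
    [V, D] = eig(cov_mat);
    % Last out_dim columns as PCA bases; this assumes eig returned the
    % eigenvalues in ascending order, so check diag(D) or sort if unsure
    pcaV = V(:,in_dim - out_dim + 1: in_dim);
    output_data = input_data * pcaV;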

In C++, OpenCV already provides PCA, though you can of course implement it yourself.
In OpenCV, the eigendecomposition can be done with this function:

cv::eigen(covMat, eigenValues, eigenVectors);

The MATLAB code itself is fairly simple; I have put it on GitHub.
MatlabPCA

Original data:

After PCA:

fstream issue under 64-bits cygwin

This is a question I posted on StackOverflow: fstream issue under 64-bits cygwin. I was glad to get help from Nemo on StackOverflow; all credit goes to him.

The problem is that when I compile the same piece of code with the 64-bit g++, I get an unexpectedly different result.

The source code looks like this:

#include <iostream>
#include <fstream>
using namespace std;
 
int main()
{
    int rows = 200;
    int cols = 200;
    float data[rows*cols];
    for (int i = 0; i < rows; i++)
    {
        for (int j = 0; j < cols; j++)
        {
            data[i*cols+j] = i*cols+j;
        }
    }
    const char *file = "tmp.txt";
    ofstream fs(file);
    if (fs.is_open())
    {
        fs.write((char*)&rows, sizeof(int));
        cout << fs.tellp() << endl;
        fs.write((char*)&cols, sizeof(int));
        cout << fs.tellp() << endl;
        fs.write((char*)data, sizeof(float)*rows*cols);
        cout << fs.tellp() << endl;
        fs.close();
    }
    return 0;
}

I am writing two integers and a block of float values into a binary file. It prints out how many bytes it wrote.

The expected result is:

4
8
160008

Everything was done under Cygwin. When the code is compiled with g++.exe, the result is correct.

But when I use x86_64-w64-mingw32-g++.exe (the only compiler there that can generate a 64-bit binary), the result is weird.

4
8
160506

That is weird: the last number is 498 bytes larger than expected.

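According to Nemo's answer, this is because on *nix there is no difference between text mode and binary mode, so an fstream opened with the default mode behaves as if it were binary. The same holds for the 32-bit g++ under Cygwin, but not for the 64-bit x86_64-w64-mingw32-g++, whose binaries open streams in Windows text mode by default.

This leads to the unexpected behavior: in text mode the stream translates newlines on output, so every byte with the value 0x0A in the binary data is written as the Windows-style pair 0x0D 0x0A, and the file grows by one byte for each of them.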

To solve this problem, replace the line

ofstream fs(file);

with

ofstream fs(file,ios_base::binary);
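
The 498 extra bytes (160506 - 160008) can be accounted for by counting the 0x0A bytes in the data the first program writes. The Python sketch below rebuilds the same bytes; it assumes a little-endian machine with a 4-byte `int` and an IEEE-754 4-byte `float`, which is an assumption about the platform rather than something the original program states.

```python
import struct

rows, cols = 200, 200

# Same payload as the C++ program: two 4-byte ints, then rows*cols floats
# holding the values 0, 1, 2, ..., rows*cols - 1.
payload = struct.pack("<ii", rows, cols)
payload += struct.pack("<%df" % (rows * cols), *map(float, range(rows * cols)))

# In Windows text mode each 0x0A byte is written as 0x0D 0x0A,
# so it adds exactly one byte to the file.
newline_bytes = payload.count(b"\n")

print(len(payload))                  # 160008, the size in binary mode
print(newline_bytes)                 # 498
print(len(payload) + newline_bytes)  # 160506, the size in text mode
```

The count matches the difference observed above, which is what one would expect if newline translation is the only thing text mode changes here.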

We can alter the content of the float data array to see what will happen.

#include <iostream>
#include <fstream>
using namespace std;
 
int main()
{
    int rows = 200;
    int cols = 200;
    float data[rows*cols];
    for (int i = 0; i < rows; i++)
    {
        for (int j = 0; j < cols; j++)
        {
            data[i*cols+j] = j;  // 138.0f is 0x430A0000, the only value in 0..199 with a 0x0A byte
        }
    }
    const char *file = "tmp.txt";
    ofstream fs(file);
    if (fs.is_open())
    {
        fs.write((char*)&rows, sizeof(int));
        cout << fs.tellp() << endl;
        fs.write((char*)&cols, sizeof(int));
        cout << fs.tellp() << endl;
        fs.write((char*)data, sizeof(float)*rows*cols);
        cout << fs.tellp() << endl;
        fs.close();
    }
    return 0;
}

The output is:

4
8
160208

The total changes with the contents of the array, which is consistent with the explanation above.

Use a different delimiter in sed

Suppose you want to replace "XXX" with a path like "/path/to/YYY" in a file.
Your file looks like this:

XXX
AAA
XXX

Your bash script looks like this:

NEW_PATH="/path/to/YYY"
VAR="XXX"
sed -e '{s/$VAR/$NEW_PATH/g}' your_file

Well, it won't work, since single quotes (') stop bash from expanding variables.

OK, let's try double quotes (") instead:

NEW_PATH="/path/to/YYY"
VAR="XXX"
sed -e "{s/$VAR/$NEW_PATH/g}" your_file

This one won't work either.
Leaving out the single quotes is fine as far as sed is concerned; the problem is that
NEW_PATH contains slashes, and the slash is also the delimiter of the s command. After expansion sed sees s/XXX//path/to/YYY/g, which it cannot parse.

This simple stuff took me about an hour :[

The solution is simply to use a different delimiter in sed.

NEW_PATH="/path/to/YYY"
VAR="XXX"
sed -e "{s@$VAR@$NEW_PATH@g}" your_file
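
If the replacement text is not known in advance, the delimiter can be picked programmatically. The Python sketch below is one possible way to do that: the helper name `build_sed_expr` and the list of candidate delimiters are invented for this example, and it only builds the expression string without running sed.

```python
def build_sed_expr(pattern, replacement, candidates="/@#|,"):
    """Build an s command using the first delimiter absent from both parts."""
    for delim in candidates:
        if delim not in pattern and delim not in replacement:
            return "s{d}{p}{d}{r}{d}g".format(d=delim, p=pattern, r=replacement)
    raise ValueError("every candidate delimiter occurs in the input")

print(build_sed_expr("XXX", "/path/to/YYY"))  # s@XXX@/path/to/YYY@g
print(build_sed_expr("XXX", "YYY"))           # s/XXX/YYY/g
```

Note that this does nothing about other characters that are special in a sed replacement, such as & and the backslash; those would still need escaping.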

Sometimes you still want single quotes to protect special characters such as ( and ).
It is fine to mix quotes:

NEW_PATH="/path/to/YYY"
VAR="XXX"
sed -e '{s@\(.*\)@'"$NEW_PATH"'@g}' your_file