PRML 3: Linear Discriminants
Published: 2019-06-13


 

  As an alternative to generative and discriminative models, a discriminant function directly assigns a feature vector to one of K classes. One of the simplest discriminant functions for 2-class problems takes the form $y(\vec{x})=\operatorname{sign}(\vec{w}^T\vec{x}+b)$, where $\vec{w}$ is the parameter vector to be determined and $b$ is the bias. Note that $\vec{x}$ here differs from the one used in the regression models, since it no longer includes a constant bias component.
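  For example, with purely illustrative values for $\vec{w}$, $b$ and $\vec{x}$ (a minimal sketch, not taken from the text), the decision rule can be evaluated directly:

% A toy 2-D illustration of the decision rule y(x) = sign(w'*x + b),
% with made-up values for w, b and x:
w = [2; -1];            % assumed weight vector
b = 0.5;                % assumed bias
x = [1; 3];             % feature vector to classify
y = sign(w'*x + b)      % w'*x + b = -0.5, so y = -1 (the second class)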

 

  To obtain proper parameters, we can draw on a simple algorithm called the perceptron, which guarantees that all the training data will be correctly classified whenever that is possible. This is done by iteratively minimizing an error function summed over the misclassified points, each term of which has the form $-t_n(\vec{w}^T\vec{x}_n+b)$; the procedure never terminates if the problem is not linearly separable.

function w = percept(X,t)
    % Perceptron Algorithm for Linear Classification
    % Precondition: X is a set of data columns (with a trailing row of ones,
    %       so the last component of w acts as the bias),
    %       row vector t holds the labels of X (+1 or -1)
    % Postcondition: w is the linear model parameter
    %       such that y = sign(w'*x)
    [m,n] = size(X);
    w = zeros(m,1);
    cnt = 0;    % number of consecutive correctly classified items
    cur = 1;    % index of the current data item
    while (cnt < n)
        if (t(cur)*(w'*X(:,cur)) > 0)
            cnt = cnt+1;                 % correct: leave w unchanged
        else
            w = w+t(cur)*X(:,cur);       % misclassified: update w
            cnt = 0;
        end
        cur = mod(cur,n)+1;              % cycle through the data
    end
end
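  As a quick sanity check, one might call percept on made-up, linearly separable data (a sketch; the cluster centres and sizes are arbitrary, and the same column format with a trailing row of ones is used by the functions below):

% Toy data: two Gaussian clusters in 2-D, with a row of ones appended
% so that the last component of w plays the role of the bias b.
X1 = [randn(2,20)+2; ones(1,20)];     % positive class around (2,2)
X2 = [randn(2,20)-2; ones(1,20)];     % negative class around (-2,-2)
X  = [X1, X2];
t  = [ones(1,20), -ones(1,20)];
w  = percept(X,t);
errors = sum(sign(w'*X) ~= t)         % expected to be 0 on separable data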

 

   Fisher's Linear Discriminant is another linear classifier. It seeks a projection direction that separates the two classes as well as possible: the projected class means should lie far apart while the projected data within each class stay tightly clustered. This is achieved by maximizing the Fisher criterion $J(\vec{w})=\frac{(m_2-m_1)^2}{S_1^2+S_2^2}$, where $m_1$, $m_2$ and $S_1^2$, $S_2^2$ are respectively the means and variances of the projected data for the two classes.
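  Maximizing $J(\vec{w})$ has a closed-form solution: the optimal direction satisfies $\vec{w}\propto S_W^{-1}(\vec{m}_2-\vec{m}_1)$, where $S_W$ is the within-class covariance matrix and $\vec{m}_1$, $\vec{m}_2$ are the class means in the original feature space. Only the direction matters, so the sign merely fixes which class projects to the positive side; the code below uses $S_W^{-1}(\vec{m}_1-\vec{m}_2)$ so that the positive class gets the larger projection.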

function w = fisher(X,t)
    % Fisher's Linear Discriminant for 2-class problems
    % Precondition: X is a set of data columns (with a trailing row of ones),
    %       row vector t holds the labels of X (+1 or -1)
    % Postcondition: w is the linear model parameter
    %       such that y = sign(w'*x)
    d = size(X,1)-1;
    % calculate the mean vectors of the 2 classes:
    m1 = zeros(d,1);
    m2 = zeros(d,1);
    n1 = 0; n2 = 0;
    for i = 1:size(t,2)
        if (t(1,i)>0)
            n1 = n1+1;
            m1 = m1+X(1:d,i);
        else
            n2 = n2+1;
            m2 = m2+X(1:d,i);
        end
    end
    m1 = m1/n1;
    m2 = m2/n2;
    % calculate the within-class covariance matrix:
    Sw = zeros(d);
    for i = 1:size(t,2)
        if (t(1,i)>0)
            Sw = Sw+(X(1:d,i)-m1)*(X(1:d,i)-m1)';
        else
            Sw = Sw+(X(1:d,i)-m2)*(X(1:d,i)-m2)';
        end
    end
    w = Sw\(m1-m2);     % optimal direction: Sw^{-1}*(m1-m2)
    % choose a proper threshold (bias) between the two projected classes:
    w0Min = inf;
    w0Max = -inf;
    for i = 1:size(t,2)
        y = w'*X(1:d,i);
        if (t(1,i)>0 && y+w0Max<0)
            w0Max = -y;
        elseif (t(1,i)<0 && y+w0Min>0)
            w0Min = -y;
        end
    end
    w = [w;(w0Min+w0Max)/2];
end

 

  The Support Vector Machine (SVM) is another linear discriminant classifier, whose objective is to maximize the geometric margin of the training set, i.e. $\gamma = \min_n \frac{y_n(\vec{w}^T\vec{x}_n+b)}{\lVert\vec{w}\rVert}$, where $y_n\in\{+1,-1\}$ are the labels. This is equivalent to minimizing $\frac{1}{2}\lVert\vec{w}\rVert^2$ subject to the constraints $y_n(\vec{w}^T\vec{x}_n+b)\geq 1$ for $n=1,2,...,N$:

function w = supvect(X,t)
    % Support Vector Machine for Linear Classification
    % Precondition: X is a set of data columns (with a trailing row of ones),
    %       row vector t holds the labels of X (+1 or -1)
    % Postcondition: w is the linear model parameter
    %       such that y = sign(w'*x)
    [m,n] = size(X);
    x0 = zeros(m,1);
    % encode the constraints t(i)*(w'*X(:,i)) >= 1 as A*w <= b:
    A = zeros(n,m);
    for i = 1:n
        A(i,:) = -t(i)*X(:,i)';
    end
    b = -ones(n,1);
    % minimizing ||w|| is equivalent to minimizing (1/2)*||w||^2
    % (note that the bias component of w is penalized as well here):
    w = fmincon(@(v) norm(v), x0, A, b);
end
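  As an alternative sketch, assuming the Optimization Toolbox's quadprog is available, the fmincon call above could be replaced by a quadratic program that minimizes $\frac{1}{2}\lVert\vec{w}\rVert^2$ directly:

% Alternative: solve the same constrained problem with quadprog,
% minimizing (1/2)*w'*H*w with H = I, i.e. (1/2)*||w||^2, s.t. A*w <= b.
% A, b and m are as constructed in supvect above.
H = eye(m);
f = zeros(m,1);
w = quadprog(H, f, A, b);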

  This is also equivalent to finding $\min_{\vec{w},b}\max_{\vec{\alpha}} \frac{1}{2}\lVert\vec{w}\rVert^2+\sum_{n=1}^N\alpha_n[1-y_n(\vec{w}^T\vec{x}_n+b)]$, where $\alpha_n\geq 0$ for $n=1,2,...,N$ are Lagrange multipliers. Since the problem satisfies the Karush-Kuhn-Tucker (KKT) conditions, we can solve its dual problem instead, which is easier to handle. Moreover, by allowing some slack in the margin constraints (the soft-margin formulation), the dual also accommodates datasets that are not linearly separable:

    $\min_{\vec{\alpha}}\ \frac{1}{2}\sum_{i=1}^N\sum_{j=1}^N\alpha_i\alpha_j y_i y_j(\vec{x}_i^T\vec{x}_j)-\sum_{i=1}^N\alpha_i$

    $\text{s.t.}\ \sum_{n=1}^N\alpha_n y_n=0$  and  $0\leq\alpha_n\leq C$ for $n=1,2,...,N$
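  Setting $\partial L/\partial\vec{w}=0$ and $\partial L/\partial b=0$ in the Lagrangian gives $\vec{w}=\sum_{n=1}^N\alpha_n y_n\vec{x}_n$ and $\sum_{n=1}^N\alpha_n y_n=0$; substituting these back into the Lagrangian yields the dual problem above.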

  This problem can be solved with the Sequential Minimal Optimization (SMO) algorithm, which iteratively uses heuristics to select a pair of $\alpha$s and re-optimizes the objective with respect to that pair while keeping the others fixed. Since the optimal $\vec{w}$ is a linear combination of the support vectors (the points with $\alpha_n>0$), a new data point $\vec{x}_{N+1}$ can be classified using the support vectors alone: $y=\operatorname{sign}(\sum_{n\in SV}\alpha_n y_n(\vec{x}_n^T\vec{x}_{N+1})+b)$.
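  A minimal sketch of this prediction rule, assuming alpha and ySV are row vectors holding the support-vector multipliers and labels, XSV holds the support vectors as columns, and b comes from an SMO-style solver (all names hypothetical):

function y = svmpred(xnew, alpha, ySV, XSV, b)
    % Classify a new column vector xnew using only the support vectors:
    % y = sign( sum_n alpha_n * y_n * (x_n' * xnew) + b )
    y = sign((alpha .* ySV) * (XSV' * xnew) + b);
end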

 

 

References:

  1. Bishop, Christopher M. Pattern Recognition and Machine Learning. Springer, 2006.

 

Reposted from: https://www.cnblogs.com/DevinZ/p/4472477.html
