Sunday, March 30, 2014

Three Easy Steps to Install Scikit-Learn

1. Install WinPython: http://sourceforge.net/projects/winpython/?source=dlp

This installs a Python distribution together with NumPy, SciPy, and other scientific packages.

2. Register Python using "WinPython Control Panel.exe" in the WinPython folder.

3. Download and run the scikit-learn installer executable. Make sure it matches the Python version and architecture of your WinPython installation.

--Simple Test

>>> from sklearn import linear_model
>>> clf=linear_model.LinearRegression()
>>> clf.fit([[0,0],[1,1],[2,2]],[0,1,2])
LinearRegression(copy_X=True, fit_intercept=True, normalize=False)
>>> clf.coef_
array([ 0.5,  0.5])
>>>

Thursday, March 27, 2014

Ridge Regression Model with Generalized Cross Validation

An introduction to GCV can be found here:

http://sfb649.wiwi.hu-berlin.de/fedc_homepage/xplore/ebooks/html/csa/node123.html

In the ridge regression model with generalized cross validation, we determine the ridge parameter by selecting the $\lambda$ that minimizes

$\displaystyle \mathrm{GCV}(\lambda)=\frac{\frac{1}{n} \sum_{i=1}^n \bigl( y_i-\widehat{f}_{\lambda}(t_i)\bigr)^2} {\bigl(1-\operatorname{tr} H(\lambda)/n\bigr)^2}\, ,$

where the hat matrix $H(\lambda)$ is given by
$X(X^{T}X+\lambda I)^{-1}X^{T}$.

The code:


void CRidgeRegressionGCV::ComputeRegression()
{
    // Search the candidate ridge parameters and keep the one that gives
    // the smallest generalized cross validation (GCV) score.
    double minGCV = std::numeric_limits<double>::max();  // needs <limits>
    for (size_t t = 0; t < m_alphas.size(); t++)
    {
        double alpha = m_alphas[t];
        int m = m_X.size();     // number of samples
        int n = m_X[0].size();  // number of features
        // Build the design matrix with an extra all-ones column for the
        // intercept, and the right-hand-side vector of targets.
        MatrixXd mat(m, n + 1);
        VectorXd rhs(m);
        for (int i = 0; i < m; i++)
        {
            for (int j = 0; j < n; j++)
            {
                mat(i, j) = m_X[i][j];
            }
            mat(i, n) = 1;
            rhs(i) = m_Y[i];
        }
        // Solve the ridge normal equations (X^T X + alpha I) w = X^T y
        // through an SVD of the regularized matrix.
        MatrixXd tmp = mat.transpose()*mat + alpha*MatrixXd::Identity(n + 1, n + 1);
        JacobiSVD<MatrixXd> svd(tmp, ComputeThinU | ComputeThinV);
        MatrixXd res = svd.solve(mat.transpose()*rhs);
        vector<double> coef(n);
        for (int i = 0; i < n; i++)
        {
            coef[i] = res(i, 0);
        }
        double intercept = res(n, 0);
        // Numerator of the GCV criterion: the mean squared residual.
        double GCV = 0;
        for (int i = 0; i < m; i++)
        {
            double sum = intercept;
            for (int j = 0; j < n; j++)
            {
                sum += m_X[i][j] * coef[j];
            }
            double diff = m_Y[i] - sum;
            GCV += diff*diff;
        }
        GCV /= m;
        // Denominator: (1 - tr H(alpha)/m)^2 with the hat matrix
        // H(alpha) = X (X^T X + alpha I)^{-1} X^T.
        MatrixXd tmp2 = svd.solve(mat.transpose());
        MatrixXd H = mat*tmp2;
        GCV /= (1 - H.trace()/m)*(1 - H.trace()/m);
        // Keep the model of the best alpha seen so far.
        if (GCV < minGCV)
        {
            minGCV = GCV;
            m_coef = coef;
            m_intercept = intercept;
            m_alpha = alpha;
        }
    }
}
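
For reference, here is a minimal sketch of how such a class might be driven. It is not part of the original post: it assumes the members m_X, m_Y, m_alphas, m_coef, m_intercept, and m_alpha are accessible from calling code (for example public, or exposed through setters/getters), and it reuses the toy data from the scikit-learn test at the top of this page.

#include <iostream>
#include <vector>

int main()
{
    // Hypothetical driver; direct member access is assumed for illustration only.
    CRidgeRegressionGCV reg;
    reg.m_X = { {0.0, 0.0}, {1.0, 1.0}, {2.0, 2.0} };  // samples
    reg.m_Y = { 0.0, 1.0, 2.0 };                       // targets
    reg.m_alphas = { 0.01, 0.1, 1.0, 10.0 };           // candidate ridge parameters

    reg.ComputeRegression();  // picks the alpha with the smallest GCV score

    std::cout << "alpha = " << reg.m_alpha
              << ", intercept = " << reg.m_intercept << std::endl;
    for (double c : reg.m_coef)
        std::cout << "coef = " << c << std::endl;
    return 0;
}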

Ridge Regression Model

In the ridge regression model, the optimal $w$ is given by
$w = (X^{T}X+\alpha I)^{-1}X^{T}Y$.

The code:


void CRidgeRegression::ComputeRegression()
{
    int m = m_X.size();     // number of samples
    int n = m_X[0].size();  // number of features
    // Build the design matrix with an extra all-ones column for the
    // intercept, and the right-hand-side vector of targets.
    MatrixXd mat(m, n + 1);
    VectorXd rhs(m);
    for (int i = 0; i < m; i++)
    {
        for (int j = 0; j < n; j++)
        {
            mat(i, j) = m_X[i][j];
        }
        mat(i, n) = 1;
        rhs(i) = m_Y[i];
    }
    // Solve the ridge normal equations (X^T X + alpha I) w = X^T y
    // through an SVD of the regularized matrix.
    MatrixXd tmp = mat.transpose()*mat + m_alpha*MatrixXd::Identity(n + 1, n + 1);
    JacobiSVD<MatrixXd> svd(tmp, ComputeThinU | ComputeThinV);
    MatrixXd res = svd.solve(mat.transpose()*rhs);
    // Split the solution into coefficients and intercept.
    m_coef.resize(n);
    for (int i = 0; i < n; i++)
    {
        m_coef[i] = res(i, 0);
    }
    m_intercept = res(n, 0);
}

Ways to Insert Equation and Code in Blogger

For code, the simplest way is to use the following website to convert it into an HTML block:

http://codeformatter.blogspot.com/

For equations, you can use LaTeX in Blogger by following the instructions on this page:

https://www.codecogs.com/latex/integration/blogger/install.php

Ordinary Least Squares Model

In the ordinary least squares model, the optimal $w$ minimizes $\|Xw-Y\|^{2}$ and is given by the normal equations,
$w = (X^{T}X)^{-1}X^{T}Y$, i.e. the pseudo-inverse of $X$ applied to $Y$, which is what the SVD solve below computes.

A draft function for this is written as

void CLinearRegression::ComputeRegression()
{
    int m = m_X.size();     // number of samples
    int n = m_X[0].size();  // number of features
    // Build the design matrix with an extra all-ones column for the
    // intercept, and the right-hand-side vector of targets.
    MatrixXd mat(m, n + 1);
    VectorXd rhs(m);
    for (int i = 0; i < m; i++)
    {
        for (int j = 0; j < n; j++)
        {
            mat(i, j) = m_X[i][j];
        }
        mat(i, n) = 1;
        rhs(i) = m_Y[i];
    }
    // Least-squares solve of mat * w = rhs via the SVD; this applies the
    // pseudo-inverse of the design matrix to the targets.
    JacobiSVD<MatrixXd> svd(mat, ComputeThinU | ComputeThinV);
    MatrixXd res = svd.solve(rhs);
    // Split the solution into coefficients and intercept.
    m_coef.resize(n);
    for (int i = 0; i < n; i++)
    {
        m_coef[i] = res(i, 0);
    }
    m_intercept = res(n, 0);
}
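
As a quick sanity check, the draft function can be exercised on the same toy data as the scikit-learn test at the top of this page. The following is a hypothetical driver, not part of the original post; it assumes the m_X, m_Y, m_coef, and m_intercept members are accessible, and it should recover coefficients close to the scikit-learn result of (0.5, 0.5).

#include <iostream>
#include <vector>

int main()
{
    // Hypothetical driver; direct member access is assumed for illustration only.
    CLinearRegression reg;
    reg.m_X = { {0.0, 0.0}, {1.0, 1.0}, {2.0, 2.0} };  // same samples as the sklearn test
    reg.m_Y = { 0.0, 1.0, 2.0 };

    reg.ComputeRegression();

    // Expected output is close to coef = (0.5, 0.5) with intercept near 0.
    std::cout << "intercept = " << reg.m_intercept << std::endl;
    for (double c : reg.m_coef)
        std::cout << "coef = " << c << std::endl;
    return 0;
}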

Wednesday, March 26, 2014

An Introduction to Eigen

What is Eigen?

Eigen is a high-level C++ library of template headers for linear algebra: matrix and vector operations, numerical solvers, and related algorithms.

How to install Eigen?

Get the latest Eigen package from here. Unzip it and include the appropriate header files; since Eigen is header-only, there is nothing to compile or link against.
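
As a quick check that the headers are found, a small program along the following lines (not from the original post) should compile with nothing more than the Eigen directory on the include path:

#include <iostream>
#include <Eigen/Dense>

int main()
{
    // A 2x2 matrix and a 2-vector, filled with the comma initializer.
    Eigen::MatrixXd A(2, 2);
    A << 1, 2,
         3, 4;
    Eigen::VectorXd b(2);
    b << 5, 6;

    // Basic operations: a matrix-vector product and a linear solve.
    std::cout << "A*b =\n" << A * b << std::endl;
    std::cout << "solution of A*x = b:\n"
              << A.colPivHouseholderQr().solve(b) << std::endl;
    return 0;
}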

How to use Eigen?

I have included my code for computing ridge regression below to show some sample usage of Eigen. Most of the time it is very straightforward.


#include <Eigen/SVD>  // brings in JacobiSVD plus the core matrix types
using namespace Eigen;

void CRidgeRegression::ComputeRegression()
{
    int m = m_X.size();     // number of samples
    int n = m_X[0].size();  // number of features
    // MatrixXd / VectorXd are dynamically-sized, double-valued Eigen types.
    // The extra all-ones column models the intercept.
    MatrixXd mat(m, n + 1);
    VectorXd rhs(m);
    for (int i = 0; i < m; i++)
    {
        for (int j = 0; j < n; j++)
        {
            mat(i, j) = m_X[i][j];
        }
        mat(i, n) = 1;
        rhs(i) = m_Y[i];
    }
    // Expressions such as transpose(), operator*, and Identity() read
    // almost like the math: form X^T X + alpha I ...
    MatrixXd tmp = mat.transpose()*mat + m_alpha*MatrixXd::Identity(n + 1, n + 1);
    // ... then solve the regularized normal equations with a JacobiSVD.
    JacobiSVD<MatrixXd> svd(tmp, ComputeThinU | ComputeThinV);
    MatrixXd res = svd.solve(mat.transpose()*rhs);
    // Split the solution into coefficients and intercept.
    m_coef.resize(n);
    for (int i = 0; i < n; i++)
    {
        m_coef[i] = res(i, 0);
    }
    m_intercept = res(n, 0);
}