Linear regression is an approach to machine/statistical learning generally applied to value-prediction problems. It is a form of supervised learning, in which the training data provides the “correct” answer for each data point generated by an unknown function f. Although in this case we were provided a 2-dimensional data set, linear regression can also be used on higher-dimensional data sets. The method assumes that the unknown function f can be approximated by a linear equation of d terms (one for each feature being measured, plus a constant term for bias). Among machine learning algorithms it is fairly simple, and in his Caltech lectures Dr. Abu-Mostafa calls linear regression “one-step learning.”
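As a minimal sketch of that assumption, the hypothesis can be written as a dot product between a weight vector and a feature vector padded with a constant 1 for the bias. The function name `predict` and the sample weights are illustrative only and do not come from the assignment.

```python
import numpy as np

def predict(weights, features):
    """Evaluate the linear hypothesis for one data point.

    The last weight is treated as the bias, so a constant 1 is
    appended to the feature vector before taking the dot product.
    """
    x = np.concatenate((np.atleast_1d(features), [1.0]))
    return float(np.dot(weights, x))

# Hypothetical weights: slope 2.0 and bias 0.5.
w = np.array([2.0, 0.5])
print(predict(w, 3.0))  # 2.0 * 3.0 + 0.5 * 1
```

The same function works unchanged for higher-dimensional data, as long as the weight vector has one entry per feature plus one for the bias.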
A naïve approach to linear regression is an incremental solution in which the algorithm loops through one iteration for every data point provided. At each pass, the line for the candidate solution is “nudged” toward a position where it can predict the value of future data. This approach is expensive in running time, so a solution using linear algebra operations on matrices is used instead. In the same way that matrices can be used to solve systems of equations, a process of transposing, multiplying, and inverting the training data in matrix form can be used to “solve” for an approximation of the unknown function f.
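The two approaches can be compared in a short sketch. The incremental version below uses a least-mean-squares style update and repeats the loop over the data many times; the learning rate, the number of passes, and the function names are assumptions made for illustration, and the data is made up. The matrix version computes w = (X^T X)^-1 X^T y directly.

```python
import numpy as np

def fit_incremental(X, y, rate=0.1, passes=500):
    """Nudge the weights once per data point, repeated over several passes."""
    w = np.zeros(X.shape[1])
    for _ in range(passes):
        for xi, yi in zip(X, y):
            error = yi - np.dot(w, xi)   # how far the current line is from this point
            w += rate * error * xi       # move the line a little toward the point
    return w

def fit_closed_form(X, y):
    """Compute w = (X^T X)^-1 X^T y in a single step."""
    return np.linalg.inv(X.T @ X) @ X.T @ y

# Made-up, noise-free data generated from y = 2*x + 1.
x = np.linspace(0.0, 1.0, 20)
X = np.column_stack((x, np.ones_like(x)))  # columns: feature, constant 1 for bias
y = 2.0 * x + 1.0

w_incr = fit_incremental(X, y)
w_closed = fit_closed_form(X, y)
print(w_incr, w_closed)
```

On this toy data both functions recover approximately the same slope and bias, but the incremental version needs thousands of updates where the matrix version needs one expression.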
As required by the assignment, I implemented my solution to the categorizing problem using Python. I used three libraries popular for scientific and mathematical applications: NumPy, SciPy, and Matplotlib. Matplotlib provides a simple interface for quickly generating visualizations of data, which I used to plot the points and the regression line. SciPy offers a convenient function for opening and parsing tabulated or character-separated data files such as CSVs, which I used to load the training data set. Finally, NumPy provides the matrix data structures and linear algebra operations needed to handle the dot-product multiplication, transposes, and inversions required by the linear regression algorithm. While SciPy also provides tools that can automate much of the machine learning process, I did not take advantage of them, as that would defeat the purpose of the exercise.
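A rough sketch of the loading step follows. Because the actual file name, file layout, and loading call are not given here, it makes several assumptions: it uses NumPy's `genfromtxt` as a stand-in loader, reads from an in-memory string instead of a file, and assumes two comma-separated columns (feature, then target). Plotting is left out to keep the sketch short.

```python
import io
import numpy as np

# Stand-in for the training file; these values are made up.
sample_csv = io.StringIO("1.0,0.5\n2.0,0.6\n3.0,0.65\n")

# Parse the character-separated text into a 2-D array, one row per data point.
data = np.genfromtxt(sample_csv, delimiter=",")
x = data[:, 0]   # assumed: first column holds the feature
y = data[:, 1]   # assumed: second column holds the target value

# Build the design matrix: the feature column plus a column of ones for the bias.
X = np.column_stack((x, np.ones_like(x)))
print(data.shape, X.shape)
```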
I initially encountered difficulty in the second-to-last step of the algorithm, in which the (X^T X)^-1 matrix is multiplied by the X^T and Y matrices. After some debugging, I found that the Y matrix needed to be transposed to give it the dimensions required by the rules of matrix multiplication. I was then finally able to arrive at the w^T matrix (the coefficients/weights for the terms of the linear equation), with an output of [[ 0.0867182 0.39972487]]. Applying these weights to the hypothesized equation w1*x1 + w0*x0 (x0 being 1), I got the (massively oversimplified) solution g(x) = 0.0867*x + 0.3997.
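The dimension problem can be reproduced on a small made-up data set. This sketch assumes that the targets were stored as a single row and that the design matrix lists the feature column before the column of ones; neither detail is confirmed by the text above.

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0])
X = np.column_stack((x, np.ones_like(x)))   # shape (4, 2): columns x1 and x0 = 1
Y = np.array([[1.0, 3.0, 5.0, 7.0]])        # shape (1, 4): targets stored as a row

pseudo_inverse = np.linalg.inv(X.T @ X) @ X.T   # shape (2, 4)

# pseudo_inverse @ Y raises ValueError, because a (2, 4) matrix
# cannot be multiplied by a (1, 4) matrix. Transposing Y fixes the shapes.
w = pseudo_inverse @ Y.T    # (2, 4) times (4, 1) gives shape (2, 1)
w_T = w.T                   # shape (1, 2), printed in the form [[w1 w0]]
print(w_T)
```

Here the targets follow y = 2*x + 1 exactly, so the recovered weights should be close to a slope of 2 and a bias of 1.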
Ultimately, while my plot appears to represent the training data set accurately, I am clearly missing something: the regression line seems to lack a positive constant that would place it slightly higher on the graph. After a few unsuccessful attempts to find the missing constant, I decided to study my notes again to try to isolate the mistake, and possibly to use some of the SciPy tools to find a better fit.
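One possible way to check for a missing constant, sketched on made-up data, relies on a property of least squares: when the model includes a bias term, the residuals of the fitted line average to zero. A clearly nonzero mean residual suggests the bias was dropped or misapplied somewhere between fitting and plotting. The sketch also cross-checks the weights against `np.linalg.lstsq`, which is a NumPy routine used here as a stand-in and not one of the SciPy tools mentioned above.

```python
import numpy as np

# Made-up noisy data around the line y = 0.1*x + 0.4.
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 10.0, 50)
y = 0.1 * x + 0.4 + rng.normal(0.0, 0.05, 50)

X = np.column_stack((x, np.ones_like(x)))        # columns: feature, constant 1
w_normal = np.linalg.inv(X.T @ X) @ X.T @ y      # [slope, bias]
w_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)  # independent cross-check

# With the bias included, the residuals should average out to about zero.
mean_residual = (y - X @ w_normal).mean()

# Drawing the line with the slope alone leaves it too low by roughly the bias.
mean_residual_no_bias = (y - w_normal[0] * x).mean()

print(w_normal, w_lstsq)
print(mean_residual, mean_residual_no_bias)
```

If the two weight vectors agree and the mean residual is near zero, the fit itself is probably sound and the offset is more likely introduced when the line is drawn.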