Principal Component Analysis — Explained
If you are in any way connected to the field of Data Science, Machine Learning, or Statistics in general, you have almost certainly come across the concept of Principal Component Analysis, aka PCA. Personally speaking, I found the name itself quite a mouthful, let alone the concept. In this post, I will very briefly try to decode what PCA is, the maths behind it, and a few areas of application. So let's dive right in…
PCA performs dimensionality reduction on the feature matrix, where the feature matrix is nothing but a collection of feature columns. E.g. for a car dataset the feature columns could be Brand, Colour, Type (hatchback, sedan, SUV, etc.), Model, Year of launch, etc. In a real-world dataset, however, we might have hundreds if not thousands of features, and it becomes really difficult to analyze and visualize all of them at a single glance and assess which features are important and which aren't. And voilà, PCA comes to our rescue. PCA finds the directions along which the features vary the most — importantly, it looks only at the features themselves, not at any label (in the above example the label could be the price of the car, i.e. the value we want to predict) — and thereby reduces the 'dimension' of the feature matrix to a small set of new features, the principal components, that explain the entire dataset with minimum information loss. So basically PCA is an unsupervised dimensionality reduction technique that takes a high-dimensional feature matrix, performs some maths trickery on it (discussed in the next section), and reduces it to a lower-dimensional feature matrix such that this new matrix still captures most of the dynamics and information of the original matrix.
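To make this concrete, here is a minimal sketch using scikit-learn's PCA. The random 10-feature matrix and the choice of two components are purely illustrative assumptions, not part of any real dataset:

```python
# Minimal sketch: reduce a 10-feature matrix to 2 principal components.
# The data here is random and stands in for a real numeric feature matrix.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))        # 200 samples, 10 features

pca = PCA(n_components=2)             # keep the 2 directions of maximum variance
X_reduced = pca.fit_transform(X)

print(X_reduced.shape)                # (200, 2)
print(pca.explained_variance_ratio_)  # fraction of total variance each component captures
```

Note that no label is passed to `fit_transform` — the reduction depends only on the features, which is what makes PCA unsupervised.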
The maths trickery explained:
The feature matrix from here on will be referred to as the data (typically mean-centered, i.e. with each feature's average subtracted out). The principal components are eigenvectors of the data's covariance matrix. Thus, the principal components are often computed by eigendecomposition of the data covariance matrix or by singular value decomposition (SVD) of the data matrix.
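As a rough sketch of that equivalence, here is how one might compute the principal components both ways in plain NumPy and check that they agree. The toy data is an assumption; only the centering, the covariance matrix, and the SVD come from the description above:

```python
# Sketch: principal components via eigendecomposition of the covariance
# matrix, and equivalently via SVD of the centered data matrix.
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 5))          # toy data: 100 samples, 5 features
Xc = X - X.mean(axis=0)                # center each feature

# Route 1: eigendecomposition of the covariance matrix S
S = np.cov(Xc, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(S)   # eigh, since S is symmetric
order = np.argsort(eigvals)[::-1]      # largest eigenvalue first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Route 2: SVD of the centered data matrix
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

# The right singular vectors match the eigenvectors (up to sign),
# and the squared singular values / (n - 1) match the eigenvalues.
print(np.allclose(np.abs(Vt), np.abs(eigvecs.T)))
print(np.allclose(s**2 / (len(X) - 1), eigvals))
```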
If we first want to pick which one-dimensional space to project our D-dimensional data onto, we compute all the eigenvalues of the covariance matrix S of the data, pick the biggest one (the biggest lambda), and then find the eigenvector that corresponds to that eigenvalue (of course, we make sure to normalize that eigenvector so it is a unit vector). That is the first principal component. This works if we want to project the data onto a one-dimensional space, but what if we want to project onto a higher-dimensional one? The same construction extends, and its optimality can be proved by induction. E.g., say you want to figure out the best two dimensions to project your ten-dimensional data onto: for the first, you take the top eigenvalue and its corresponding unit eigenvector; for the second, you take the second-biggest eigenvalue and its eigenvector, and so on. A small sketch of this projection follows below.
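Here is that recipe in NumPy, assuming some made-up ten-dimensional data: keep the unit eigenvectors for the two largest eigenvalues and project the centered data onto them.

```python
# Sketch: project 10-dimensional data onto its top-2 principal components.
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(500, 10))         # 500 samples, 10 features (illustrative)
Xc = X - X.mean(axis=0)                # center the data

S = np.cov(Xc, rowvar=False)           # covariance matrix of the data
eigvals, eigvecs = np.linalg.eigh(S)   # eigh returns unit-length eigenvectors
order = np.argsort(eigvals)[::-1]      # sort eigenvalues, largest first

W = eigvecs[:, order[:2]]              # eigenvectors for the top-2 eigenvalues
X_2d = Xc @ W                          # project: (500, 10) @ (10, 2) -> (500, 2)
print(X_2d.shape)
```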
I hope you found the content useful and understandable. Please subscribe for more such content.