PCA - Principal Components & Variables

In this post, I discuss the relationship between Principal Components (PCs) and variables in PCA.

Most of us have found it difficult to work out which variables correspond to which PCs in PCA. Let me make one thing clear at the outset: a PC in PCA never corresponds to a single variable. A PC is a combination of two or more variables (in general, a weighted combination of all of them). How, then, are we going to relate a PC to a variable?

The rest of this article tries to answer that question.

OK, let's begin with a quick introduction to PCA.

Principal Component Analysis (PCA) is a dimensionality reduction technique: it reduces the number of dimensions in the data while losing as little information as possible.

With so many open-source tools available, implementing PCA is easy. Take R as an example. After running PCA in R (for instance with prcomp() followed by summary()), you get the following statistics: Standard deviation, Rotation, center, scale, x, Proportion of Variance, Cumulative Proportion and so on. Of these, Rotation and Cumulative Proportion are the ones we need to focus on.

The Proportion of Variance gives the share of the total information (variance) captured by each PC, and the Cumulative Proportion gives the running total over the first few PCs. Use the Cumulative Proportion to decide how many variables to keep out of all those available. For example, suppose there are 5 variables in our model and the Cumulative Proportion shows that the first 3 PCs together account for 95% of the information in the dataset. Based on that, we can decide to proceed with 3 variables instead of all 5. Now, which 3 of the 5 variables should be selected for the model?
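
To make this step concrete, here is a minimal sketch in Python with NumPy (used purely for illustration; R reports the same quantities). The data is made up, and `variance_proportions` and `components_needed` are hypothetical helper names, not part of any library. The sketch standardises the columns, computes the Proportion of Variance and the Cumulative Proportion, and picks the smallest number of PCs that reaches a 95% threshold.

```python
import numpy as np


def variance_proportions(X):
    """Return (proportion_of_variance, cumulative_proportion), one entry per PC."""
    # Centre each column and scale it to unit variance (sample standard deviation).
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Squared singular values of Z, divided by n - 1, are the PC variances.
    singular_values = np.linalg.svd(Z, compute_uv=False)
    variances = singular_values ** 2 / (Z.shape[0] - 1)
    proportion = variances / variances.sum()
    return proportion, np.cumsum(proportion)


def components_needed(cumulative, threshold=0.95):
    """Smallest number of leading PCs whose cumulative proportion reaches threshold."""
    count = int(np.searchsorted(cumulative, threshold)) + 1
    return min(count, len(cumulative))


# Made-up data: 200 observations of 5 variables, two of which nearly duplicate others.
rng = np.random.default_rng(42)
base = rng.normal(size=(200, 3))
X = np.column_stack([
    base[:, 0],
    base[:, 1],
    base[:, 2],
    base[:, 0] + 0.1 * rng.normal(size=200),
    base[:, 1] + 0.1 * rng.normal(size=200),
])

proportion, cumulative = variance_proportions(X)
k = components_needed(cumulative, threshold=0.95)
print("Proportion of Variance:", np.round(proportion, 3))
print("Cumulative Proportion: ", np.round(cumulative, 3))
print("PCs needed for 95%:", k)
```

The 95% threshold is only the figure used in the example above; pass a different `threshold` if your problem calls for one.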

Rotation helps us answer this.

Rotation holds the loadings of each variable on every PC. In our example, Rotation has 5 columns, namely PC1, PC2, PC3, PC4 and PC5. Take all the values under PC1 and sort them in decreasing order of magnitude (a large negative loading matters as much as a large positive one). Since we have already chosen to proceed with 3 variables, pick the variables that correspond to the top 3 of the 5 values.
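
The selection step can be sketched as follows, again in Python with NumPy and on made-up data. The variable names `v1` to `v5` and the helpers `rotation_matrix` and `top_variables_on_pc` are hypothetical. Note one assumption: the sketch ranks loadings by absolute value, because the sign of a loading only tells you the direction of the relationship.

```python
import numpy as np


def rotation_matrix(X):
    """Loadings matrix: one row per variable, one column per PC (PC1 first)."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Rows of Vt are the principal directions; transpose so that columns are PCs.
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    return Vt.T


def top_variables_on_pc(rotation, names, pc=0, k=3):
    """Names of the k variables with the largest absolute loading on one PC."""
    order = np.argsort(-np.abs(rotation[:, pc]))
    return [names[i] for i in order[:k]]


# Made-up data: 200 observations of 5 variables (names are placeholders).
names = ["v1", "v2", "v3", "v4", "v5"]
rng = np.random.default_rng(7)
base = rng.normal(size=(200, 3))
X = np.column_stack([
    base[:, 0],
    base[:, 1],
    base[:, 2],
    base[:, 0] + 0.1 * rng.normal(size=200),
    base[:, 1] + 0.1 * rng.normal(size=200),
])

rotation = rotation_matrix(X)
print("PC1 loadings:", dict(zip(names, np.round(rotation[:, 0], 3))))
print("Top 3 variables on PC1:", top_variables_on_pc(rotation, names, pc=0, k=3))
```

Changing `pc` lets you repeat the same ranking on PC2 or PC3, which is a useful check on how much the selection depends on looking at PC1 alone.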

Pretty simple, isn't it? But I would never claim that this is the proper way to work out which variables correspond to the PCs. Based on my own research and practical experiments, I suggest it only as one way to use PCA effectively. Always remember that a PC in PCA does not correspond to a particular variable, since each PC is a combination of two or more variables.
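
The point that a PC is a combination of variables can be checked numerically. The short Python/NumPy sketch below (made-up data, illustrative names) rebuilds the PC1 score of one observation by hand as a weighted sum of all 5 standardised variables, with the PC1 loadings as the weights, and compares it with the score obtained by matrix multiplication.

```python
import numpy as np

# Made-up data: 100 observations of 5 variables.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))

# Standardise, then obtain the loadings (columns are PC1..PC5).
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
_, _, Vt = np.linalg.svd(Z, full_matrices=False)
rotation = Vt.T

# Scores of every observation on every PC (what R's prcomp returns as x).
scores = Z @ rotation

# PC1 score of the first observation, written out as a weighted sum of variables.
manual_pc1 = sum(rotation[j, 0] * Z[0, j] for j in range(Z.shape[1]))

print("PC1 loadings:", np.round(rotation[:, 0], 3))
print("Manual PC1 score:", manual_pc1)
print("Matrix PC1 score:", scores[0, 0])
```

Every variable carries some weight in PC1 here, which is why picking "the" variable behind a PC can only ever be an approximation.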

The above method might give a fair result for a Categorical Response Variable. For a Continuous Response Variable the following method can be considered.

Finalise the number of variables using the Cumulative Proportion (3 in our example). Then compute the correlation between all the variables and take the row of results for the response variable. In that row, the value for the response variable itself will be 1.00, and the values for the other variables will range from -1.00 to 1.00. Sort the values in decreasing order of magnitude, ignoring whether the correlation is positive or negative. The variables corresponding to the top 3 values (leaving out the response variable itself) are the ones to select for our model.
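
Here is a minimal sketch of this correlation-based selection in Python with NumPy. The data, the variable names `v1` to `v5` and the helper `select_by_correlation` are all made up for illustration. The response is placed in the last column, its row of the correlation matrix is extracted, and the predictors are ranked by absolute correlation.

```python
import numpy as np


def select_by_correlation(X, y, names, k=3):
    """Return the k (name, correlation) pairs most correlated with y in absolute value."""
    data = np.column_stack([X, y])
    # Correlation between all columns; rowvar=False treats columns as variables.
    corr_matrix = np.corrcoef(data, rowvar=False)
    # Last row belongs to the response; drop its final entry (response with itself = 1).
    predictor_corr = corr_matrix[-1, :-1]
    order = np.argsort(-np.abs(predictor_corr))
    return [(names[i], float(predictor_corr[i])) for i in order[:k]]


# Made-up data: the response depends on v1, v2 and v4 only.
names = ["v1", "v2", "v3", "v4", "v5"]
rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 5))
y = 3.0 * X[:, 0] + 2.0 * X[:, 1] - 2.5 * X[:, 3] + 0.5 * rng.normal(size=1000)

selected = select_by_correlation(X, y, names, k=3)
for name, corr in selected:
    print(f"{name}: correlation with response = {corr:+.3f}")
```

Because the ranking uses the absolute value, a strongly negative predictor such as `v4` in this made-up example is kept rather than pushed to the bottom of the list.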

Note: This method gave me better results in my experiments, which is why I've shared it. I would still strongly recommend that you experiment with it and verify the accuracy before settling on it. (I would like to get some suggestions on this technique, so experts, please share your views through my personal email ID, which is given at the bottom of this post.)

The accuracy of a Machine Learning model always fluctuates from one dataset and one run to another. In general, the more data you have, the better the accuracy tends to be.

I hope this has helped you.

I'll always welcome and value your suggestions. So, please feel free to reach out to me. I'm reachable through the following links.

Email - kgfahath@gmail.com
LinkedIn - www.linkedin.com/in/fahath-kg
Twitter - https://twitter.com/kgfahath
Facebook - https://www.facebook.com/people/Fahath-KG/100010754347966
