Wednesday, May 8, 2013

An intuitive explanation of PCA (Principal Component Analysis)

Many research papers apply PCA (Principal Component Analysis) to their data and present the results without explaining the method. People who search the internet for a definition of PCA are often confused by terms like "covariance matrix", "eigenvectors" or "eigenvalues". This is not surprising, because most explanatory articles focus on the detailed calculation process instead of the basic idea of PCA. They are mathematically correct, yet often hard to follow at first glance.

For a mathematical method, I believe most people only need to understand its logic and limitations, and can let software packages do the rest (implementation, calculation, etc.). Here I am trying to explain that PCA is not an impenetrable soup of acronyms but a quite intuitive approach: it can make large-scale data "smaller" and easier to handle.
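To connect those intimidating terms to what the software actually does, here is a minimal NumPy sketch of PCA done the textbook way: center the data, compute the covariance matrix, take its eigenvectors (the principal directions) sorted by their eigenvalues (the variance along each direction), and keep only the top few. The function name `pca` and the toy data are illustrative assumptions, not code from this post; in practice a library routine such as scikit-learn's `PCA` class would normally be used instead.

```python
import numpy as np

def pca(X, n_components):
    """Project the rows of X onto the top principal components.

    X: array of shape (n_samples, n_features).
    Returns (scores, components, explained_variance_ratio).
    """
    # Center each feature (column) at zero mean; PCA describes variance around the mean.
    Xc = X - X.mean(axis=0)
    # Covariance matrix of the features, shape (n_features, n_features).
    cov = np.cov(Xc, rowvar=False)
    # eigh is meant for symmetric matrices and returns eigenvalues in ascending order.
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]          # largest variance first
    eigvals = eigvals[order]
    eigvecs = eigvecs[:, order]
    components = eigvecs[:, :n_components]     # each column is one principal direction
    scores = Xc @ components                   # the data in the new, smaller coordinate system
    ratio = eigvals[:n_components] / eigvals.sum()
    return scores, components, ratio

# Hypothetical toy data: 200 points in 3-D that mostly vary along one direction, (1, 2, -1).
rng = np.random.default_rng(0)
t = rng.normal(size=200)
X = np.column_stack([t, 2 * t, -t]) + 0.05 * rng.normal(size=(200, 3))

scores, components, ratio = pca(X, n_components=1)
print(scores.shape)   # (200, 1): three columns reduced to one
print(ratio)          # close to 1.0: a single component keeps almost all of the variance
```

Here three measured variables collapse into one number per sample with almost no loss of information, which is exactly the sense in which PCA makes data "smaller". Note that the sign of an eigenvector is arbitrary, so different software may report a component as (1, 2, -1) or (-1, -2, 1).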

Tuesday, March 19, 2013

How much confidence can we obtain from a piece of evidence?

Everyone knows that most computational predictions about biological systems are "less reliable", even though many methods are based on convincing evidence and are reported to have high accuracy. So why do these theoretically sound methods fail to give reliable outputs? Some people believe it is simply because existing methods are not good enough, or because biological systems are too complex to predict. That is probably true. But there is another important factor that is often ignored: the abundance of potential true positives.

I will use a simple disease diagnosis example to show that even with an excellent prediction method based on strong evidence, we might still get poor predictions if the disease is rare in the population.
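The effect can be sketched with Bayes' theorem. The probability that a person who tests positive really has the disease (the positive predictive value) depends on the sensitivity and specificity of the test, but also on the prevalence of the disease. The helper name `positive_predictive_value` and the 99% sensitivity and specificity figures below are hypothetical, chosen only for illustration and not taken from any real test.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """P(disease | positive test), computed with Bayes' theorem."""
    true_pos = sensitivity * prevalence               # P(positive and diseased)
    false_pos = (1 - specificity) * (1 - prevalence)  # P(positive and healthy)
    return true_pos / (true_pos + false_pos)

# Hypothetical test with 99% sensitivity and 99% specificity,
# applied to populations where the disease is common, uncommon and rare.
for prevalence in (0.5, 0.01, 0.001):
    ppv = positive_predictive_value(0.99, 0.99, prevalence)
    print(f"prevalence {prevalence}: P(disease | positive) = {ppv:.3f}")
```

With these assumed numbers the same test gives about 0.99, 0.50 and 0.09 respectively: when only 1 person in 1,000 is sick, roughly nine out of ten positive results are false alarms, simply because the healthy people who trigger false positives vastly outnumber the sick ones.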