with a biological example?

Can you perhaps be a bit more verbose with your request, ballyhoo?
Okay, how does principal component analysis work? Give me an example applied to biology, ecology, etc. I read the Wikipedia article and had a general idea of it, but something I read recently seemed somewhat misapplied. So I need a fresh example from someone who understands its application.
At a superficial level, PCA works by rotating your measurement axes in such a way that there is no covariance between the different measurements when they are expressed along the new axes. How much more in depth do you want to go?
Why would you want to do that though? I thought experiments want to show a relationship, no relationship, or a partial relationship between variables. Why would you want to use a statistical technique that forces or finds a set of coordinates showing that there is no covariance between the data points in those circumstances? Is that accurate? Is the point to show that the particular circumstances reflected in the coordinate axes are the ones in which there is no covariance?
The typical reason is that the way we like to group things together, for convenience or out of personal bias, may not be the most relevant way to understand the phenomena being studied.
Here is a very manufactured example: let's suppose we want to study the growing patterns of a field of flowers. To that end we scatter a whole host of ACME weather measurement devices in the field, and every day we come back and count the number of flowers against what our weather devices measured for the previous day. Now here is the rub: each of these weather devices measures local rainfall, so we have many local rainfall measurements. But in this manufactured example I propose that the total rainfall over the field is more important than the individual measurements taken separately. PCA would show us this: one of the principal components with a lot of explanatory power (i.e. a large eigenvalue) would be the sum of all the individual rain measurements. The next one may be their alternating sum, as some measure of the local variability between the patches in the field, followed by a principal component that measures the total amount of sunlight, as opposed to local sunlight.
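To make the rainfall part of that example concrete, here is a minimal simulated sketch in Python with NumPy. Everything in it is an assumption made for illustration: six sensors, a shared field-wide rainfall signal drawn from a gamma distribution, and small independent local noise per sensor. It is not real data, only a way to see the "sum of all the sensors" component appear.

```python
import numpy as np

rng = np.random.default_rng(0)
n_days, n_sensors = 500, 6

# Assumption: every sensor sees the same field-wide rainfall plus small local noise.
field_rain = rng.gamma(shape=2.0, scale=5.0, size=n_days)
local_noise = rng.normal(scale=1.0, size=(n_days, n_sensors))
readings = field_rain[:, None] + local_noise

# Centre each sensor's readings, then diagonalise the covariance matrix.
centered = readings - readings.mean(axis=0)
cov = np.cov(centered, rowvar=False)
eigenvalues, eigenvectors = np.linalg.eigh(cov)  # eigenvalues in ascending order

# The component with the largest eigenvalue is the last column.
first_pc = eigenvectors[:, -1]
first_pc = first_pc * np.sign(first_pc.sum())    # the overall sign is arbitrary
explained = eigenvalues[-1] / eigenvalues.sum()

print(np.round(first_pc, 2))  # loadings of the leading component on each sensor
print(round(explained, 3))    # fraction of total variance it carries
```

With data built this way the leading component weights all sensors about equally, so it is the total (or average) rainfall up to a scale factor, and it carries most of the variance. Real field data would be far messier.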
I found this explanation to be very helpful, especially the computer graphics toward the end of part one.
https://www.youtube.com/watch?annota...;v=UUxIXU_Ob6E
https://www.youtube.com/watch?v=sRsd...feature=relmfu
Edit: it doesn't seem to embed for some reason.
I'm curious, though, how they handle more than three variables. Say there were five variables. Would you split them into a group of three and a group of two, then do PCA on each, then combine the two PCAs?
Last edited by ballyhoo; August 24th, 2012 at 05:22 PM.
You can use as many variables as you want; the maths just gets more painful, in that finding the eigenvalues becomes trickier as your matrix dimension gets bigger. The same recipe works though:
 1. For each variable, work out the mean and subtract it from the readings.
 2. Calculate the covariance matrix. If you have 10 variables this will be a 10x10 matrix.
 3. Diagonalise the covariance matrix. With 10 variables we get 10 principal components and 10 eigenvalues.
Make more sense?
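The recipe can be sketched in a few lines of Python with NumPy. This is only an illustration under stated assumptions: the function name `pca` and the random five-variable data set are made up for the example, and `numpy.linalg.eigh` is used because a covariance matrix is symmetric.

```python
import numpy as np

def pca(data):
    """data has shape (n_observations, n_variables).

    Returns eigenvalues (largest first), principal components as columns,
    and the data expressed along the new axes (scores).
    """
    centered = data - data.mean(axis=0)              # subtract each variable's mean
    cov = np.cov(centered, rowvar=False)             # n_variables x n_variables
    eigenvalues, eigenvectors = np.linalg.eigh(cov)  # diagonalise (ascending order)
    order = np.argsort(eigenvalues)[::-1]            # re-sort, largest eigenvalue first
    eigenvalues = eigenvalues[order]
    eigenvectors = eigenvectors[:, order]
    scores = centered @ eigenvectors                 # rotate onto the new axes
    return eigenvalues, eigenvectors, scores

# Hypothetical data: 200 observations of 5 correlated variables.
rng = np.random.default_rng(1)
mixing = rng.normal(size=(5, 5))
data = rng.normal(size=(200, 5)) @ mixing

eigenvalues, components, scores = pca(data)
score_cov = np.cov(scores, rowvar=False)
print(np.round(score_cov, 6))  # off-diagonal entries are zero up to rounding
```

All five variables go in together; there is no need to split them into groups. The covariance matrix of the rotated data comes out diagonal, with the eigenvalues on the diagonal, which is the "no covariance under the new axes" property mentioned earlier.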
I have some explanations of PCA that say reducing the number of dimensions causes some information loss. But do more variables and dimensions add information or obscure relevant information? Does the loss of information caused by reducing the dimensions exceed the value of more variables increasing the information content? Or is any of that accurate to say?
I think we have to be careful here: what do you mean by reducing dimensions? As in setting eigenvalues below some threshold to zero, or reducing the number of explanatory variables?
"The other main advantage of PCA is that once you have found these patterns in the data, and you compress the data, ie. by reducing the number of dimensions, without much loss of information." http://www.sccg.sk/~haladova/principal_components.pdf
Ah ok, so we are doing the "set eigenvalues to zero" dance. In this case it is a signal-to-noise type calculation, and you have to use your gut (or some sort of information criterion) to decide what is real signal and what is noise. But it tends to be a subjective call, and thus more the art side of mathematical modelling than the science side. So to answer your question: it depends on the data and the problem. The recipe ends once you have done the PCA decomposition; after that you have to use your experience and domain knowledge.
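As a rough illustration of that trade-off, here is a sketch in Python with NumPy. The data are invented: two underlying signals observed through five variables plus a little noise, and the cut-off `k = 2` is set by hand precisely because choosing it is the judgement call. It shows the fraction of variance per component and how much is lost when the small ones are dropped.

```python
import numpy as np

rng = np.random.default_rng(2)

# Assumption: 2 real underlying signals, seen through 5 measured variables, plus small noise.
signals = rng.normal(size=(300, 2)) * np.array([3.0, 2.0])
mixing = rng.normal(size=(2, 5))
data = signals @ mixing + rng.normal(scale=0.1, size=(300, 5))

mean = data.mean(axis=0)
centered = data - mean
eigenvalues, eigenvectors = np.linalg.eigh(np.cov(centered, rowvar=False))
order = np.argsort(eigenvalues)[::-1]          # largest eigenvalue first
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]

# Share of the total variance carried by each principal component.
explained_ratio = eigenvalues / eigenvalues.sum()

k = 2                                          # how many components to keep: a judgement call
kept = eigenvectors[:, :k]
reconstructed = centered @ kept @ kept.T + mean  # project onto k components and map back

# Size of what was thrown away, relative to the spread of the data.
relative_error = np.linalg.norm(data - reconstructed) / np.linalg.norm(centered)

print(np.round(explained_ratio, 4))
print(relative_error)
```

Here the discarded components are almost pure noise by construction, so little is lost. With real measurements the eigenvalues rarely separate this cleanly, which is where the experience and domain knowledge come in.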