Principal component analysis (PCA) is a method for finding the "directions" in multivariate data along which the variance is maximal. The first principal component (PC) explains the most variance in the analyzed dataset, the second PC explains the second most, and so on. If you had just two variables and made a scatter plot of them, the first PC would point in the same direction as the major axis of an ellipse fitted around the data, and the second PC would point in the same direction as the minor axis. In general, you can think of fitting an N-dimensional "ellipsoid" and looking at the directions of its axes.
Anyway, it is a long road to reach the intuitive explanation above. To really understand what PCA does, one first has to understand what the eigenvalues and eigenvectors of a matrix are. If you have not studied linear algebra for a while, I would suggest watching the excellent lectures by Professor Gilbert Strang on MIT OpenCourseWare (http://ocw.mit.edu/courses/mathematics/18-06-linear-algebra-spring-2010/video-lectures/).
The eigenvalues $\lambda$ and eigenvectors $v$ of a matrix $A$ are defined by the equation $Av = \lambda v$ (with $v \neq 0$). The eigenvalues are the roots of the characteristic polynomial $\det(A - \lambda I) = 0$. Everyone knows how to find the roots of a quadratic equation, so studying 2 x 2 matrices is quite easy. I have written a brief example below.
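For instance, take the matrix (an illustrative choice of mine)

$$A = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}.$$

Its characteristic polynomial is

$$\det(A - \lambda I) = \det \begin{pmatrix} 2 - \lambda & 1 \\ 1 & 2 - \lambda \end{pmatrix} = (2 - \lambda)^2 - 1 = \lambda^2 - 4\lambda + 3,$$

whose roots are $\lambda_1 = 3$ and $\lambda_2 = 1$.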
Once the eigenvalues are known, the corresponding eigenvectors can be found from the first equation above, $Av = \lambda v$, i.e. by solving $(A - \lambda I)v = 0$ for each eigenvalue in turn.
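Continuing the example: for $\lambda_1 = 3$, solving $(A - 3I)v = 0$ gives $v_1 = (1, 1)^T$ (up to scaling), and for $\lambda_2 = 1$, solving $(A - I)v = 0$ gives $v_2 = (1, -1)^T$. Note that the two eigenvectors are orthogonal, as they always are for a symmetric matrix.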
So, what is the connection between the eigenvalues and eigenvectors of a matrix and PCA? Since PCA is about finding directions of maximal variance, we should probably be analyzing some special matrix. That special matrix is the covariance matrix of your original dataset. Its eigenvectors give the directions of the principal components, and the corresponding eigenvalues tell how much variance each component explains.
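If you want to see this numerically, here is a minimal sketch in NumPy (the toy dataset and variable names are my own, purely for illustration):

```python
import numpy as np

# Toy dataset: 200 samples of 2 correlated variables (illustrative).
rng = np.random.default_rng(0)
X = rng.multivariate_normal(mean=[0, 0], cov=[[3, 1], [1, 2]], size=200)

# Center the data, then form the covariance matrix.
Xc = X - X.mean(axis=0)
C = np.cov(Xc, rowvar=False)

# Eigendecomposition of the (symmetric) covariance matrix.
eigvals, eigvecs = np.linalg.eigh(C)

# eigh returns eigenvalues in ascending order; sort them descending
# so that the first column is the first principal component.
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

print("Proportion of variance explained:", eigvals / eigvals.sum())

# Project the centered data onto the principal components ("scores").
scores = Xc @ eigvecs
```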
Finally, I should probably add a few words about why PCA is an interesting technique. Suppose you have a multivariate dataset consisting of hundreds or thousands of variables. Is all of that information relevant? Probably not. So one can keep the first k principal components (out of N), which together explain almost all of the variance in the dataset (e.g. 95% or 99%), and discard the remaining N - k components as noise. This procedure, known as dimensionality reduction, works with some data but not all: sometimes the components with little variance are the interesting ones. In any case, you get a new view of the data you have at hand.
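Here is a sketch of that reduction step under the same assumptions as above, picking the smallest k that reaches a 95% threshold:

```python
import numpy as np

# Assume X is your (n_samples, n_variables) data matrix;
# a random toy matrix stands in for real data here.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 10))

Xc = X - X.mean(axis=0)
eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Smallest k whose components explain at least 95% of the variance.
explained = np.cumsum(eigvals) / eigvals.sum()
k = int(np.searchsorted(explained, 0.95)) + 1

# Reduced dataset: n_samples x k instead of n_samples x N.
X_reduced = Xc @ eigvecs[:, :k]
```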
Hi Steve, could you please clarify what you mean by PCA? I guess you are referring to principal component analysis, but I wanted to make sure before writing an answer. (Is it possible to type LaTeX into answers?)
Yes, I am referring to principal component analysis. My apologies for the confusion.