This is a good question and one you really need to understand to interpret your data.
The way I think about it is this:
If you are sampling something by counting, say drawing blue marbles out of a bag of differently colored marbles, the count for any given color will fluctuate around its expected value.
For example, if 4% of the marbles in the bag are blue and you draw 100 marbles, sometimes you will get 4 blue ones, sometimes 5 or 3. If you repeat that an infinite number of times, and the bag is big enough that each draw barely changes its contents, the counts follow a binomial distribution, which for a small proportion like this is very well approximated by a Poisson distribution with mean 4. (This is the Poisson approximation to the binomial.)
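To see how close the approximation is for the marble example, here is a small sketch that compares the exact binomial probabilities (100 draws, 4% blue) with a Poisson distribution of mean 4. It only uses the Python standard library; the function names are my own, not from any package.

```python
import math

def binomial_pmf(k, n, p):
    # Exact probability of drawing k blue marbles in n draws with blue fraction p.
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k, lam):
    # Poisson probability of k events when the expected count is lam.
    return math.exp(-lam) * lam**k / math.factorial(k)

n, p = 100, 0.04
lam = n * p  # Poisson mean matched to the binomial mean (4 marbles)

for k in range(11):
    print(k, round(binomial_pmf(k, n, p), 4), round(poisson_pmf(k, lam), 4))

# The binomial variance n*p*(1-p) = 3.84 is close to the Poisson variance (= mean = 4).
print("binomial variance:", n * p * (1 - p), "poisson variance:", lam)
```

The two columns differ only in the third decimal place, which is why the Poisson is a reasonable stand-in when the proportion is small and the pool is large.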
We use a Poisson distribution to describe RNA-seq data because we are sampling species of RNA out of a pool of RNA molecules. So instead of blue and red marbles, you have SUMO2 and BRCA1 RNA.
A Poisson distribution is a one-parameter distribution: its mean is always equal to its variance. It can only describe the counting noise and nothing else.
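Here is a minimal simulation of that sampling, assuming a hypothetical RNA pool where SUMO2 makes up 2% of molecules and BRCA1 0.5% (made-up fractions for illustration, not real expression values). Each "library" draws 1,000 reads from the same pool, so the only noise is counting noise, as in technical replicates of one sample. Across libraries, the SUMO2 count has variance roughly equal to its mean, which is the Poisson signature.

```python
import random
import statistics
from collections import Counter

# Hypothetical pool composition: fraction of RNA molecules belonging to each species.
pool = {"SUMO2": 0.02, "BRCA1": 0.005, "other": 0.975}

def sequence_library(pool, n_reads, rng):
    # Draw n_reads molecules at random (with replacement) and count each species.
    names = list(pool)
    weights = [pool[name] for name in names]
    reads = rng.choices(names, weights=weights, k=n_reads)
    return Counter(reads)

rng = random.Random(1)
n_reads, n_replicates = 1000, 500
sumo2_counts = [sequence_library(pool, n_reads, rng)["SUMO2"] for _ in range(n_replicates)]

mean = statistics.mean(sumo2_counts)
var = statistics.variance(sumo2_counts)
print(f"SUMO2 across replicates: mean={mean:.2f}, variance={var:.2f}")
```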
However, counting noise won't be the only source of variance in your data. You will also have biological noise and technical artifacts. Some genes fluctuate more or less because of their inherent nature (e.g. heat shock genes have high variance across biological replicates because they bounce around if someone turns the heat up). If you measure the expression of these genes (and really of all genes) in multiple replicates, the variance of those measurements will be higher than the mean; this is called overdispersion. So you don't want to use a Poisson distribution to describe that.
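A rough sketch of why replicates become overdispersed: let each biological replicate have its own true expression level, then count reads from it. Here the replicate-to-replicate variation is assumed to be lognormal with a chosen spread `bio_sigma` (an illustrative assumption, not a claim about any real gene). With `bio_sigma=0` you get pure Poisson counting noise and variance/mean is about 1; with biological noise added it is well above 1.

```python
import math
import random
import statistics

def poisson_sample(lam, rng):
    # Count arrivals of a rate-1 Poisson process in [0, lam]; this is Poisson(lam)
    # and avoids the exp(-lam) underflow of the multiply-uniforms method.
    count, t = 0, rng.expovariate(1.0)
    while t <= lam:
        count += 1
        t += rng.expovariate(1.0)
    return count

def replicate_counts(mean_expression, bio_sigma, n_replicates, rng):
    # Assumed biological noise: each replicate's true level is lognormal,
    # with mu_log chosen so the average level equals mean_expression.
    mu_log = math.log(mean_expression) - bio_sigma**2 / 2
    counts = []
    for _ in range(n_replicates):
        level = rng.lognormvariate(mu_log, bio_sigma)
        counts.append(poisson_sample(level, rng))  # counting noise on top
    return counts

rng = random.Random(7)
ratios = {}
for sigma in (0.0, 0.5):
    counts = replicate_counts(20.0, sigma, 20000, rng)
    ratios[sigma] = statistics.variance(counts) / statistics.mean(counts)
    print(f"bio_sigma={sigma}: variance/mean = {ratios[sigma]:.2f}")
```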
So what to do?
A negative binomial distribution can be thought of in lots of different ways, but one useful way is this: if the true expression level follows a gamma distribution and you then sample from it by counting (i.e. the count is Poisson given that level), you get a negative binomial distribution. A gamma distribution is kind of in the middle of a lognormal and a normal distribution.
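This gamma-then-Poisson construction can be checked numerically. The sketch below draws a gamma-distributed level (shape `r`, scale `mu / r`, so the mean is `mu`), takes a Poisson count from it, and compares the result with the negative binomial in its mean/size form. The values `mu = 10` and `r = 2` are arbitrary choices for illustration.

```python
import math
import random
import statistics

def poisson_sample(lam, rng):
    # Poisson(lam) via counting rate-1 exponential arrivals in [0, lam].
    count, t = 0, rng.expovariate(1.0)
    while t <= lam:
        count += 1
        t += rng.expovariate(1.0)
    return count

def nb_pmf(k, mu, r):
    # Negative binomial pmf with mean mu and size r (variance mu + mu**2 / r).
    log_p = (math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
             + r * math.log(r / (r + mu)) + k * math.log(mu / (r + mu)))
    return math.exp(log_p)

mu, r = 10.0, 2.0
rng = random.Random(3)
# Gamma-distributed true level, then counting noise: a gamma-Poisson mixture.
counts = [poisson_sample(rng.gammavariate(r, mu / r), rng) for _ in range(100_000)]

print(f"mean={statistics.mean(counts):.2f} (expected {mu})")
print(f"variance={statistics.variance(counts):.2f} (expected {mu + mu**2 / r})")
for k in range(5):
    observed = counts.count(k) / len(counts)
    print(k, round(observed, 4), round(nb_pmf(k, mu, r), 4))
```

The simulated frequencies line up with the negative binomial probabilities, and the variance comes out near 60, six times the mean.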
So using a negative binomial distribution assumes that the biological and technical variability is roughly described by a gamma distribution, and then also accounts for the sampling (counting) noise on top of it.
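The way the two noise sources add up falls out of the law of total variance. Writing the count as Y, the true level as λ, the mean as μ and the gamma shape as r (the same parameterisation as in a gamma-Poisson mixture with mean μ):

```latex
Y \mid \lambda \sim \operatorname{Poisson}(\lambda), \qquad
\lambda \sim \operatorname{Gamma}\left(\text{shape} = r,\ \text{scale} = \mu / r\right)

\mathbb{E}[Y] = \mathbb{E}\big[\mathbb{E}[Y \mid \lambda]\big] = \mathbb{E}[\lambda] = \mu

\operatorname{Var}(Y)
  = \mathbb{E}\big[\operatorname{Var}(Y \mid \lambda)\big] + \operatorname{Var}\big(\mathbb{E}[Y \mid \lambda]\big)
  = \mathbb{E}[\lambda] + \operatorname{Var}(\lambda)
  = \underbrace{\mu}_{\text{counting noise}} + \underbrace{\mu^{2} / r}_{\text{gamma (biological/technical) variance}}
```

With a dispersion α = 1/r this is often written Var(Y) = μ + αμ². As α goes to 0 the second term vanishes and you are back to a Poisson; for highly expressed genes the μ² term dominates, so biological variability matters more than counting noise there.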
I wrote this thing here that explains variance a little more and how to think about it. Maybe you'll find it useful.
http://gkno2.tumblr.com/post/24629975632/thinking-about-rna-seq-experimental-design-for
This paper has the answer in great detail; it can get a bit technical but, in my opinion, is written very well: http://arxiv.org/abs/1104.3889