When I looked at the read count table from a typical RNA-Seq experiment (a subset is shown below), I found that the read counts for the third replicate in group B (B3) are consistently high relative to the other replicates (B1 and B2). I was told that the data preparation should be free of errors, so I am curious what might cause such extreme counts in one particular replicate. Thanks!
A1 A2 A3 B1 B2 B3
12626 19794 17190 3668 4782 49020
5940 9357 8143 1681 2210 23238
5939 9355 8143 1681 2211 23238
8318 13113 11406 2365 3102 32556
What does the correlation between the samples look like? Is B3 an outlier?
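One way to answer that is to compute pairwise correlations between samples on log-transformed counts. The sketch below is a plain-Python illustration, assuming a genes-by-samples table and a pseudocount of 1 (both choices are mine, not from the thread). On the four rows shown every pair correlates almost perfectly, because B3 is roughly a constant multiple of B1 and B2 for these genes. Running it on the full table is what would reveal whether B3 stands apart from B1 and B2 overall.

```python
import math
from itertools import combinations

# Subset of the count table from the question: rows are genes, columns are samples.
samples = ["A1", "A2", "A3", "B1", "B2", "B3"]
counts = [
    [12626, 19794, 17190, 3668, 4782, 49020],
    [5940, 9357, 8143, 1681, 2210, 23238],
    [5939, 9355, 8143, 1681, 2211, 23238],
    [8318, 13113, 11406, 2365, 3102, 32556],
]

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def sample_correlations(counts, samples, pseudocount=1):
    """Pairwise Pearson correlation between samples on log2(count + pseudocount)."""
    logged = [[math.log2(c + pseudocount) for c in row] for row in counts]
    columns = {s: [row[j] for row in logged] for j, s in enumerate(samples)}
    return {(a, b): pearson(columns[a], columns[b])
            for a, b in combinations(samples, 2)}

for (a, b), r in sample_correlations(counts, samples).items():
    print(f"{a} vs {b}: r = {r:.3f}")
```

With thousands of genes, a B3 whose correlation with B1 and B2 is clearly lower than B1's correlation with B2 would be a candidate outlier. A PCA or MDS plot of the same log values gives a similar picture.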
@IdoTamir: actually, there are only a few extreme counts in B3, and they make the corresponding genes come out as differentially expressed in group B. The overall library sizes of the six replicates are comparable...
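Because only a few genes are affected and library sizes are said to be comparable, it can help to list exactly which genes drive the difference. The sketch below computes, for each gene, log2 of B3 over the mean of B1 and B2, and flags genes above a cutoff. The 4-fold cutoff (`log2_threshold=2.0`) and the pseudocount are hypothetical choices, and raw counts are compared directly only because the library sizes are comparable. Otherwise, normalize first (for example to counts per million).

```python
import math

def flag_replicate_outliers(counts, target, others, log2_threshold=2.0, pseudocount=1):
    """Return (row index, log2 ratio) for genes where column `target` differs from
    the mean of columns `others` by more than `log2_threshold` on the log2 scale.

    Assumes comparable library sizes, so raw counts are compared directly.
    """
    flagged = []
    for i, row in enumerate(counts):
        ref = sum(row[j] for j in others) / len(others)
        lfc = math.log2((row[target] + pseudocount) / (ref + pseudocount))
        if abs(lfc) > log2_threshold:
            flagged.append((i, lfc))
    return flagged

# Columns: A1 A2 A3 B1 B2 B3 (the subset from the question).
counts = [
    [12626, 19794, 17190, 3668, 4782, 49020],
    [5940, 9357, 8143, 1681, 2210, 23238],
    [5939, 9355, 8143, 1681, 2211, 23238],
    [8318, 13113, 11406, 2365, 3102, 32556],
]

# Compare B3 (column 5) with B1 and B2 (columns 3 and 4).
for i, lfc in flag_replicate_outliers(counts, target=5, others=[3, 4]):
    print(f"row {i}: log2(B3 / mean(B1, B2)) = {lfc:.2f}")
```

All four rows shown here are flagged at about 11 to 12 fold. On the full table, the flagged list is the set of genes worth inspecting for a shared biological theme or a shared technical property.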
Maybe it's biology? Cell-cycle genes, apoptosis genes... for example, because the cells were more or less dense than the others, if the samples come from cell culture. Maybe it's the preparation/PCR: length bias? GC bias? UTR
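To check the technical explanations, you could test whether the per-gene B3 deviation tracks a gene property such as GC content or length. The sketch below uses a Spearman rank correlation between the log2 ratio and a covariate. The per-gene covariate list is a hypothetical input you would take from your own annotation, in the same row order as the count table. A clear monotone trend would point toward GC or length bias. A near-zero value does not rule out bias in a subset of genes.

```python
import math

def log2_ratios(counts, target, others, pseudocount=1):
    """Per-gene log2(target / mean(others)) from a genes-by-samples count table."""
    out = []
    for row in counts:
        ref = sum(row[j] for j in others) / len(others)
        out.append(math.log2((row[target] + pseudocount) / (ref + pseudocount)))
    return out

def rank(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks.
    Raises ZeroDivisionError if either input is constant."""
    rx, ry = rank(x), rank(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)
```

For example, `spearman(log2_ratios(counts, 5, [3, 4]), gc_content)` would compare B3 against B1 and B2, where `counts` is the full table and `gc_content` is a hypothetical per-gene GC fraction. The same call with gene lengths addresses length bias.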