What percentage of reads would you expect to be exactly identical? I just came across an experiment where, out of 100 x 10^6 reads, only 60 x 10^6 are unique.
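To put a number on it: if "unique" means distinct sequences, then 40 x 10^6 of the 100 x 10^6 reads (40%) are exact copies of some other read. Below is a minimal Python sketch of how one might compute these figures from a list of read sequences. The function name `duplication_stats` and the toy reads are made up for illustration. For paired-end data one would probably pass (mate1, mate2) tuples instead of single strings, so that a pair only counts as a duplicate when both mates match.

```python
from collections import Counter


def duplication_stats(reads):
    """Return (total, distinct, duplicate_fraction) for an iterable of reads.

    A read is counted as a duplicate if an identical read was already seen,
    so duplicate_fraction = (total - distinct) / total.
    """
    counts = Counter(reads)          # sequence -> number of occurrences
    total = sum(counts.values())     # all reads, copies included
    distinct = len(counts)           # number of different sequences
    dup_fraction = (total - distinct) / total if total else 0.0
    return total, distinct, dup_fraction


# Toy example (hypothetical reads, not real data)
toy_reads = ["ACGT", "ACGT", "TTGA", "GGCC", "GGCC"]
total, distinct, dup_fraction = duplication_stats(toy_reads)
print(total, distinct, dup_fraction)
```

With the numbers from the experiment, the same arithmetic gives (100 - 60) / 100 = 0.4.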
In my case, it is RNA-seq of a murine adipocyte cell line: Illumina HiSeq, paired-end reads, read length = 100, standard TruSeq protocol.
Actually, I do think it is due to some error in the library preparation. Even if a couple of genes are very highly expressed and are therefore the origin of most of the reads, the reads should not necessarily be exactly identical.
I just wonder what to do with such data. First of all, it is important to be aware that such things can happen. Furthermore, if some genes are so overrepresented, other genes might be heavily underestimated. Altogether, just be aware that you probably don't have a fair representation of the mRNA landscape in your sample.
Moreover, when dealing with such data, one could reduce the computational cost of the mapping dramatically by restricting the data to unique reads only.
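As a rough illustration of that idea, here is a small Python sketch that collapses reads to distinct sequences before mapping while remembering how often each one occurred, so the multiplicities can be put back afterwards if wanted. Everything here is hypothetical: `collapse_reads`, `weighted_gene_counts`, and the sequence-to-gene assignments stand in for whatever the real mapping step would produce.

```python
from collections import Counter


def collapse_reads(reads):
    """Return {sequence: multiplicity}, keeping first-seen order."""
    counts = {}
    for seq in reads:
        counts[seq] = counts.get(seq, 0) + 1
    return counts


def weighted_gene_counts(assignments, multiplicity):
    """Sum multiplicities per gene.

    assignments:  {sequence: gene}, i.e. the (hypothetical) result of
                  mapping only the distinct sequences.
    multiplicity: {sequence: count} as returned by collapse_reads.
    """
    totals = Counter()
    for seq, gene in assignments.items():
        totals[gene] += multiplicity[seq]
    return dict(totals)


# Toy example: 5 reads collapse to 3 sequences that need mapping
toy_reads = ["AAA", "CCC", "AAA", "GGG", "AAA"]
collapsed = collapse_reads(toy_reads)
to_map = list(collapsed)  # only these would be sent to the mapper

# Made-up mapping result for the distinct sequences
assignments = {"AAA": "geneX", "CCC": "geneX", "GGG": "geneY"}
print(weighted_gene_counts(assignments, collapsed))
```

Whether to add the multiplicities back or to count each distinct read once depends on whether the duplicates are believed to be real signal or a library artifact.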
What sequencing platform and libraries are you using?
And what species?
I agree with these comments; more info would be useful, such as: source tissue, species, RNA isolation method, polyA+ selection?, library construction method, read length?, paired vs. unpaired?, etc.