Hey everyone,
I have a question about how to handle Illumina single-end RNA-seq data. I have a few samples, each with 2 biological replicates. Nothing extraordinary so far, except for one sample: one of its replicates produced a low number of reads (~13 million), whereas the average for all the other replicates is about double that (27-30 million). So GATC reran this particular sample, but in a very odd way (my guess is alone in one lane), which yielded 150 million reads...
This is of course far too many and introduces a discrepancy in sequencing depth across the data. I am running out of ideas for how to handle it, so I'd really appreciate it if someone could help.
Thanks a lot in advance :)
Why not work with 20% of your new data? That would be about 30 million reads, in line with your other replicates. Assuming there is no specific bias in your new run, it doesn't seem like a bad idea.
Strong assumption! ;-)
Thanks for your suggestion. Hmm, if you pick them at random, this should be fairly acceptable. I thought of picking ~15 million at random, but it seemed very arbitrary...
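For what it's worth, here is a minimal sketch of how such a random pick could be done, assuming an uncompressed FASTQ file with exactly 4 lines per read. It uses reservoir sampling, so every read has the same chance of being kept regardless of where it sits in the file. The function names (`read_fastq`, `reservoir_sample`, `subsample_fastq`) and the file paths are made up for illustration and are not from any particular tool.

```python
import random
from itertools import islice


def read_fastq(handle):
    """Yield one FASTQ record (a list of 4 lines) at a time from an iterator of lines."""
    while True:
        record = list(islice(handle, 4))
        if not record:
            return
        if len(record) < 4:
            raise ValueError("truncated FASTQ record")
        yield record


def reservoir_sample(records, k, seed=0):
    """Return up to k records chosen uniformly at random in a single pass."""
    rng = random.Random(seed)  # fixed seed so the subsample is reproducible
    reservoir = []
    for i, rec in enumerate(records):
        if i < k:
            # Fill the reservoir with the first k records.
            reservoir.append(rec)
        else:
            # Replace a random slot with probability k / (i + 1).
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = rec
    return reservoir


def subsample_fastq(in_path, out_path, k, seed=0):
    """Write k randomly chosen reads from in_path to out_path; return how many were written."""
    with open(in_path) as fin:
        sample = reservoir_sample(read_fastq(fin), k, seed)
    with open(out_path, "w") as fout:
        for rec in sample:
            fout.writelines(rec)
    return len(sample)


# Hypothetical usage: keep 30 million of the 150 million reads.
# subsample_fastq("rerun.fastq", "rerun_30M.fastq", 30_000_000, seed=42)
```

Note that this keeps the whole subsample in memory, which could be several GB for tens of millions of reads, and that the output order is not the original file order. Recording the seed at least makes the choice reproducible, even if the target number of reads remains a judgment call.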