6.9 years ago
statfa
Hi
As you know, GC-content bias could arise in RNA-seq data if two or more samples are sequenced in each lane. How can I know how many samples are sequenced in each lane when I get the data from GEO datasets?
For example, look at this link: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE47944
Thank you
Not that I know of. Do you have a reference?
Yes, sure. Please read the read count normalization section here: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4917940/
See: Adjust for GC content bias in RNA-seq DE analysis. Consider accounting for it only if you see a need to in your data. I am not sure that the number of samples and/or lanes has any correlation with GC bias.
Edit: I just noticed that the thread I linked is your own.
BTW: You may have to contact the data submitters to see if you can find which samples ran in which lane/in what combination.
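If the original read names happen to survive in the FASTQ files you download, the lane may also be recoverable from the reads themselves. Below is a minimal sketch, assuming Casava 1.8+ style Illumina read names (`@instrument:run:flowcell:lane:tile:x:y ...`). The function names `lane_from_header` and `lanes_in_fastq` are hypothetical helpers, not part of any library. Many SRA records replace the original names with `SRR...` identifiers, in which case this returns nothing useful and contacting the submitters remains the only option.

```python
from collections import Counter


def lane_from_header(header):
    """Return (flowcell, lane) from a Casava 1.8+ style read name, else None.

    Assumed layout: @instrument:run:flowcell:lane:tile:x:y [comment]
    """
    name_part = header.strip().lstrip("@").split()
    if not name_part:
        return None
    fields = name_part[0].split(":")
    if len(fields) < 7:
        # Not the assumed format (e.g. an SRA-renamed read such as SRR123.1)
        return None
    flowcell, lane = fields[2], fields[3]
    if not lane.isdigit():
        return None
    return flowcell, int(lane)


def lanes_in_fastq(lines, max_reads=10000):
    """Count (flowcell, lane) pairs over the first max_reads records.

    `lines` is any iterable of FASTQ lines, e.g. an open file handle.
    Assumes plain 4-line FASTQ records.
    """
    counts = Counter()
    for i, line in enumerate(lines):
        if i // 4 >= max_reads:
            break
        if i % 4 != 0:
            continue  # only the first line of each record is a header
        key = lane_from_header(line)
        if key is not None:
            counts[key] += 1
    return counts
```

Running this on each sample's FASTQ and comparing the (flowcell, lane) pairs across samples would show which samples shared a lane, but only when the headers are in the assumed format.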
Thank you very much. That's my post. Well, I read that in some papers: when only one sample was sequenced in each lane, they didn't bother about GC-content bias. I was hopeful that I could find the number of samples in each lane in GEO datasets so that I didn't need to worry about that bias. DESeq2 corrects the bias; it doesn't check whether the bias exists. The dataset I posted as an example is used by the paper I linked. In the paper, they claim that in this data only one sample was sequenced in each lane, so they didn't normalize for GC-content bias. But nowhere could I find any info about the number of samples in each lane in GEO datasets.
I could be wrong but AFAIK that is not a required piece of information for GEO/SRA submissions.
As @Devon had suggested in your past thread, if you are worried about GC bias then you will have to test for it each time.
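One rough way to test for it is to check whether the log ratio of counts between two samples trends with per-gene GC content. The sketch below is only an illustration under stated assumptions: it uses a Spearman correlation written in plain Python, a pseudocount of 1, and hypothetical helper names (`gc_fraction`, `spearman`, `gc_vs_log_ratio`). It is not the method used by any particular package. A correlation near zero suggests little sample-specific GC effect; a strong correlation suggests looking more closely.

```python
import math


def gc_fraction(seq):
    """Fraction of G/C bases in a nucleotide sequence (case-insensitive)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def average_ranks(values):
    """1-based ranks; tied values receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation (Pearson on ranks); 0.0 if either input is constant."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def gc_vs_log_ratio(gc, counts_a, counts_b, pseudocount=1.0):
    """Correlate per-gene GC content with log2 count ratio of two samples."""
    log_ratio = [
        math.log2((a + pseudocount) / (b + pseudocount))
        for a, b in zip(counts_a, counts_b)
    ]
    return spearman(gc, log_ratio)
```

Here `gc` would come from applying `gc_fraction` to each gene's sequence, and `counts_a` and `counts_b` are raw counts for the same genes in two samples. Real differential expression also shifts the ratios, so comparing replicates of the same condition gives the cleanest check.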