How to Process Multiple SRRs for the Same BioSample?
Posted 2 days ago by omicon

Hello everyone,

I am working with data from PRJNA528920 and noticed that some BioSamples (SAMN) have multiple associated SRRs (Sequence Read Archive Runs). For example:

  1. SAMN11249717 = SRR8782083
  2. SAMN11249717 = SRR8782084
  3. SAMN11249716 = SRR8782085
  4. SAMN11249716 = SRR8782086

I also noticed a discrepancy: GEO series GSE128803 lists only 6 samples, whereas BioProject PRJNA528920 contains 12 SRRs.
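Grouping runs by BioSample makes the 6-vs-12 difference concrete: each BioSample corresponds to one GEO sample, and some BioSamples carry two runs. A minimal sketch, assuming a comma-separated metadata export with header columns named `Run` and `BioSample` (as in a download from the SRA Run Selector; adjust the names if your file differs). Only the four pairs quoted above are used as example data.

```python
import csv
import io
from collections import defaultdict


def runs_by_biosample(run_table):
    """Group SRR run accessions by their BioSample accession.

    Assumes header columns named "Run" and "BioSample"; rename them
    here if your metadata export uses different column names.
    """
    groups = defaultdict(list)
    for row in csv.DictReader(run_table):
        groups[row["BioSample"]].append(row["Run"])
    # Sort each run list so the grouping is deterministic
    return {sample: sorted(runs) for sample, runs in groups.items()}


# The four run/BioSample pairs listed in the question, as an inline table
example = io.StringIO(
    "Run,BioSample\n"
    "SRR8782083,SAMN11249717\n"
    "SRR8782084,SAMN11249717\n"
    "SRR8782085,SAMN11249716\n"
    "SRR8782086,SAMN11249716\n"
)

for sample, runs in runs_by_biosample(example).items():
    print(sample, runs)
```

With the full table, the same grouping should give one entry per GEO sample, each listing its runs.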

I read the associated paper but couldn’t find clear information about this. I also checked whether it could be related to the sequencing technology (ION_TORRENT), but found no evidence of that.

My questions are:

  1. Do these SRRs correspond to independent sequencing runs meant to select the highest-quality one?
  2. For alignment and count table generation, should I use only the first SRR for each BioSample?
  3. Is it possible to merge them without introducing batch effects?

I plan to use these data for my thesis, so I would really appreciate any guidance or experiences you can share on how to process this type of data correctly.

Thank you guys.

Tags: GEO, RNA-seq, NCBI, miRNAs, RNA
Answer by GenoMax, 2 days ago:

Looking at the metadata table for this project gives additional clues: https://www.ncbi.nlm.nih.gov/Traces/study/?acc=PRJNA528920&o=acc_s%3Aa

It looks like some samples were sequenced in two separate runs. Since both runs share the same BioSample accession, they are technical (sequencing) replicates and can be merged during processing. HCER_1, HCER_2 and HCER_3 (and likewise the HeLa_* samples), on the other hand, are likely biological replicates: https://www.ncbi.nlm.nih.gov/biosample?LinkName=bioproject_biosample_all&from_uid=528920
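If the two runs of a BioSample are merged before alignment, one minimal way to do it for gzipped single-end FASTQ files is plain byte concatenation: a gzip file may consist of several concatenated members, and standard readers (including Python's `gzip` module) decompress them as one stream. The sketch below assumes files named `<SRR>.fastq.gz` (a hypothetical layout) and writes two toy one-read files so it runs on its own.

```python
import gzip
import os
import shutil
import tempfile


def merge_fastq_gz(run_files, out_path):
    """Concatenate the gzipped FASTQ files of one BioSample into one file.

    Raw byte concatenation yields a valid multi-member gzip file, so no
    decompression or re-compression is needed.
    """
    with open(out_path, "wb") as out:
        for path in run_files:
            with open(path, "rb") as src:
                shutil.copyfileobj(src, out)


# Toy demonstration: two tiny single-read "runs" of the same sample
tmp = tempfile.mkdtemp()
runs = []
for name, read in [("SRR8782083", "ACGT"), ("SRR8782084", "TTGA")]:
    path = os.path.join(tmp, f"{name}.fastq.gz")
    with gzip.open(path, "wt") as fh:
        fh.write(f"@{name}.1\n{read}\n+\nIIII\n")
    runs.append(path)

merged = os.path.join(tmp, "SAMN11249717.fastq.gz")
merge_fastq_gz(runs, merged)
with gzip.open(merged, "rt") as fh:
    # Each FASTQ record spans four lines
    print(fh.read().count("\n") // 4, "reads in merged file")
```

This is equivalent to running `cat` on the files in a shell. For paired-end data, read 1 and read 2 files would each be merged separately, in the same run order, so mates stay in sync.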

If there is an associated publication, you should refer to that for additional details.

Reply:

Hi, thank you!

I’ve already read the paper and reviewed the metadata, but I couldn’t find any additional relevant information.

Yes, there are 6 samples in total: HCER_1, HCER_2, HCER_3 and HeLa_1, HeLa_2, HeLa_3. I also think the two SRRs per BioSample are separate sequencing runs of the same library, or at least that is how it looks.

However, I’m concerned about merging the FASTQ files due to potential batch effects. Would it be better to process everything separately?

I mean, should I generate count tables for all SRRs (samples) individually and then merge the counts later?

Reply:

I’m concerned about merging the FASTQ files due to potential batch effects. Would it be better to process everything separately?

You could do that, but any variation you see between the runs will have no biological significance.

Technical sequencing replicates (certainly for Illumina, and probably for Ion Torrent as well) generally show minimal variation, so the data can be merged before processing.
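If you want to confirm on your own data that the two runs agree before merging, one informal check is to count each run separately and correlate the log-transformed counts. A minimal sketch, assuming per-run count tables held as plain `{feature: count}` dictionaries; the toy counts and feature names below are hypothetical and only illustrate the calculation.

```python
import math


def log_count_correlation(counts_a, counts_b):
    """Pearson correlation of log(count + 1) over features shared by two runs.

    Values close to 1 suggest the runs are technically concordant.
    Raises ZeroDivisionError if either run has constant counts.
    """
    features = sorted(set(counts_a) & set(counts_b))
    xs = [math.log1p(counts_a[f]) for f in features]
    ys = [math.log1p(counts_b[f]) for f in features]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical counts for the two runs of one BioSample
run_1 = {"feature1": 100, "feature2": 10, "feature3": 1000}
run_2 = {"feature1": 95, "feature2": 12, "feature3": 980}
print(round(log_count_correlation(run_1, run_2), 3))
```

The log transform keeps a handful of very highly expressed features from dominating the correlation.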

I mean, should I generate count tables for all SRRs (samples) individually and then merge the counts later?

That is no different from merging the data first and then aligning and counting, because each read is aligned independently.
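The equivalence can be illustrated with a toy example. This minimal sketch assumes each read has already been assigned to at most one feature (the read IDs and feature names are hypothetical) and shows that adding per-run count tables gives the same result as counting the merged reads once. The helper `counts_per_sample` is a hypothetical name that sums per-run tables into per-BioSample tables.

```python
from collections import Counter


def count_features(assignments):
    """Count reads per feature from (read_id, feature) pairs."""
    return Counter(feature for _, feature in assignments)


def counts_per_sample(run_counts, run_to_sample):
    """Sum per-run count tables into one table per BioSample."""
    per_sample = {}
    for run, counts in run_counts.items():
        per_sample.setdefault(run_to_sample[run], Counter()).update(counts)
    return per_sample


# Hypothetical read-to-feature assignments for the two runs of one sample
run_1 = [("r1", "featA"), ("r2", "featB"), ("r3", "featA")]
run_2 = [("r4", "featA"), ("r5", "featC")]

# Option 1: count each run separately, then add the count tables
summed = count_features(run_1) + count_features(run_2)
# Option 2: merge the reads first, then count once
merged = count_features(run_1 + run_2)
print(summed == merged)

per_sample = counts_per_sample(
    {"SRR8782083": count_features(run_1), "SRR8782084": count_features(run_2)},
    {"SRR8782083": "SAMN11249717", "SRR8782084": "SAMN11249717"},
)
print(per_sample["SAMN11249717"])
```

The equality relies on each read being counted on its own, which is the point made in the reply above.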
