Question

How to Process Multiple SRRs for the Same BioSample?

5 months ago

omicon ▴ 40

Hello everyone,

I am working with data from PRJNA528920 and noticed that some BioSamples (SAMN) have multiple associated SRRs (Sequence Read Archive Runs). For example:

SAMN11249717 = SRR8782083
SAMN11249717 = SRR8782084
SAMN11249716 = SRR8782085
SAMN11249716 = SRR8782086

Additionally, I found a discrepancy between the number of samples reported in GSE128803 (which only lists 6 samples) and PRJNA528920, which contains 12 SRRs.

I read the associated paper but couldn’t find clear information about this. I also checked whether this could be related to the sequencing technology used (ION_TORRENT) but didn’t find any evidence suggesting so.

My questions are:

1. Do these SRRs correspond to independent sequencing runs meant to select the highest-quality one?
2. For alignment and count table generation, should I use only the first SRR for each BioSample?
3. Is it possible to merge them without introducing batch effects?

I plan to use these data for my thesis, so I would really appreciate any guidance or experiences you can share on how to correctly process this type of data.

Thank you guys.

GEO RNA-seq NCBI miRNAs RNA • 700 views

ADD COMMENT • link updated 5 months ago by GenoMax 153k • written 5 months ago by omicon ▴ 40

score 0 · Answer 1 · 2025-03-17

5 months ago

GenoMax 153k

Looking at the metadata table for this project gives additional clues: https://www.ncbi.nlm.nih.gov/Traces/study/?acc=PRJNA528920&o=acc_s%3Aa

It looks like some samples were sequenced in two separate runs. Since the runs share the same BioSample number, they are technical (sequencing) replicates and can be merged during processing. By contrast, HCER_1, HCER_2 and HCER_3 (and the HeLa_* samples) are likely biological replicates: https://www.ncbi.nlm.nih.gov/biosample?LinkName=bioproject_biosample_all&from_uid=528920
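To see which runs belong together, the run-to-BioSample mapping can be grouped programmatically. This is only a minimal sketch: it assumes the metadata table has been exported as CSV with columns named `Run` and `BioSample` (check the header of your own export), and the inline table below contains just the four run/sample pairs quoted in the question.

```python
import csv
import io
from collections import defaultdict


def group_runs_by_biosample(handle, run_col="Run", sample_col="BioSample"):
    """Return {BioSample: [SRR, ...]} from a CSV run table.

    The column names are assumptions about the exported table;
    adjust them to match the header of your own file.
    """
    groups = defaultdict(list)
    for row in csv.DictReader(handle):
        groups[row[sample_col]].append(row[run_col])
    return dict(groups)


# Inline stand-in for the exported run table (pairs taken from the question).
example_table = io.StringIO(
    "Run,BioSample\n"
    "SRR8782083,SAMN11249717\n"
    "SRR8782084,SAMN11249717\n"
    "SRR8782085,SAMN11249716\n"
    "SRR8782086,SAMN11249716\n"
)

runs_per_sample = group_runs_by_biosample(example_table)
for sample, srrs in sorted(runs_per_sample.items()):
    print(sample, srrs)
```

With a real export, pass an open file handle instead of the `io.StringIO` object. Any BioSample that maps to more than one SRR is a candidate for merging.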

If there is an associated publication, you should refer to that for additional details.

ADD COMMENT • link 5 months ago by GenoMax 153k

Hi, thank you!

I’ve already read the paper and reviewed the metadata, but I couldn’t find any additional relevant information.

Yes, there are a total of 6 samples: HCER_1, HCER_2, HCER_3 and HeLa_1, HeLa_2, HeLa_3. I also think these are two separate runs from the same library, or at least that is how it looks.

However, I’m concerned about merging the FASTQ files due to potential batch effects. Would it be better to process everything separately?

I mean, should I generate count tables for all SRRs (samples) individually and then merge the counts later?

ADD REPLY • link 5 months ago by omicon ▴ 40

"I’m concerned about merging the FASTQ files due to potential batch effects. Would it be better to process everything separately?"

You could do that, but any variation you see between the two runs of a sample will be technical and have no biological significance.

Technical sequencing replicates (certainly on Illumina, and probably on Ion Torrent as well) generally show minimal variation, so the data can be merged before processing.
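Merging the runs of one BioSample before alignment amounts to concatenating their FASTQ files. A minimal sketch, assuming single-end reads and that all inputs for a sample use the same compression (gzip members can be appended byte for byte and still form a valid gzip file). The file names and the two toy reads are made up for the demonstration.

```python
import gzip
import shutil
import tempfile
from pathlib import Path


def merge_fastq(run_files, merged_path):
    """Concatenate FASTQ files (all gzipped, or all plain) into one file."""
    with open(merged_path, "wb") as out:
        for path in run_files:
            with open(path, "rb") as src:
                shutil.copyfileobj(src, out)


# Demo: two tiny, made-up gzipped FASTQ files standing in for two runs.
workdir = Path(tempfile.mkdtemp())
run_a = workdir / "SRR8782083.fastq.gz"
run_b = workdir / "SRR8782084.fastq.gz"
with gzip.open(run_a, "wt") as fh:
    fh.write("@read1\nACGT\n+\nIIII\n")
with gzip.open(run_b, "wt") as fh:
    fh.write("@read2\nTTGA\n+\nIIII\n")

merged = workdir / "SAMN11249717.fastq.gz"
merge_fastq([run_a, run_b], merged)

# Read the merged file back to confirm both runs are present.
with gzip.open(merged, "rt") as fh:
    merged_text = fh.read()
print(merged_text.count("\n") // 4, "reads in merged file")
```

For paired-end data the same approach would need the runs concatenated in the same order for both mates, so that read pairs stay in sync.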

"I mean, should I generate count tables for all SRRs (samples) individually and then merge the counts later?"

That is no different from merging the data first and then aligning/counting. Each read is aligned independently, so summing the per-run counts for a sample gives the same totals as counting the merged reads.
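The equivalence can be illustrated with a toy example. This is a sketch only: `toy_assign` is a hypothetical stand-in for the real align-and-assign step, and the reads and feature names are invented. The point is that when each read is assigned on its own, per-run counts simply add up.

```python
from collections import Counter


def count_reads(reads, assign):
    """Count reads per feature, assigning each read independently."""
    return Counter(assign(read) for read in reads)


def toy_assign(read):
    # Hypothetical stand-in for alignment + feature assignment:
    # the result depends only on the read itself, not on its neighbours.
    return "feature_A" if read.startswith("A") else "feature_B"


# Made-up reads from two sequencing runs of the same library.
run1 = ["ACGT", "AGGT", "TTGA"]
run2 = ["ACCA", "TGCA"]

# Strategy 1: count each run separately, then sum the count tables.
summed = count_reads(run1, toy_assign) + count_reads(run2, toy_assign)

# Strategy 2: merge the reads first, then count once.
merged_counts = count_reads(run1 + run2, toy_assign)

print(summed == merged_counts)
```

In a real workflow the summing step would be applied to the count columns of the runs that share a BioSample, for example after grouping them by sample ID.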

ADD REPLY • link 5 months ago by GenoMax 153k