Question

Same GEO Accession, different SRR number, how to download this RNA-seq paired-end data?

2.0 years ago

ev97 ▴ 40

I am trying to download some public paired-end RNA-seq data, and I have found that some samples share the same GEO Accession but have different SRR numbers (and different sizes). As a result, when I download them with sra-toolkit and fastq-dump --split-3, I get several files for the same sample.

As you can see in the following screenshot, some samples have distinct SRR numbers and distinct GEO_Accession values. However, as I said, there are also some (highlighted) that have the same GEO_Accession but different sizes and different SRR numbers.

When I use fastq-dump --split-3 for these samples (for example):

a) SRR7774397, I get:

SRR7774397_1.fastq
SRR7774397_2.fastq

b) SRR7774398, I get:

SRR7774398_1.fastq
SRR7774398_2.fastq

If you go to the NCBI Run Browser, each run appears as two fastq files (_1 and _2):

SRR7774397 (Data access)

SRR7774398 (Data access)

However, theoretically they belong to the same sample...

How do you usually download this type of data? It seems that the data for some samples are split across runs, but I do not know how to merge them or, more generally, how to download them.

The SRA Run Selector listing all the samples can be found here (PRJNA488803).

Any help is really appreciated.

Thanks very much in advance

Regards

sra-toolkit fastq RNA-seq SRA • 1.6k views

updated 2.0 years ago by GenoMax 151k • written 2.0 years ago by ev97 ▴ 40

My guess is that some samples were resequenced. I'd recommend just merging the respective R1 and R2 files and treating them as one sample.

2.0 years ago by Trivas ★ 1.9k

Thanks for your quick reply! How would you merge them?

2.0 years ago by ev97 ▴ 40

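You can cat the respective R1/R2 files in the same order, e.g. cat Run1_R1.fq.gz Run2_R1.fq.gz > R1.fq.gz, and likewise cat Run1_R2.fq.gz Run2_R2.fq.gz > R2.fq.gz. Keeping the run order identical for R1 and R2 is what keeps the read pairs in sync.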
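
For the uncompressed files that fastq-dump --split-3 writes (SRRxxxxxxx_1.fastq and SRRxxxxxxx_2.fastq), the same idea could be wrapped in a small helper. This is only a sketch: the function name merge_runs and the output prefix merged_sample are made up for illustration, and it assumes every run has both a _1 and a _2 file in the current directory.

```sh
#!/bin/sh
set -eu

# merge_runs PREFIX RUN [RUN ...]   (hypothetical helper, not part of sra-toolkit)
# Concatenates RUN_1.fastq files into PREFIX_1.fastq and RUN_2.fastq files into
# PREFIX_2.fastq, using the same run order for both mates so pairs stay in sync.
# PREFIX must differ from every RUN name, otherwise an input would be overwritten.
merge_runs() (
    prefix=$1
    shift

    # Check that every input exists before any output is created.
    for run in "$@"; do
        if [ ! -f "${run}_1.fastq" ] || [ ! -f "${run}_2.fastq" ]; then
            echo "missing FASTQ pair for run ${run}" >&2
            exit 1
        fi
    done

    # Start from empty output files, then append each run in the given order.
    : > "${prefix}_1.fastq"
    : > "${prefix}_2.fastq"
    for run in "$@"; do
        cat "${run}_1.fastq" >> "${prefix}_1.fastq"
        cat "${run}_2.fastq" >> "${prefix}_2.fastq"
    done
)
```

With the two runs from the question, the call would be merge_runs merged_sample SRR7774397 SRR7774398, which should leave merged_sample_1.fastq and merged_sample_2.fastq next to the per-run files. If the files are gzip-compressed instead, plain cat as shown above works as well, because concatenated gzip files still form a valid gzip stream.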

2.0 years ago by GenoMax 151k