Question

Merged paired-end RNA-Seq

0

7.8 years ago

nazaninhoseinkhan ▴ 530

Dear all,

I want to run an RNASeq analysis.

In the description of the experiment in SRA/NCBI it has been mentioned that the constructed libraries are paired.

But when I want to download the files, it seems that there is only one file for each sample.

How can I be sure that both paired read files have been merged into one?

I would appreciate any advice.

Nazanin

rna-seq paired reads merged • 2.3k views

ADD COMMENT • link 7.8 years ago by nazaninhoseinkhan ▴ 530

1

It's more likely that the SRA archive was processed incorrectly, which happens a lot. But typically, you should be able to tell whether the reads were merged based on the quality profile and read lengths. Read lengths will typically fall in some kind of bell-curve distribution if the reads were merged, and the quality scores should abruptly increase in the middle of the read and be otherwise roughly symmetrical. I think it would be unusual (and a bad idea) for people to merge reads prior to submission.

Can you print the SRA command you used to decompress the archive? With the right command, an archive that was made from 2 files should produce 2 files, or interleaved reads.
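As a rough illustration of the read-length check described above, here is a minimal Python sketch. It assumes a standard 4-line-per-record FASTQ file (plain or gzipped); the function name `read_length_histogram` and the `max_reads` cap are made up for this example and are not part of any toolkit.

```python
import gzip
from collections import Counter


def read_length_histogram(path, max_reads=100_000):
    """Count read lengths in a 4-line-per-record FASTQ file.

    Hypothetical helper: only the first `max_reads` records are inspected.
    """
    opener = gzip.open if path.endswith(".gz") else open
    lengths = Counter()
    with opener(path, "rt") as handle:
        for i, line in enumerate(handle):
            if i // 4 >= max_reads:
                break
            if i % 4 == 1:  # second line of each record is the sequence
                lengths[len(line.rstrip("\r\n"))] += 1
    return lengths
```

If nearly all reads share one length, the reads were probably not merged; a broad, bell-shaped spread of lengths would be more consistent with merged pairs (or with trimmed reads, so this is only a hint).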

ADD REPLY • link 7.8 years ago by Brian Bushnell 20k

0

7.8 years ago

Petr Ponomarenko ★ 2.8k

Do you mean that after running fastq-dump (https://ncbi.github.io/sra-tools/fastq-dump.html) with the --split-files and --skip-technical options you get only one file for a paired-end experiment? For a paired-end submission you should get two files, with _1 and _2 appended to the name. Could you please provide the SRA ID of this archive? After downloading, run FastQC and look at the per-position quality. The beginning and end of the reads should show somewhat lower quality with a greater standard deviation. Also, the reverse read (_2 in the file name) usually has lower quality than the forward read (_1 in the file name).
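FastQC is the proper tool for this, but the per-position quality idea can be sketched in a few lines of Python. This assumes 4-line FASTQ records and Phred+33 quality encoding (an assumption; older files may use a different offset). The function name `mean_quality_per_position` is hypothetical.

```python
import gzip


def mean_quality_per_position(path, max_reads=100_000, offset=33):
    """Return the mean Phred score at each read position.

    Hypothetical helper; assumes Phred+33 unless `offset` is changed.
    """
    opener = gzip.open if path.endswith(".gz") else open
    totals, counts = [], []
    with opener(path, "rt") as handle:
        for i, line in enumerate(handle):
            if i // 4 >= max_reads:
                break
            if i % 4 != 3:  # fourth line of each record is the quality string
                continue
            qual = line.rstrip("\r\n")
            if len(qual) > len(totals):
                extra = len(qual) - len(totals)
                totals.extend([0] * extra)
                counts.extend([0] * extra)
            for pos, ch in enumerate(qual):
                totals[pos] += ord(ch) - offset
                counts[pos] += 1
    return [t / c for t, c in zip(totals, counts)]
```

Running it on the _1 and _2 files separately and comparing the two lists gives a crude version of the forward versus reverse comparison.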

ADD COMMENT • link 7.8 years ago by Petr Ponomarenko ★ 2.8k

0

7.8 years ago

nazaninhoseinkhan ▴ 530

Hi,

Thank you for your response.

In fact I wanted to download SRA files of an experiment from NCBI/SRA.

In the description of the experiment it has been said that the libraries are paired.

But when I wanted to download the files, the names and sizes were completely different, so I could not tell whether they were paired.

I guess the sizes of the paired-end read files must be the same, and the names, as you said, must be the same with _1 and _2 appended.

The accession of one of the samples I want to download is: SRR585570

Can you please check it for me?

Regards

Nazanin

ADD COMMENT • link 7.8 years ago by nazaninhoseinkhan ▴ 530

0

Here, for example, I see two paired fastq files for that experiment:

ftp://ftp.ddbj.nig.ac.jp/ddbj_database/dra/fastq/SRA059/SRA059423/SRX193413/

ADD REPLY • link 7.8 years ago by Brian Bushnell 20k

0

Thank you so much

I had only searched NCBI/SRA.

ADD REPLY • link 7.8 years ago by nazaninhoseinkhan ▴ 530

0

7.8 years ago

Petr Ponomarenko ★ 2.8k

[Images: Forward, Reverse]

ADD COMMENT • link 7.8 years ago by Petr Ponomarenko ★ 2.8k

0

7.8 years ago

nazaninhoseinkhan ▴ 530

Thank you so much for the time you put into solving my problem.

I will use the split option to split the file into two separate files.

ADD COMMENT • link 7.8 years ago by nazaninhoseinkhan ▴ 530

0

You are welcome. Please use ADD COMMENT to keep things organized when you are not giving an answer to the topic. Also, there are the bookmark, upvote and accept buttons available to you, so you can close the question and others will know that you found the answer. Thank you

ADD REPLY • link 7.8 years ago by Petr Ponomarenko ★ 2.8k

score 3 · Accepted Answer · 2017-02-20

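The run page is https://trace.ncbi.nlm.nih.gov/Traces/sra/?run=SRR585570. I used fastq-dump from the SRA Toolkit (https://trace.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?view=toolkit_doc&f=fastq-dump) with the -I option, which appends the read number to each read ID, and the --split-files option, which writes each mate of a paired-end run to its own file: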

fastq-dump -I --split-files -O /RAID6/home/petr/ SRR585570

You may want to run prefetch from the SRA Toolkit first: the archive is then downloaded only once, even if you want to try different parameters and tools from the toolkit.

I downloaded the files and ran FastQC on them. They are normal paired-end files with the same number of reads, and the quality distribution plots look as they should. My guess is that your previous download terminated prematurely and gave you corrupted data.

The same files can also be downloaded from ftp://ftp.ddbj.nig.ac.jp/ddbj_database/dra/fastq/SRA059/SRA059423/SRX193413 (listed at http://sra.dbcls.jp/search/view/SRP016059), provided by the National Institute of Genetics in Japan (http://dbcls.rois.ac.jp/en), just to validate your results.
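For anyone who wants to repeat the "same number of reads" check on their own download, here is a minimal Python sketch. It assumes two 4-line-per-record FASTQ files whose headers are either identical or differ only by a trailing .1/.2 or /1//2 mate suffix (the first form is what the -I option produces). The names `check_pairs`, `_headers` and `_same_spot` are made up for this example.

```python
import gzip
import re
from itertools import zip_longest


def _headers(path):
    """Yield the first whitespace-delimited token of each FASTQ header line."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        for i, line in enumerate(handle):
            if i % 4 == 0:
                yield line.strip().split(" ")[0]


def _same_spot(h1, h2):
    """True if two headers name the same spot (identical, or mate 1 vs mate 2)."""
    if h1 == h2:
        return True
    m1 = re.fullmatch(r"(.+)[./]1", h1)
    m2 = re.fullmatch(r"(.+)[./]2", h2)
    return bool(m1 and m2 and m1.group(1) == m2.group(1))


def check_pairs(path1, path2):
    """Return (ok, n): whether the files pair up, and how many pairs matched."""
    n = 0
    for h1, h2 in zip_longest(_headers(path1), _headers(path2)):
        if h1 is None or h2 is None:  # one file ended early
            return False, n
        if not _same_spot(h1, h2):
            return False, n
        n += 1
    return True, n
```

A result of `(False, n)` with a small `n` compared to the expected read count would be consistent with a truncated download, though it does not prove it.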