Problem in processing SRA file from NCBI
7.1 years ago
majeedaasim ▴ 60

I downloaded the SRA file for my organism of interest from NCBI. It is Illumina paired-end RNA-seq data. Trinity normally requires separate forward and reverse read files to build an assembly, but the downloaded file has no separate forward and reverse reads. It appears to be a single merged file; the _1 and _2 suffixes suggest that.

How can I split the forward and reverse reads if the file is merged? Is there any other way to get such data?

Thanks

SRA ncbi • 2.2k views

How did you process the SRA file? Did you use the --split-files and -F options with fastq-dump to split the two read files and recover original Illumina fastq headers? Post the SRA # if you want someone to check on it.


I have not processed it yet. I just downloaded the SRA file through Galaxy. When I view the file, it looks like this:

@FCC0MWCACXX:3:1101:1249:2088/1
NATCCGCCTAAGGAGGGGCTCACGTCTGATTAGCTAGTTGGTGAGGCAATGGCTTACCAAGGCTCCGATCAGTAGTTGGTCTGAGAGGAT
+
#4=DFFFFHHHHHIJJJJJJJJJJGHIJJJJIJJJIJGIIIFGIIDHIJIJJJIIGHHHHDFFFCDDDBDDDDDDFDDCBCDCCDDDDBD
@FCC0MWCACXX:3:1101:1249:2088/2
CACGCGGCATTGCTCCGTCAGGCTTTCGCCCATTGCGGAAAATTCCTCACTGCTGCCTCCCGTAGGAGTCTGGGCCGTGTCTCAGTCCCA
+
CCCFFFFFHGHHHJJJIHIJJJJIJJJJJJJJIJJJJJGJHHEHFFFFFFEEEEEEDCDDDDBDDDDD<CDDDDDDDBBBDDDEDDDDDD
@FCC0MWCACXX:3:1101:1175:2193/1
CAAATTCAAACCGCGCAGGAAGTCCCTCTTCCAACAATGCTGGACTCGCCCCTAATGCCGCTACTCCTCAGCCAGAGTTAAATCACAATA
+
CCCFFFFFHGHHHIIIJJIIJEDHIJFJIGIIJJJIJIGIJJJJJIGGIIGHFFFFFFFDDDDDCDDDDCACDDDDD>ACDDEDDDDDDD
@FCC0MWCACXX:3:1101:1175:2193/2
CTGATTTTTTGTCAAAAAACTCCGGCAGAAATCTCTTCTCTGTGCTATGAATTTTGTCCCAAGAAAACCACTTCGAATAGCTGGGGATTT
+
C@CFFFFFHHHHHGIIIJIJGIIGGIIHGGGDHGIJJIIJIIFHHIIJIGGGIIJJHEHHGHEFFFEEDDDDDDB=<>?CDDCDDDDDDD
@FCC0MWCACXX:3:1101:1344:2132/1
CTTTTGGCATTTAATTTATGTATGGTTATTTAATTTTTTTTGTAGCTGACTTGTGGCTCAACATATATTTATTGTAAAGGTTTTAATTTA
+
@CCFDF@DBBFHDGHGIGI>AIGGIFHGIIIGIIIGIIIIIIEFD@GCGHFHICGGEHIIIHHHFGDCDEFFFDE6;;AC@CCBCCCEEC
@FCC0MWCACXX:3:1101:1344:2132/2

If you are limited to working in Galaxy, I don't know off the top of my head which option you should use, but make sure to choose split-files if it is available. Otherwise, this is simple to take care of with reformat.sh from the BBMap suite, although that has to be done on the command line:

reformat.sh in=SRA.fq out1=R1.fq out2=R2.fq
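If BBMap is not at hand, the same split can be sketched in plain Python. This is only a minimal illustration, assuming the file is laid out like the excerpt above: four lines per record, with each /1 read immediately followed by its /2 mate. The function names `read_fastq` and `deinterleave` are made up for this example, and the demo reads are shortened placeholders, not real data.

```python
import io
from itertools import islice


def read_fastq(handle):
    """Yield (header, sequence, plus, quality) tuples from a 4-line-per-record FASTQ."""
    while True:
        record = [line.rstrip("\n") for line in islice(handle, 4)]
        if not record or not any(record):
            return  # end of file (or only trailing blank lines left)
        if len(record) < 4:
            raise ValueError("truncated FASTQ record: " + record[0])
        yield tuple(record)


def deinterleave(handle, out1, out2):
    """Write /1 reads to out1 and /2 reads to out2; return the number of pairs."""
    pairs = 0
    records = read_fastq(handle)
    for r1 in records:
        r2 = next(records, None)  # the mate is assumed to follow directly
        if r2 is None:
            raise ValueError("unpaired read at end of file: " + r1[0])
        name1, name2 = r1[0], r2[0]
        # Assumption: mates share a name and differ only in the /1 or /2 suffix.
        if not (name1.endswith("/1") and name2.endswith("/2")
                and name1[:-2] == name2[:-2]):
            raise ValueError("reads are not an adjacent pair: %s, %s" % (name1, name2))
        out1.write("\n".join(r1) + "\n")
        out2.write("\n".join(r2) + "\n")
        pairs += 1
    return pairs


# Demo on an in-memory file with placeholder reads (not real sequence data).
demo = io.StringIO(
    "@read_a/1\nACGT\n+\nIIII\n"
    "@read_a/2\nTTGA\n+\nHHHH\n"
)
r1_out, r2_out = io.StringIO(), io.StringIO()
n_pairs = deinterleave(demo, r1_out, r2_out)
print(n_pairs, "pair(s) written")
```

For a real file, the three handles would be opened with `open("SRA.fq")`, `open("R1.fq", "w")` and `open("R2.fq", "w")`. The strict pairing check is deliberate: it stops with an error instead of silently writing mismatched mates, which would otherwise break a paired-end assembly.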

7.1 years ago

The --split-files argument to fastq-dump is needed. That should produce two separate files if, indeed, the original sequencing was paired-end.
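For anyone scripting this, the call can be assembled roughly as below. It is a sketch, not a tested pipeline: the helper name `fastq_dump_command` and the accession `SRR0000000` are placeholders invented for the example, and it assumes `fastq-dump` from the SRA Toolkit is on the PATH. `--origfmt` is the long form of the -F option mentioned earlier in the thread.

```python
def fastq_dump_command(accession, split="files", original_headers=True, outdir=None):
    """Build the argument list for a fastq-dump call that separates paired reads."""
    if split not in ("files", "3"):
        raise ValueError("split must be 'files' or '3'")
    cmd = ["fastq-dump", "--split-" + split]  # --split-files or --split-3
    if original_headers:
        cmd.append("--origfmt")  # same as -F: keep the original read names
    if outdir:
        cmd += ["--outdir", outdir]
    cmd.append(accession)
    return cmd


# SRR0000000 is a placeholder accession, not the poster's run.
cmd = fastq_dump_command("SRR0000000")
print(" ".join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` to actually run the conversion.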

6.2 years ago
gtrwst9 • 0

fastq-dump --split-3 SRR...

This does NOT redownload everything.
