Question

Aligning paired end fastq files dumped from SRA

9

10.0 years ago

Zev.Kronenberg 12k

Greetings,

I've downloaded a Short Read Archive (SRA) experiment and dumped it to fastq.

~/tools/sratoolkit.2.4.2-centos_linux64/bin/fastq-dump -I  --split-files --gzip SRR1514952/SRR1514952.sra

BWA mem is throwing an error when I'm aligning the mate pairs:

[mem_sam_pe] paired reads have different names: "SRR1514950.1.1", "SRR1514950.1.2"
[mem_sam_pe] paired reads have different names: "SRR1514950.2.1", "SRR1514950.2.2"
[mem_sam_pe] paired reads have different names: "SRR1514950.3.1", "SRR1514950.3.2"

I'm checking that the files aren't truncated and contain the same number of reads. Has anyone run into this problem before?
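A minimal sketch of that sanity check, assuming gzipped four-line FASTQ records. The helper name `count_reads` and the file names in the usage comment are illustrative, not something from the original post:

```shell
# count_reads FILE: verify gzip integrity, then print the number of FASTQ records.
# Assumes each record is exactly four lines (header, sequence, '+', qualities).
count_reads() {
    # gzip -t fails on a truncated or corrupted archive
    gzip -t "$1" || return 1
    gzip -cd "$1" | awk '
        END {
            if (NR % 4 != 0) {
                print "line count is not a multiple of 4" > "/dev/stderr"
                exit 1
            }
            printf "%.0f\n", NR / 4
        }'
}

# Example usage (hypothetical file names):
#   count_reads SRR1514950_1.fastq.gz
#   count_reads SRR1514950_2.fastq.gz
```

Both mate files should report the same number. Equal counts do not fix the naming error above, they only rule out a truncated dump.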

paired sra fastq bwa mem • 7.8k views

updated 2.8 years ago by Ram 44k • written 10.0 years ago by Zev.Kronenberg 12k

Ram · Answer 1 · 2014-11-22

9

10.0 years ago

Zev.Kronenberg 12k

This seemed to work. Just need to ask for the original read format.

~/tools/sratoolkit.2.4.2-centos_linux64/bin/fastq-dump --origfmt -I --split-files --gzip SRR1514950/SRR1514950.sra
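A rough way to confirm that two dumped files pair up before handing them to bwa mem is to compare the read names line by line. This is only a sketch: the helper names `read_names` and `mates_match` are made up for illustration, and it assumes four-line gzipped FASTQ and that a trailing /1 or /2 on the name is ignored by the aligner when it compares mates.

```shell
# read_names FILE: print the name (first header field, without '@' and
# without a trailing /1 or /2) of every record in a gzipped FASTQ file.
read_names() {
    gzip -cd "$1" | awk 'NR % 4 == 1 {
        name = $1
        sub(/^@/, "", name)
        sub(/\/[12]$/, "", name)
        print name
    }'
}

# mates_match R1 R2: exit status 0 only if both files list the same names
# in the same order.
mates_match() {
    tmp1=$(mktemp) && tmp2=$(mktemp) || return 2
    read_names "$1" > "$tmp1"
    read_names "$2" > "$tmp2"
    cmp -s "$tmp1" "$tmp2"
    status=$?
    rm -f "$tmp1" "$tmp2"
    return "$status"
}

# Example usage (hypothetical file names):
#   mates_match SRR1514950_1.fastq.gz SRR1514950_2.fastq.gz && echo "names pair up"
```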

score 3 · Answer 2 · 2014-11-22

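It's probably because this isn't the way paired reads are usually named, so bwa is confused. Try a quick sed, anchored to the end of the read name on header lines so that the accession, sequence and quality strings are left alone (an unanchored 's,.1,/1,g' would rewrite any character followed by a 1, anywhere in the file):

sed -i -E '1~4 s,^(@[^ ]+)\.1( |$),\1/1\2,' file1 and sed -i -E '1~4 s,^(@[^ ]+)\.2( |$),\1/2\2,' file2. The 1~4 address is GNU sed syntax and assumes four-line FASTQ records; the files need to be uncompressed first.

This turns SRR1514950.1.1 into SRR1514950.1/1 and SRR1514950.1.2 into SRR1514950.1/2.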

Hope this works.

score 0 · Answer 3 · 2017-02-07

7.8 years ago

Christian ★ 3.1k

The following command worked for me:

cat sra.fq | perl -ne 's/\.([12]) /\/$1 /; print $_' > sra.fix.fq

score 0 · Answer 4 · 2017-12-14

I had something similar. The reads I got from SRA look like so:

@SRR1531517.4.1 D3NH4HQ1:58:D091WACXX:7:1101:1448:2140 length=75
AACTTCCAGTGGAAATGAGATTCTGATTCTACCAAAAATGGCCCTCCGAATAGTCAGCATGTAGTTTGTTTGCCC
+SRR1531517.4.1 D3NH4HQ1:58:D091WACXX:7:1101:1448:2140 length=75
CCCFFFFFHHHHGIJIJIJJJJJJJJJJIJJJJJJIJJIJJIGIGIJJIIJIIIIIIJJJJIGIJJJIIJJJHHH

I tried something like this to make it compatible with BWA. It works with both forward and reverse files. I prefer to pipe (and zip) it to another file to keep the original as a backup.

sed 's;@SRR1531517\.\([0-9.]*\)\([0-9]\) \([a-zA-Z:0-9]*\) length=[0-9]*;@\3/\2;' sra.fq | gzip > sra.fix.fq.gz

Which gives me:

@D3NH4HQ1:58:D091WACXX:7:1101:1448:2140/1
AACTTCCAGTGGAAATGAGATTCTGATTCTACCAAAAATGGCCCTCCGAATAGTCAGCATGTAGTTTGTTTGCCC
+SRR1531517.4.1 D3NH4HQ1:58:D091WACXX:7:1101:1448:2140 length=75
CCCFFFFFHHHHGIJIJIJJJJJJJJJJIJJJJJJIJJIJJIGIGIJJIIJIIIIIIJJJJIGIJJJIIJJJHHH
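The sed above hard-codes the accession. A possible variant that works for any accession is sketched below with awk; the function name `fix_headers` is made up for illustration, and it assumes four-line records whose header has the form `@ACCESSION.SPOT.MATE ORIGINAL_NAME length=N`, as in the example read.

```shell
# fix_headers: read FASTQ on stdin, write FASTQ on stdout.
# On header lines whose first field ends in .1 or .2 and that carry an
# original name as the second field, emit "@ORIGINAL_NAME/MATE" instead.
# All other lines (sequence, '+', qualities) pass through unchanged.
fix_headers() {
    awk '
        NR % 4 == 1 && NF >= 2 && $1 ~ /\.[12]$/ {
            mate = substr($1, length($1))   # last character: 1 or 2
            print "@" $2 "/" mate
            next
        }
        { print }'
}

# Example usage (file names as in the answer above):
#   fix_headers < sra.fq | gzip > sra.fix.fq.gz
```

Headers without a second field are left untouched rather than rewritten to an empty name.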