How can I restore paired-end fastq files from a sorted bam file with fastq remain sorted?
1
0
Entering edit mode
7.0 years ago
lghust2011 ▴ 110

I have two paired-end FASQ files named fq1.fastq and fq2.fastq. Then I use BWA MEM to map these two files with the paired-end mode, like this:

bwa mem reference.fasta fq1.fastq fq2.fastq > result.sam

Then I transfer this sam file to bam format and sort it by coordinates with samtools:

samtools view result.sam > result.bam && samtools sort result.bam -o result_sort.bam

Now I want to restore these two fastq files from result_sort.bam. I know that I can use picard SamToFastq to do this. But reads in fastq files not remain sorted like the result_sort.bam. So, is there any other way that I can restore fastq files with reads sorted? I want to do this, because when I get fq1.fastq and fq2.fastq from result_sort.bam and use BWA to map then I again, I can get a sam file within that many reads are sorted. Any reply will be much appreciated!

alignment sequence • 3.6k views
ADD COMMENT
0
Entering edit mode

I want to do this, because when I get fq1.fastq and fq2.fastq from result_sort.bam and use BWA to map then I again, I can get a sam file within that many reads are sorted

samtools sort accept also a sam file as input and can write to sam file. But why do you need a sam file instead of bam?

fin swimmer

ADD REPLY
0
Entering edit mode

Either sam or bam format is OK for me

ADD REPLY
0
Entering edit mode

May I ask why you need this?

ADD REPLY
0
Entering edit mode

I want to get the sorted fastq files, so I can get the sorted sam file directly when I run BWA MEM with sorted fastq files. But it seems very difficult!

ADD REPLY
0
Entering edit mode

Ok, what about reads getting assigned to random positions due to poor mapping quality, they will not end up in the same position? And to re-iterate my initial question, may I ask why you need this?

ADD REPLY
2
Entering edit mode
7.0 years ago

You can't do that !

In bwa manual, it is stated that :

bwa mem [...] db.prefix reads.fq [mates.fq]
[...] If mates.fq file is absent and option -p is not set, this command regards input reads are single-end. If mates.fq is present, this command assumes the i-th read in reads.fq and the i-th read in mates.fq constitute a read pair.

Meaning that your read pairs must be sorted by name in the fastq files inputed in bwa mem, not by coordinates.


EDIT : To clarify, the OP's idea of keeping the order of reads extracted from a coordinate-sorted bam file could work, but only with single-end reads. I guess that is goal is to avoid sorting the bam file after re-mapping (why would he want to re-map is another question) and saving some time. However, as stated above, aligners such as bwa-mem require the read pairs to correspond in the fastq.1 and fastq.2 files, making the coordinate-sorting inappropriate.

ADD COMMENT
0
Entering edit mode

Thanks for your answer!

ADD REPLY

Login before adding your answer.

Traffic: 2984 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6