Question

Convert BAM files to paired end FASTQ


5.8 years ago

berke.toptas • 0

I have a set of BAM files that were aligned from paired-end FASTQs, and I need to convert them back to paired-end FASTQs.

I am using "samtools fastq" for this purpose (after sorting bam files by name):

samtools fastq -1 output.pe_1.fastq -2 output.pe_2.fastq -s singleton.fastq input.bam
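As a minimal sketch, the two steps described above (name sort, then conversion) could be wrapped in one function. The function name `bam_to_paired_fastq` and the output prefix argument are illustrative and not part of the original post; the samtools options are the ones used in the command above plus `sort -n` for name sorting.

```shell
# Hypothetical wrapper: name-sort a BAM, then split it into paired-end FASTQs.
# Usage: bam_to_paired_fastq input.bam output_prefix
bam_to_paired_fastq() {
  in_bam=$1
  prefix=$2
  # samtools fastq expects mates to be adjacent, so sort by read name first.
  samtools sort -n -o "${prefix}.namesorted.bam" "$in_bam" &&
  # -1/-2 receive first and second mates, -s receives reads without a mate.
  samtools fastq \
    -1 "${prefix}.pe_1.fastq" \
    -2 "${prefix}.pe_2.fastq" \
    -s "${prefix}.singleton.fastq" \
    "${prefix}.namesorted.bam"
}
```

Calling `bam_to_paired_fastq input.bam output` would write `output.pe_1.fastq`, `output.pe_2.fastq` and `output.singleton.fastq` next to an intermediate `output.namesorted.bam`.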

The problem is, I noticed that my fastq files have 2 different naming conventions:

Case1:

read name in pe1: @UNC15-SN850_90:5:1101:1195:2138/1

read name in pe2: @UNC15-SN850_90:5:1101:1195:2138/2

Case2:

read name in pe1: @UNC14-SN744:189:D09V4ACXX:1:1101:1202:1856 1:N:0:TTAGGC

read name in pe2: @UNC14-SN744:189:D09V4ACXX:1:1101:1202:1856 2:N:0:TTAGGC

For case 2, I get my paired-end FASTQ files correctly. For case 1, however, all of the reads are pushed to singleton.fastq, and samtools generates two empty paired-end FASTQ files.

Is there a way I can smoothly run both cases correctly using "samtools fastq" or any other tools available?

Samtools BAM FASTQ • 7.1k views

ADD COMMENT • link updated 5.8 years ago by JC 13k • written 5.8 years ago by berke.toptas • 0


Is this TCGA data?

You may be able to use reformat.sh from the BBMap suite if all you need to do is reformat the /1 IDs into ones that contain a colon.

spaceslash=t            Put a space before the slash in addslash mode.
addcolon=f              Append ' 1:' and ' 2:' to read names, if not already present.
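To illustrate the kind of renaming being suggested, here is a small awk sketch that rewrites a trailing `/1` or `/2` in FASTQ headers as ` 1:` or ` 2:`. It is not reformat.sh itself and may not match its exact output; the function name `slash_to_colon` is hypothetical, and the sketch assumes strict four-line FASTQ records read from stdin.

```shell
# Hypothetical helper: turn "@name/1" headers into "@name 1:" (same for /2).
# Reads FASTQ on stdin, writes FASTQ on stdout; assumes 4 lines per record.
slash_to_colon() {
  awk 'NR % 4 == 1 {
         # Only the header line of each record is edited, so a quality
         # string that happens to end in "/1" is left untouched.
         if (sub(/\/1$/, " 1:") == 0) sub(/\/2$/, " 2:")
       }
       { print }'
}
```

For example, `slash_to_colon < singleton.fastq > renamed.fastq` would rename the headers in one file. Whether this alone makes the pairs recognisable to downstream tools is an assumption that should be checked on a few reads first.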

ADD REPLY • link 5.8 years ago by GenoMax 147k


Hello,

could you please post complete example lines from the bam files, where one can see the read pairs?

Thanks!

fin swimmer

ADD REPLY • link 5.8 years ago by finswimmer 16k

score 0 · Answer 1 · 2019-02-21


5.8 years ago

JC 13k

The problem could be that samtools is expecting the second header format (including the index). You can try bedtools or Picard for this.
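A rough sketch of what the two alternatives might look like follows. The wrapper function names, the output prefix, and the `PICARD_JAR` location are assumptions made for illustration; whether either tool handles the `/1` and `/2` read names better than samtools has not been verified here. `bedtools bamtofastq` needs mates to be adjacent in the BAM for paired output, so the input should be sorted or grouped by read name.

```shell
# Assumption: PICARD_JAR points at a local picard.jar; the default is a placeholder.
PICARD_JAR=${PICARD_JAR:-picard.jar}

# Hypothetical wrapper around Picard SamToFastq.
# Usage: bam_to_fastq_picard input.bam output_prefix
bam_to_fastq_picard() {
  in_bam=$1
  prefix=$2
  java -jar "$PICARD_JAR" SamToFastq \
    I="$in_bam" \
    FASTQ="${prefix}_1.fastq" \
    SECOND_END_FASTQ="${prefix}_2.fastq" \
    UNPAIRED_FASTQ="${prefix}_unpaired.fastq"
}

# Hypothetical wrapper around bedtools bamtofastq (expects a name-sorted BAM).
# Usage: bam_to_fastq_bedtools namesorted.bam output_prefix
bam_to_fastq_bedtools() {
  in_bam=$1
  prefix=$2
  bedtools bamtofastq -i "$in_bam" -fq "${prefix}_1.fastq" -fq2 "${prefix}_2.fastq"
}
```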

ADD COMMENT • link 5.8 years ago by JC 13k