Convert BAM files to paired end FASTQ
5.8 years ago

I have a bunch of BAM files aligned from paired-end FASTQs, and I need to convert them back to paired-end FASTQs.

I am using "samtools fastq" for this purpose (after sorting bam files by name):

samtools fastq -1 output.pe_1.fastq -2 output.pe_2.fastq -s singleton.fastq input.bam
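The same two steps can be wrapped in a small shell function. This is a minimal sketch that assumes samtools is on PATH; the function name `bam_to_pe_fastq` and the output file names are hypothetical. Name-sorting comes first because `samtools fastq` can only write two records as a pair when the mates sit next to each other in the input.

```shell
# Hypothetical helper: convert one paired-end BAM back to FASTQ files.
# Usage: bam_to_pe_fastq input.bam output_prefix
bam_to_pe_fastq() {
  local in_bam=$1
  local prefix=$2
  # Name-sort so that the two mates of each pair are adjacent records.
  samtools sort -n -o "${prefix}.namesorted.bam" "$in_bam" || return 1
  # READ1 records go to _1.fastq and READ2 records to _2.fastq; records whose
  # mate is missing go to the singleton file. -0 /dev/null discards records
  # flagged as neither (or both) READ1 and READ2.
  samtools fastq \
    -1 "${prefix}_1.fastq" \
    -2 "${prefix}_2.fastq" \
    -0 /dev/null \
    -s "${prefix}.singleton.fastq" \
    "${prefix}.namesorted.bam"
}
```

For example, `bam_to_pe_fastq input.bam output.pe` would write output.pe_1.fastq, output.pe_2.fastq and output.pe.singleton.fastq.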

The problem is that my FASTQ files use two different read-naming conventions:

  1. Case 1:

read name in pe1: @UNC15-SN850_90:5:1101:1195:2138/1

read name in pe2: @UNC15-SN850_90:5:1101:1195:2138/2

  2. Case 2:

read name in pe1: @UNC14-SN744:189:D09V4ACXX:1:1101:1202:1856 1:N:0:TTAGGC

read name in pe2: @UNC14-SN744:189:D09V4ACXX:1:1101:1202:1856 2:N:0:TTAGGC


For Case 2, I get correct paired-end FASTQ files. For Case 1, however, all of the reads end up in singleton.fastq, and samtools writes two empty paired-end FASTQ files.
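One way to check which case a given BAM falls into: if the aligner kept the "/1" and "/2" suffixes as part of the BAM read name (QNAME) in Case 1, the two mates no longer share a name, and samtools fastq would not pair them. That is an assumption to verify against the BAM itself, not something confirmed here. The awk helper `count_suffixed_qnames` is hypothetical; it only counts alignment records whose QNAME ends in /1 or /2.

```shell
# Hypothetical helper: read SAM text on stdin and print how many alignment
# records have a QNAME (column 1) ending in "/1" or "/2".
# Header lines start with "@", a character the SAM spec does not allow in a QNAME.
count_suffixed_qnames() {
  awk -F '\t' '!/^@/ && $1 ~ /\/[12]$/ { n++ } END { print n + 0 }'
}
```

For example, `samtools view input.bam | head -n 1000 | count_suffixed_qnames` prints a non-zero count if the first records carry Case 1 style names.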

Is there a way to handle both cases correctly with "samtools fastq" or another available tool?

Samtools BAM FASTQ

Is this TCGA data?

You may be able to use reformat.sh from the BBMap suite if all you need is to reformat the /1 IDs into ones that contain ':'. Relevant options:

spaceslash=t            Put a space before the slash in addslash mode.
addcolon=f              Append ' 1:' and ' 2:' to read names, if not already present.

Hello,

could you please post complete example lines from the bam files, where one can see the read pairs?

Thanks!

fin swimmer

5.8 years ago
JC

The problem could be that samtools expects the second header format (including the index). You could try bedtools or Picard for this instead.
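If the BAM read names do carry the "/1" and "/2" suffixes (an assumption that should be checked first), another option is to strip them before name-sorting, so that samtools fastq sees identical QNAMEs for both mates. A minimal sketch: `strip_mate_suffix` is a hypothetical awk filter over SAM text. It leaves the FLAG column untouched, so the READ1/READ2 bits that samtools fastq uses to split the output are preserved.

```shell
# Hypothetical filter: remove a trailing "/1" or "/2" from the QNAME of every
# alignment record in SAM text on stdin; header lines pass through unchanged.
strip_mate_suffix() {
  awk 'BEGIN { FS = OFS = "\t" }
       /^@/ { print; next }
       { sub(/\/[12]$/, "", $1); print }'
}
```

For example, `samtools view -h input.bam | strip_mate_suffix | samtools sort -n -o fixed.namesorted.bam -` produces a name-sorted BAM that can then go through the same `samtools fastq -1 ... -2 ... -s ...` command.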
