I have a number of BAM files produced by aligning paired-end FASTQ files, and I need to convert them back to paired-end FASTQs.
I am using "samtools fastq" for this (after sorting the BAM files by read name):
samtools fastq -1 output.pe_1.fastq -2 output.pe_2.fastq -s singleton.fastq input.bam
The problem is that my FASTQ files use two different read-naming conventions:
- Case1:
read name in pe1: @UNC15-SN850_90:5:1101:1195:2138/1
read name in pe2: @UNC15-SN850_90:5:1101:1195:2138/2
- Case2:
read name in pe1: @UNC14-SN744:189:D09V4ACXX:1:1101:1202:1856 1:N:0:TTAGGC
read name in pe2: @UNC14-SN744:189:D09V4ACXX:1:1101:1202:1856 2:N:0:TTAGGC
For Case 2, I get correct paired-end FASTQ files. For Case 1, however, all of the reads end up in singleton.fastq, and samtools writes two empty paired-end FASTQ files.
Is there a way to handle both cases correctly with "samtools fastq" or with another available tool?
Is this TCGA data?
You may be able to use reformat.sh from the BBMap suite if all you need to do is convert the /1 and /2 read IDs to the Case 2 style.
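If the read names (QNAME) in the BAM still end in /1 and /2, the two mates no longer share an identical name, which would plausibly explain why samtools fastq treats every read as a singleton. That cause is an assumption, not something confirmed in this thread. Under that assumption, a minimal alternative to reformat.sh is to strip the suffix from column 1 and re-sort by name before running samtools fastq. The function names `strip_mate_suffix` and `bam_to_paired_fastq` are hypothetical, and the wrapper assumes samtools is on PATH and that the FLAG field still marks each read as READ1 or READ2.

```shell
#!/usr/bin/env bash
# Hypothetical helper: remove a trailing "/1" or "/2" from the read name
# (QNAME, column 1) so that both mates carry the same name again.
# Header lines start with "@" (a QNAME cannot) and are passed through as-is.
strip_mate_suffix() {
  awk 'BEGIN { FS = OFS = "\t" }
       /^@/ { print; next }
       { sub(/\/[12]$/, "", $1); print }'
}

# Hypothetical wrapper (assumes samtools is on PATH): rename, regroup mates
# by name, then split into paired FASTQs plus a singleton file.
bam_to_paired_fastq() {
  local in_bam=$1 prefix=$2
  samtools view -h "$in_bam" \
    | strip_mate_suffix \
    | samtools sort -n -O bam - \
    | samtools fastq -1 "${prefix}_1.fastq" -2 "${prefix}_2.fastq" \
        -s "${prefix}.singleton.fastq" -
}
```

For example, `bam_to_paired_fastq input.bam output.pe` would write output.pe_1.fastq, output.pe_2.fastq and output.pe.singleton.fastq. If the paired files are still empty afterwards, the READ1/READ2 FLAG bits themselves are probably missing (for instance, if each mate was aligned as single-end), and renaming alone will not help.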
Hello,
could you please post a few complete example lines from the BAM files in which the read pairs can be seen?
Thanks!
fin swimmer
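Alongside raw example lines, a compact summary of the fields that matter here can show at a glance whether mates are flagged as a pair and whether their names still differ. The sketch below is a hypothetical helper (`summarise_pairing` is not part of samtools); it counts SAM records by the paired (0x1), first-mate (0x40) and second-mate (0x80) FLAG bits and by whether the read name ends in /1 or /2. It uses plain arithmetic rather than gawk's bitwise functions so that it should run under any POSIX awk.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read SAM text on stdin and count records by the
# pairing-related FLAG bits and by the presence of a "/1" or "/2" name suffix.
summarise_pairing() {
  awk 'BEGIN { FS = "\t" }
       /^@/ { next }                  # skip header lines
       {
         flag   = $2 + 0
         paired = flag % 2            # 0x1:  read is paired
         read1  = int(flag / 64) % 2  # 0x40: first mate (READ1)
         read2  = int(flag / 128) % 2 # 0x80: second mate (READ2)
         suffix = ($1 ~ /\/[12]$/) ? "yes" : "no"
         key = "paired=" paired " read1=" read1 " read2=" read2 " name_suffix=" suffix
         count[key]++
       }
       END { for (k in count) print count[k] "\t" k }' | LC_ALL=C sort
}
```

A typical call would be `samtools view input.bam | head -n 100000 | summarise_pairing`. Rows with read1=1 or read2=1 together with name_suffix=yes would point towards the naming mismatch, while rows with both read1=0 and read2=0 would suggest that the reads were never flagged as pairs.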