Dear all,
I am trying to combine two sam file together and subject to bedtools to convert it into bedpe file.
The two sam file are generated by mapping a pair-end read separately to the genome using bowtie2(-U -local).
Here are the code that I use:
# For read 1
samtools view -@ 16 -bT hg19.fa R1.sam > R1.bam
samtools sort -@ 16 -n R1.bam R1_sorted
# Do the same thing with read 2
...
# Merge two bam file
samtools merge -@ 16 -n R1R2_sorted_merged.bam R1_sorted.bam R2_sorted.bam
samtools sort -@ 16 -n R1R2_sorted_merged.bam R1R2_sorted_merged.sorted
samtools fixmate R1R2_sorted_merged.sorted.bam R1R2_sorted_merged.sorted.fixed.bam
# Convert bam to bedpe
# Command 1:
samtools view -@ 16 -bf 0x2 R1R2_sorted_merged.sorted.fixed.bam | bedtools bamtobed -i stdin -bedpe > R1R2_sorted_merged.sorted.fixed.bedpe
# Command 2:
bedtools bamtobed -i R1R2_sorted_merged.sorted.fixed.bam -bedpe > R1R2_sorted_merged.sorted.fixed.bedpe
Merging two bam file is inspired by James.
The Command 1 is from bedtools usage page which can "only convert properly-paired (FLAG == 0x2) reads".
But the output of bedpe file of Command is actually empty. Then I went double check the output of samtools view -@ 16 -bf 0x2 R1R2_sorted_merged.sorted.fixed.bam
. It turned out that it is actually empty, which indicate that there are no "properly-pared" reads in the .sorted.fixed.bam file.
Then I tried Command 2, the ouput bedpe file is not empty but the number of lines(< 0.5million) is quite small compared to the raw reads. There is no replicate too, which is not possible.
Another downstream I am gonna do is to remove duplicate from bedpe file and convert the bedpe file into bed file. Is there any way to convert bedpe to bed?
Anyone got a hint on what might be the problem?
FYI: You might also refer to my previous post on this question
Thanks a lot!!!
EDIT:
I think the problem might lie in the bam file. R1.bam samtools flagstat output:
299344083 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 duplicates
295354894 + 0 mapped (98.67%:-nan%)
0 + 0 paired in sequencing
0 + 0 read1
0 + 0 read2
0 + 0 properly paired (-nan%:-nan%)
0 + 0 with itself and mate mapped
0 + 0 singletons (-nan%:-nan%)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)
R1R2_sorted_merged.bam samtools flagstat output:
603889264 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 duplicates
590620124 + 0 mapped (97.80%:-nan%)
0 + 0 paired in sequencing
0 + 0 read1
0 + 0 read2
0 + 0 properly paired (-nan%:-nan%)
0 + 0 with itself and mate mapped
0 + 0 singletons (-nan%:-nan%)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)
R1R2_sorted_merged.sorted.bam samtools flagstat output:
603889264 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 duplicates
590620124 + 0 mapped (97.80%:-nan%)
0 + 0 paired in sequencing
0 + 0 read1
0 + 0 read2
0 + 0 properly paired (-nan%:-nan%)
0 + 0 with itself and mate mapped
0 + 0 singletons (-nan%:-nan%)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)
R1R2_sorted_merged.sorted.fixed.bam samtools flagstat output:
603889264 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 duplicates
590620124 + 0 mapped (97.80%:-nan%)
0 + 0 paired in sequencing
0 + 0 read1
0 + 0 read2
0 + 0 properly paired (-nan%:-nan%)
0 + 0 with itself and mate mapped
0 + 0 singletons (-nan%:-nan%)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)
Still don't know what is going on. :(
Your mapping file is not recognizing the paired-end reads itself, can you check if they are the correct files and if the naming of reads is proper. Also mapping each-end of the read seperately and merging them into a single file doesn't make them paired together (since I see a R1-mapped file in your example). You need to map the paired-end data in the same run itself. Fixmate doesn't fix the flags, refer here
The problem is that these are ChIA-PET reads, which the two tags might come from different chromosomes. If I map as paired-end, this kind of interaction would be neglected. Also, for non-paired-end mapping, is the flagstat output looks like what I got for R1-mapped file above?