Hi all!
I am mapping the two mates of a paired-end library separately with bowtie2.
After that, I want to combine the two BAM files into one for downstream analysis.
What is the proper way to combine them?
I tried:
samtools sort -n R1.bam R1.sorted
samtools sort -n R2.bam R2.sorted
samtools merge R1_R2.bam R1.sorted.bam R2.sorted.bam
But it does not seem to work. The samtools flagstat output for R1_R2.bam looks like this:
603889264 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 duplicates
590620124 + 0 mapped (97.80%:-nan%)
0 + 0 paired in sequencing
0 + 0 read1
0 + 0 read2
0 + 0 properly paired (-nan%:-nan%)
0 + 0 with itself and mate mapped
0 + 0 singletons (-nan%:-nan%)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)
This shows that the R1 and R2 alignments are not marked as paired. There is also no @RG header line in the combined file.
What might be the problem?
Thanks a lot!!!
Why do you map paired-end reads separately with bowtie2? I think that is where your problem lies...
Since my reads may contain PETs that span quite a long distance (even across different chromosomes), it would be better to map them separately. I was told that pairs spanning a long distance may be discarded by bowtie2.
Please avoid abbreviations like "PETs"; not everyone knows what it means (and Google is obviously not helpful here). I don't think your approach to this is optimal, and merging BAM files of separately mapped reads will not magically generate read pairs.
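To make that last point concrete: flagstat counts pairs from the FLAG bits of each record (0x1 paired, 0x40 read1, 0x80 read2, 0x8 mate unmapped, 0x20 mate reverse), and single-end alignments never set them, so a plain merge leaves every pair counter at 0. Below is a rough sketch of setting those bits by hand. The function name pair_sam is made up for illustration, and it assumes two headerless SAM text files holding exactly one single-end record per read, in the same read order, with identical read names in both files. It does not set the proper-pair bit (0x2) or TLEN, so treat it as an illustration rather than a finished tool.

```shell
# Hypothetical helper (not part of samtools or bowtie2).
# $1 = R1 records, $2 = R2 records; both headerless SAM text, same read order.
pair_sam() {
  awk -v r2="$2" '
    # Return 1 if bit b (given as its decimal value) is set in flag, else 0.
    function bit(flag, b) { return int(flag / b) % 2 }

    # Return record own with FLAG, RNEXT and PNEXT filled in from record mate.
    function pair(own, mate, first,    o, m, n, i, out) {
      n = split(own, o, "\t"); split(mate, m, "\t")
      o[2] += 1                          # 0x1: read is paired
      o[2] += first ? 64 : 128           # 0x40: read1, 0x80: read2
      if (bit(m[2], 4))  o[2] += 8       # 0x8: mate is unmapped
      if (bit(m[2], 16)) o[2] += 32      # 0x20: mate is on the reverse strand
      if (!bit(m[2], 4)) {               # mate is mapped: record where it lies
        o[7] = (m[3] == o[3]) ? "=" : m[3]
        o[8] = m[4]
      }
      out = o[1]
      for (i = 2; i <= n; i++) out = out "\t" o[i]
      return out
    }

    {
      if ((getline l2 < r2) <= 0) {
        print "R2 has fewer records than R1" > "/dev/stderr"; exit 1
      }
      split($0, a, "\t"); split(l2, b, "\t")
      if (a[1] != b[1]) {
        print "read name mismatch: " a[1] " vs " b[1] > "/dev/stderr"; exit 1
      }
      print pair($0, l2, 1)
      print pair(l2, $0, 0)
    }' "$1"
}
```

The two inputs could come from something like samtools view on each name-sorted BAM, and the output would still need the original header prepended before converting back to BAM.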
PETs are animals in your house, right? :P
Only if those animals are made of polyethylene terephthalate which regularly perform positron emission tomography.
I suspect that these are ChIA-PETs, which are technically plants...
Hi, have a look at http://bowtie-bio.sourceforge.net/bowtie2/manual.shtml#aligning-pairs.
Yes, I have read this before. But I think it describes mapping paired-end reads to the genome in a single run. I need to map the two mates separately, so it may not be helpful in this situation.
Would you mind elaborating why that is so?
The point is that my paired-end reads may span a very long distance along the chromosome, and the two tags of a pair may even lie on different chromosomes.
If I supply the two FASTQ files and run bowtie2 in paired-end mode, some reads might not be mapped to the genome when the distance between the mates differs too much from that of the majority of pairs.
That is exactly the information I need, so I cannot afford to lose it.
If you use splice-aware software and set a long splicing distance, you should be able to map these reads without problems. Reads that do not map in the initial round can be collected and mapped to the genome individually again, to see whether the number of mapped reads supporting an observation gives you real evidence of translocations.
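As a rough sketch of the "collect the reads that did not map" step: unmapped records carry FLAG bit 0x4, so they can be pulled out of SAM text and written back as FASTQ for a second mapping round. The function name unmapped_to_fastq is made up for illustration, and it assumes the aligner stores unmapped reads in their original orientation. I believe recent samtools versions can do the same with samtools fastq and a flag filter, so check the manual before relying on a hand-rolled version.

```shell
# Hypothetical helper: write the unmapped reads of a SAM text file as FASTQ.
# $1 = SAM text file (header lines allowed).
unmapped_to_fastq() {
  awk -F'\t' '
    /^@/ { next }                 # skip SAM header lines
    int($2 / 4) % 2 == 1 {        # FLAG bit 0x4 set: read is unmapped
      print "@" $1                # read name
      print $10                   # sequence
      print "+"
      print $11                   # base qualities
    }' "$1"
}
```

The resulting FASTQ can then go through the aligner again on its own, for example with different settings.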