Entering edit mode
8.2 years ago
gundalav
▴
380
I have paired-end reads R1 and R2.
|-----R1-----|----------------------------|-----R2-----|
A B C
Total insert here is A + B + C. How can I get this sequence from the BAM/SAM output?
The reads are mapped using classical Bowtie2 parameter.
bowtie2 -x myindex \
-1 R1.fastq \
-2 R2.fastq \
--no-discordant \
--no-mixed -p 4 \
--very-sensitive \
--no-mixed \
| samtools view -bS - > output bam
You can't really get that solely from a SAM/BAM file, since the sequence appropriate for
B
for a particular fragment is unknown. What is your actual biological goal?If we know the mapping location of A and C, we can get A+B+C there right? With Picard we can get the A+B+C length using
pathtopicard\CollectInsertSizeMetrics.jar
I wonder if there is any existing tool to actually extract the fragment from the genome. At the end of the day want to translate A+B+C into protein and derive MHClass-I core from there.It's unclear if you want the sequence of the fragment or the sequence of the reference genome covered by the fragment. For the latter, sure, that's easy enough with a small bit of scripting (I'd use pysam in python, but to each his/her own). For the former, see my earlier comment.