Entering edit mode
7.6 years ago
viv3kanand
▴
10
Hi,
I was trying to align an RNA-Seq data (paired-end) using bwa. For the alignment summary I used,
samtools flagstat alignment.bam
bamtools stats -in alignment.bam
And both gave me similar results (For example number of reads in pair 1).
Result from Samtools
119825 + 0 read1
Result from bamtools
Read 1: 119825
But when I tried to double check the results using converted bed file using awk, I get a different result.
bamToBed -i alignment.bam > alignment.bed
awk -F "\t" '{print $4}' alignment.bed |awk -F "/" '{if($2==1) print $1}'|sort|uniq|wc -l
119532
Similarly all results are different when I check from bed file. What could be the possible reason for this?
You have to provide the conversion command from bam to bed.
Added conversion command.
I have no ready explanation, but you could try to see if your awk pipe is working properly by reconverting the bed file to to bam with
bedtools bedtobam
:http://bedtools.readthedocs.io/en/latest/content/bedtools-suite.html
After re-conversion, check again with flagstat. If you get the same number in the new bam file as the
awk
pipeline on the bed file then this means that that pipe is for some reason not working 100% correctly. It might be: sam format is highly informative and information-dense, while bed format is less. Therefore you might be losing some info that samtools is using to determine the mapped reads.To convert from bed to bam I need the genome file right? I was using a viral reference fasta file. I'm not sure how I would create a genome file from that. eg:
I've figured out to how create genome file. Thanks
As you said, there is something wrong with the summary now,
And with bamtools
samtools flagstat
will always be correct if the BAM file itself is correct, why are you bothering to "double check" that? You just have to ask yourself whether it makes sense to figure out whyawk
with a BED file is producing the wrong results (you probably have something better to do).Hi Devon,
You are correct. It was not just about double checking the results. I was trying myself to traceback read IDs (of both pairs) from the mapped reads. So that I can check with the barcode list and get to know how many reads mapped to genome has barcode in it. I thought bed file would do it. But it turns out to be giving me wrong results. Would you please suggest me a better option to do this using sam or bam file? (Specifically, is there a flag in sam or bam file to detect pair 1 and pair 2)
PS: I wasn't able to send respond yesterday. Forum was redirecting the page to home whenever I tried.
I guess it's not clear to me exactly what your were trying to do. Usually there's one barcode per sample, so everything has the same one.
Anyway, yes, there's a flag for read 1 (64) and another for read 2 (128).
I was looking for flag. Thank you Devon.