Hello, I'm working with paired-end RNA-seq data, so two FASTQ files produced one BAM file. I know column 10 holds the sequence and column 11 holds its ASCII-encoded quality. The original read length is 100 bp, but when I extracted some sequences from the BAM file I noticed reads shorter than that. For example, at one location a read has the CIGAR 40M (which means 40 bp matched). My question is: what happened to the rest of the sequence (60 bp)? Another point: this is paired-end data, which means two reads, one from each end of the fragment, but the BAM file has only one. Which end does the read in the BAM file belong to?
Many thanks..
Did you map locally or end-to-end? Which aligner did you use?
end-to-end using STAR
Are you 100% sure? The CIGAR you reported is compatible with a local mapping. What is your command, can you post it here?
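One way to check whether bases were clipped by the aligner or were already missing before alignment is to compare the CIGAR with the length of SEQ (column 10). Per the SAM spec, the operations M, I, S, = and X consume read bases, so soft-clipped bases (S) stay in SEQ. A rough sketch in plain Python follows; the function names are made up for illustration and the example CIGAR strings other than 40M are hypothetical:

```python
import re

# CIGAR operations that consume bases of the read (SEQ), per the SAM spec.
QUERY_CONSUMING = set("MIS=X")
CIGAR_TOKEN = re.compile(r"(\d+)([MIDNSHP=X])")


def cigar_query_length(cigar):
    """Number of read bases described by a CIGAR string ('*' gives 0)."""
    if cigar == "*":
        return 0
    return sum(int(n) for n, op in CIGAR_TOKEN.findall(cigar)
               if op in QUERY_CONSUMING)


def soft_clipped_bases(cigar):
    """Number of soft-clipped (S) bases in a CIGAR string."""
    return sum(int(n) for n, op in CIGAR_TOKEN.findall(cigar) if op == "S")


for cigar in ("40M", "40M60S", "30M500N10M"):
    print(cigar, cigar_query_length(cigar), soft_clipped_bases(cigar))
```

If SEQ is 40 bases long and the CIGAR is a plain 40M, the read was most likely already 40 bp when it reached the aligner (for example after trimming). A 100 bp read aligned locally over 40 bases would be expected to show soft clipping, something like 40M60S, with all 100 bases still in SEQ.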
As for the absence of the paired mate from the BAM: if you used an option to leave out unmapped reads, perhaps that's the reason.
To know which end it corresponds to, you have to look at the bitwise flag: bit 64 (0x40) marks the first read of the pair and bit 128 (0x80) the second. Reads on the reverse strand will have the 16 bit (0x10) set.
What is the bitwise flag for the paired read that doesn't have the mate in the file? Did you check it on https://broadinstitute.github.io/picard/explain-flags.html ?
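If you prefer to decode the flag yourself instead of using the web page, here is a minimal sketch in plain Python. The bit values are the ones defined in the SAM spec; the function name `explain_flag` and the example flag 73 are only illustrations, not values taken from your file:

```python
# Flag bits as defined in the SAM specification.
FLAG_BITS = {
    0x1: "read paired",
    0x2: "read mapped in proper pair",
    0x4: "read unmapped",
    0x8: "mate unmapped",
    0x10: "read reverse strand",
    0x20: "mate reverse strand",
    0x40: "first in pair",
    0x80: "second in pair",
    0x100: "not primary alignment",
    0x200: "read fails quality checks",
    0x400: "PCR or optical duplicate",
    0x800: "supplementary alignment",
}


def explain_flag(flag):
    """Return the descriptions of all bits set in a SAM flag."""
    return [name for bit, name in FLAG_BITS.items() if flag & bit]


# Hypothetical example: 73 = 1 + 8 + 64
print(explain_flag(73))
```

For a read whose mate is missing from the file, the "mate unmapped" bit (8) together with "first in pair" (64) or "second in pair" (128) would tell you both which end you are looking at and why its partner is absent.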
Thanks Macspider; I confirmed that trimming happened prior to the alignment, so this is solved now.
Then you can mark the thread as closed, so it doesn't show up in the "open threads" section. ;) Good luck!
Thank you, I'm still confused about some fields in the BAM file.
Which ones? We can maybe help.
Have you checked page 5 of the SAM spec document?
I have posted my question here: "Count the fragments from BAM file". Thanks in advance.