Hello,
I have a question regarding the length information of reads obtained from BAM files. I have converted BAM files into BED files and kept the read sequence. So, it looks something like this:
Chr 6791 7891 TCGAATATCAGGGTGCCCTCTGGCAAGGGCTTGCCCAGCGTACGTCAC -
Chr 6966 7304 ATTGATGAGGGATGTGGGTGGATGGATGATGATGGAAATATGATATGC +
I always assumed that columns 2 and 3 give the start and end positions of the read alignment, so column 3 minus column 2 would be the read length. However, if I count the characters in the DNA string (column 4) with the nchar() function in R, I get a different value.
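To make the mismatch concrete, here is a minimal Python sketch that does the same comparison for the two lines shown. It assumes whitespace-separated columns in the order chrom, start, end, sequence, strand. That is the layout in the question, not a standard BED column order. The function name compare_lengths is just for illustration.

```python
# Minimal sketch: compare the interval width (end - start) with the length
# of the stored read sequence for each line.
# Assumed column order (as in the question): chrom, start, end, sequence, strand.

lines = [
    "Chr 6791 7891 TCGAATATCAGGGTGCCCTCTGGCAAGGGCTTGCCCAGCGTACGTCAC -",
    "Chr 6966 7304 ATTGATGAGGGATGTGGGTGGATGGATGATGATGGAAATATGATATGC +",
]

def compare_lengths(line):
    """Return (reference_span, read_length) for one line (hypothetical helper)."""
    chrom, start, end, seq, strand = line.split()
    # BED coordinates are 0-based and half-open, so end - start is the
    # number of reference bases covered by the alignment.
    span = int(end) - int(start)
    # Same as nchar() in R: number of bases in the stored read sequence.
    read_len = len(seq)
    return span, read_len

for line in lines:
    span, read_len = compare_lengths(line)
    print(f"reference span = {span}, read length = {read_len}")
```

For these two lines both reads are 48 bases long, while the intervals cover 1100 and 338 reference bases. The interval describes where the alignment sits on the reference, not how many bases the read contains.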
Can anyone explain what I am missing?
Thank you!
Thank you! I do understand why the read length may be larger than the alignment length. But I still do not understand how the alignment length can sometimes be larger than the read length. Can you explain this further?
Here is a simple example of a deletion in the read relative to the reference. It makes the alignment 2 bp longer than the read, because the start and end of the alignment on the reference define the coordinates.
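The same reasoning can be written down from the CIGAR string of each alignment, which a BED file no longer carries, so you would need to go back to the BAM. In the SAM specification, some operations consume read bases (M, I, S, =, X) and some consume reference bases (M, D, N, =, X). A deletion (D) therefore lengthens the reference span without adding read bases, and a soft clip (S) or insertion (I) does the opposite. Below is a minimal sketch, not a full SAM parser. The CIGAR strings are hypothetical, and the spliced example (N) is only an assumption about what could produce a span far larger than the read, for example in RNA-seq data.

```python
import re

# Minimal sketch (not a full SAM parser): derive read length and reference
# span from a CIGAR string using the SAM spec's consume-query /
# consume-reference rules. H and P consume neither, so they are not counted.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")

CONSUMES_QUERY = set("MIS=X")      # operations that use bases from the read
CONSUMES_REFERENCE = set("MDN=X")  # operations that cover reference bases

def cigar_lengths(cigar):
    """Return (read_length, reference_span) for a CIGAR string."""
    read_len = 0
    ref_span = 0
    for count, op in CIGAR_RE.findall(cigar):
        n = int(count)
        if op in CONSUMES_QUERY:
            read_len += n
        if op in CONSUMES_REFERENCE:
            ref_span += n
    return read_len, ref_span

# Hypothetical CIGAR strings, for illustration only.
examples = {
    "4M2D6M": "2 bp deletion: span is 2 bp longer than the read",
    "3S7M": "soft clip: span is shorter than the read",
    "20M1000N28M": "spliced read: span far exceeds the read",
}
for cigar, note in examples.items():
    read_len, ref_span = cigar_lengths(cigar)
    print(f"{cigar}: read length {read_len}, reference span {ref_span} ({note})")
```

For "4M2D6M" this gives a read length of 10 and a reference span of 12, which matches the deletion example: the read has 10 bases, but the alignment covers 12 positions on the reference.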
That was very well simplified! Thanks!