10.2 years ago
ilyco
Hi,
What is the fastest way to get the distances between reads (irrespective of the strand they were aligned to) from a SAM/BAM alignment file?
Context: I aligned short RNA-seq reads using Bowtie and I would like to store the distances between reads on each chromosome so I can try some clustering method that would group the reads.
Thank you!
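Whatever distance definition is chosen, a first step is to turn each alignment into a genomic interval. Below is a minimal, stdlib-only sketch that does this, assuming SAM text as input (for a BAM file, for example the output of `samtools view reads.bam`). The helper names `reference_span` and `sam_to_intervals` are hypothetical. The sketch ignores the strand, skips unmapped, secondary and supplementary records, and derives the end coordinate from the CIGAR string. This is necessary because deletions (`D`) and spliced gaps (`N`) lengthen the reference footprint, while insertions (`I`) and soft clips (`S`) do not.

```python
import re

# CIGAR operations that consume reference bases (per the SAM spec): M, D, N, =, X
_REF_CONSUMING = set("MDN=X")
_CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def reference_span(cigar):
    """Number of reference bases covered by an alignment with this CIGAR."""
    return sum(int(n) for n, op in _CIGAR_RE.findall(cigar) if op in _REF_CONSUMING)


def sam_to_intervals(lines):
    """Yield (chrom, start, end) as 0-based half-open intervals, ignoring strand."""
    for line in lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        flag, chrom, pos, cigar = int(fields[1]), fields[2], int(fields[3]), fields[5]
        # 0x4 = unmapped, 0x100 = secondary, 0x800 = supplementary:
        # keep only one primary alignment per read
        if flag & (0x4 | 0x100 | 0x800) or cigar == "*":
            continue
        start = pos - 1  # SAM POS is 1-based; convert to 0-based (as in BED)
        yield chrom, start, start + reference_span(cigar)
```

The flag field is deliberately not used for anything else. Reads on the reverse strand (flag `0x10`) get the same interval treatment as forward reads, which matches the "irrespective of the strand" requirement.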
Are you trying to perform feature discovery (i.e., find unannotated transcripts)? There are programs already written that will do this (e.g. cufflinks), so you don't need to reinvent the wheel.
Regarding your actual question, you first have to define what you mean by distance. Are we just using the minimum distance between any two of their mapped bases (this is likely the case), or do you only want the distance between a single given end of each alignment? Should a distinction be made between a complete and a partial overlap? Do you really want the distance between all alignments, or only between those within a given window?
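To make the candidate definitions above concrete, here is a small sketch that treats each alignment as a 0-based half-open interval (the BED convention). The names `Interval`, `gap`, `overlap`, `start_distance` and `relation` are hypothetical and only illustrate the options. It computes the minimum distance between mapped bases, the distance between one chosen end (here the leftmost coordinate), and a distinction between partial and complete overlap.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive (half-open, as in BED)


def gap(a, b):
    """Bases strictly between two intervals; 0 if they touch or overlap."""
    if a.chrom != b.chrom:
        raise ValueError("intervals are on different chromosomes")
    return max(0, b.start - a.end, a.start - b.end)


def overlap(a, b):
    """Number of reference bases shared by two intervals; 0 if disjoint."""
    if a.chrom != b.chrom:
        return 0
    return max(0, min(a.end, b.end) - max(a.start, b.start))


def start_distance(a, b):
    """Distance between the leftmost coordinates only, ignoring read length."""
    if a.chrom != b.chrom:
        raise ValueError("intervals are on different chromosomes")
    return abs(b.start - a.start)


def relation(a, b):
    """Classify two intervals as 'disjoint', 'partial' or 'contained'."""
    ov = overlap(a, b)
    if ov == 0:
        return "disjoint"
    if ov == min(a.end - a.start, b.end - b.start):
        return "contained"  # the shorter interval lies entirely inside the other
    return "partial"
```

For example, with `a = Interval("chr1", 100, 120)` and `b = Interval("chr1", 150, 175)`, `gap(a, b)` is 30 while `start_distance(a, b)` is 50. The choice of definition changes the numbers fed to any clustering step.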
I kept only the best alignment for each read, so I now have a set of putative positions, one per read. By distance, I mean the number of bp between a read and the next one on the same chromosome, based on the genomic coordinates reported in the alignment. In the meantime, I found that it is easier to convert the BAM file to BED format and use the coordinates there.
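The BED route can be sketched as follows. This assumes a BED file such as the one produced by `bedtools bamtobed -i reads.bam`, with the chromosome, start and end in the first three tab-separated columns. The helper names `read_bed` and `consecutive_gaps` are hypothetical. The strand column is ignored and the reads are sorted per chromosome in Python, so the input does not need to be pre-sorted. Because BED coordinates are 0-based half-open, `next_start - prev_end` counts the bases strictly between two consecutive reads.

```python
from collections import defaultdict


def read_bed(lines):
    """Parse BED lines into {chrom: [(start, end), ...]}; strand is ignored."""
    by_chrom = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue  # skip blank, comment and UCSC track/browser lines
        chrom, start, end = line.rstrip("\n").split("\t")[:3]
        by_chrom[chrom].append((int(start), int(end)))
    return by_chrom


def consecutive_gaps(by_chrom):
    """For each chromosome, bp between each read's end and the next read's start.

    0 means the two reads abut; a negative value means they overlap.
    """
    gaps = {}
    for chrom, intervals in by_chrom.items():
        ordered = sorted(intervals)  # by start, then end
        gaps[chrom] = [nxt[0] - prev[1] for prev, nxt in zip(ordered, ordered[1:])]
    return gaps
```

Keeping negative values, rather than clamping them to 0, preserves how much consecutive reads overlap. This may be useful input for a clustering step that groups reads into loci.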