Hello,
I have aligned reads against a reference genome and extracted the unique reads into a BAM file. Is there a program that lets me extract, for each read, the read name, the start and stop positions, the strand, and the chromosome the read aligns to?
Thanks
Please read the SAM specifications. SAM/BAM is essentially a tab-delimited format, and the documentation will tell you in which columns you find the information. Convert BAM to SAM with
samtools view
and use any Unix tool (awk, sed, cut) to then select the relevant columns. This can be done efficiently via a pipe.
The problem is the stop coordinate; it is not all that easy to get that information.
It is one of those endemic bioinformatics things where commonly needed information, like the start/stop of an alignment, ought to be present ... but it isn't there.
BTW, I asked ChatGPT and, lo and behold, it incorrectly claims that the following works:
but the answer is not correct. The stop coordinate can only be computed by parsing the CIGAR string and advancing the position for every M and D operator (and likewise for N, = and X, which also consume the reference), but not for S, H or I operators.
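The CIGAR walk described above can be sketched in a few lines of Python. This is only a rough illustration, not a tested tool: the function names `reference_length` and `sam_line_to_interval` are made up for this example, the end coordinate is assumed to be 1-based and inclusive (same convention as the POS column), and the strand is taken from FLAG bit 0x10 as described in the SAM specification.

```python
import re

# CIGAR operators that consume the reference, per the SAM specification.
# S, H, I and P do not move the position on the reference.
REF_CONSUMING = set("MDN=X")
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")

def reference_length(cigar):
    """Number of reference bases spanned by an alignment with this CIGAR."""
    return sum(int(n) for n, op in CIGAR_RE.findall(cigar) if op in REF_CONSUMING)

def sam_line_to_interval(line):
    """Return (read name, chromosome, start, end, strand) for one SAM record.

    Hypothetical helper: start and end are 1-based and inclusive.
    Returns None for unmapped reads or records without a CIGAR.
    """
    fields = line.rstrip("\n").split("\t")
    name, flag, chrom = fields[0], int(fields[1]), fields[2]
    pos, cigar = int(fields[3]), fields[5]
    if flag & 0x4 or cigar == "*":
        return None
    strand = "-" if flag & 0x10 else "+"   # bit 0x10: read aligned to reverse strand
    end = pos + reference_length(cigar) - 1
    return name, chrom, pos, end, strand
```

For a record starting at position 100 with CIGAR 5S10M2I3D20M, only 10M, 3D and 20M move along the reference (33 bases), so the alignment ends at position 132. To use this on a BAM file one could feed it the lines produced by samtools view, skipping any header lines that start with @.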