Entering edit mode
23 months ago
octpus616
▴
120
HI,
I want to get following fields from a bam file:
- Read_length
- Alignment_length
- Edit_distance
I known I can get this fields from pysam
, But I want to do this step with shell, I have the following scheme
- Read_length use
awk
calculate length of reads - Alignment_length use awk calculate MD fields?
- Edit_distance get from NM fields
I'm not sure if my understanding of Alignment length is correct, how should I get this information correctly?
for example, if I have a MD field:
10A3T0T10
So is the alignment length is
10+1(A)+3+1(T)+0+1(T)+10
but if we have
10A^3T0T10
here is also alignment length is 10+1(A)+3+1(T)+0+1(T)+10
or 10+1(A)+1(^)+3+1(T)+0+1(T)+10
, The read.query_alignment_length
in pysam
output as same as previous
update:
Can I use read.query_alignment_length
in pysam to get alignment length
?
I don't think you can get these fields from a bam with the shell. Bam files are in binary format and therefore not readable for humans :D
I think may
samtools view
can easily convertbam
to readablesam
format to stdout and use pipeline to awk or samtools is okTrue you can just use the
--header-only
flag and then you have all the info you need in a human readable format.why reinventing the wheel with bash when you have numerous API (pysam, htsjdk, htslib... ) available.
Because I find a little difference between pysam
read.query_alignment_length
and I calculate from MD filed, see my updated case for10A^3T0T10
MD is not needed to calculate length of alignment, only CIGAR (which tells you length of the aligned segment for example)