I have an alignment to the human genome obtained with BWA, and I deduplicated the BAM file. I would like to visualize the quality of the alignment in R in order to have complete control over the graphics.
I used samtools stats to get the basic information, but how is the output formatted? I can see this line:
# Columns correspond to qualities and rows to cycles. First column is the cycle number.
But what are cycles? And do the columns represent the chromosomes of the reference genome? If so, can I get the names of the chromosomes?
In Illumina sequencing a cycle corresponds to one base position in the read, since each synthesis cycle of the sequencing-by-synthesis process adds one new base. With 1x50bp reads, for example, you have 50 cycles = 50 bases. So what you have there is a per-base quality summary over the reads in the BAM file: the rows are read positions and, as the header line says, the columns are base quality values, not chromosomes. From what I understand, this is conceptually what FastQC collects in its per-base quality section (it does not use samtools, but the idea is the same). For the quality of the alignment itself I would simply collect the MAPQ scores. Check the SAM specification for which column holds MAPQ in the SAM file. I am pretty sure there are tools that automate this.
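To make the rows-are-cycles, columns-are-qualities layout concrete, here is a minimal Python sketch that turns that section into a mean base quality per cycle. It assumes the rows of interest start with the tag `FFQ` (first-fragment qualities), are tab-separated, and that the columns after the cycle number are counts for quality 0, 1, 2, and so on; please verify this against the comment lines in your own samtools stats output. The function names and the tiny `example` text are made up for illustration.

```python
def parse_ffq(stats_text):
    """Return {cycle: [count at Q0, count at Q1, ...]} from rows tagged FFQ.

    Assumption: tab-separated rows of the form
    FFQ <cycle> <count Q0> <count Q1> ...
    """
    table = {}
    for line in stats_text.splitlines():
        if not line.startswith("FFQ\t"):
            continue  # skip comments and all other sections
        fields = line.split("\t")
        table[int(fields[1])] = [int(x) for x in fields[2:]]
    return table


def mean_quality_per_cycle(table):
    """Weighted mean quality per cycle; cycles with no bases are skipped."""
    means = {}
    for cycle, counts in sorted(table.items()):
        total = sum(counts)
        if total == 0:
            continue
        means[cycle] = sum(q * n for q, n in enumerate(counts)) / total
    return means


# Made-up miniature input, not real samtools output.
example = (
    "# Columns correspond to qualities and rows to cycles.\n"
    "FFQ\t1\t0\t2\t2\n"
    "FFQ\t2\t4\t0\t0\n"
    "SN\traw total sequences:\t4\n"
)
means = mean_quality_per_cycle(parse_ffq(example))
print(means)
```

The resulting cycle-to-mean mapping can be written to a small table and read into R for plotting.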
Different aligners treat MAPQ differently, so you will want to check the aligner-specific implementation of MAPQ values.
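Collecting the MAPQ scores can be sketched as below. This is only an illustration, assuming plain-text SAM records (for example piped from `samtools view`), where MAPQ is the fifth tab-separated field and the FLAG bit 0x4 marks unmapped reads, as defined in the SAM specification. The function name and the example records are made up.

```python
from collections import Counter


def mapq_histogram(sam_lines):
    """Count mapped reads per MAPQ value from text SAM records."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        flag = int(fields[1])
        if flag & 0x4:
            continue  # read is unmapped, MAPQ carries no meaning
        counts[int(fields[4])] += 1  # MAPQ is the 5th column
    return counts


# Made-up records for illustration only.
records = [
    "@SQ\tSN:chr1\tLN:1000",
    "r1\t0\tchr1\t100\t60\t5M\t*\t0\t0\tACGTA\tIIIII",
    "r2\t16\tchr1\t200\t0\t5M\t*\t0\t0\tACGTA\tIIIII",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\tACGTA\tIIIII",
    "r4\t0\tchr1\t300\t60\t5M\t*\t0\t0\tACGTA\tIIIII",
]
hist = mapq_histogram(records)
for mapq, n in sorted(hist.items()):
    print(mapq, n)
```

How to interpret the resulting values still depends on the aligner, so the histogram is best read alongside the aligner's own documentation.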