8.2 years ago
I would like to understand how the LN values in the @SQ header lines are calculated. The SAM specification says that LN is the reference sequence length, but the value in my file doesn't seem to match. I have the following in a SAM file:
@SQ SN:I LN:15072434
but that doesn't match the nucleotide count of the reference sequence:
gunzip -c Caenorhabditis_elegans.WBcel235.dna_rm.chromosome.I.fa.gz | grep -e '^[^>]' | wc -c
which returns 15323642.
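One thing that may be worth checking: `wc -c` counts every character it receives, including the newline at the end of each sequence line. The difference between the two numbers is 15323642 - 15072434 = 251208, which is exactly the number of lines you would get if 15072434 bases were wrapped at 60 per line (that wrap width is an assumption about this file, not something I have verified). A minimal sketch of a count that drops header lines and line breaks first, using a hypothetical helper named `count_bases`:

```shell
# Count sequence characters in a FASTA stream read from stdin.
# Header lines (starting with '>') are dropped, then all line breaks
# (LF and CR) are removed so that only sequence characters are counted.
count_bases() {
  grep -v '^>' | tr -d '\r\n' | wc -c
}

# Usage with the file from the question:
# gunzip -c Caenorhabditis_elegans.WBcel235.dna_rm.chromosome.I.fa.gz | count_bases
```

If the result matches the LN value, the discrepancy comes from the newlines and not from how LN is calculated.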