Aligner with statistics included?
1
0
Entering edit mode
14 months ago

Hello,

Is there a long range aligner that could make a statistics outlook listing how many SNP, indels etc are observed in the alignment itself?

Thank you

aligner alignment • 1.4k views
ADD COMMENT
0
Entering edit mode

What is a "long range" aligner?

ADD REPLY
0
Entering edit mode

something like MAFFT, Clustal, Muscle etc and that work on fasta files. Not Bowtie or BWA, which are needed for the short-read fastqfiles

ADD REPLY
0
Entering edit mode

FYI aligners take reads and align to a reference genome, returning the position of the read and the a few other statistics. They are not interested in variant calling per se but how many differences between the read and the reference. If you are interested in SNPs, INDELS, etc you need to use a variant caller. Variant callers look at the differences between the reads and the reference to call variants.

I think by 'long-range' you're referring to long-reads?

ADD REPLY
0
Entering edit mode

Yes, long-reads. I understand the point, but I would like to know what is the identity between two or more sequences, how many indels or snp are presents.

ADD REPLY
2
Entering edit mode
14 months ago

BBTools' callvariants.sh calls variants only from the alignments. So:

callvariants.sh in=nxyxx.bam ref=ref.fa

yields:

50 of 342488 variants passed primary filters (0.0146%).

Type            Count   Rate    AD      Depth   AF      Score   Qual
Substitutions:  39      78.0%   2.5     3.3     0.807   22.6    36.4
Deletions:      5       10.0%   3.2     5.2     0.693   22.0    34.6
Insertions:     6       12.0%   3.0     5.2     0.694   22.8    37.0
Variation Rate: 1/87048
Homozygous:     50      100.0%

Does that look like what you wanted? You can add the clearfilters flag to get stats on ALL observed variants (including the ones that are likely sequencing error). Alternatively, during alignment, BBMap provides some of these statistics:

Read 1 data:            pct reads       num reads       pct bases          num bases

mapped:                  83.3029%         3316002        83.6085%          488906265
unambiguous:             82.2136%         3272637        82.5206%          482544605
ambiguous:                1.0894%           43365         1.0879%            6361660
low-Q discards:           0.0000%               0         0.0000%                  0

perfect best site:       68.1714%         2713667        68.3977%          399960043
semiperfect site:        74.7275%         2974642        74.9602%          438334497

Match Rate:                   NA               NA        98.2992%          480890022
Error Rate:              10.0748%          340829         0.2197%            1074954
Sub Rate:                 9.5862%          324301         0.1463%             715579
Del Rate:                 0.4477%           15145         0.0622%             304340
Ins Rate:                 0.4401%           14890         0.0112%              55035
N Rate:                   8.4204%          284863         1.4811%            7245629

It also produces a file via the "mhist" flag that you can use to visualize the distribution of the different variant types across the read length, like this:

BBMap mhist

I mainly use that when evaluating new Illumina sequencing machines.

ADD COMMENT
0
Entering edit mode

Thank you, this is very good, were can I find it?

ADD REPLY
0
Entering edit mode

BBTools is here

...but BBMap is designed for short reads. However, you can still generate the histogram using Reformat or reads mapped by a long-read aligner:

reformat.sh in=mapped.sam mhist=mhist.txt
ADD REPLY
0
Entering edit mode

Thank you, how can I set the additional flags in callvariants.sh? I tried, for instance, with --clearfilters, clearfilters, clearfilters=1 but I always get an error...

ADD REPLY
0
Entering edit mode

Simply "clearfilters" should work. Can you post the error message?

ADD REPLY

Login before adding your answer.

Traffic: 2095 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6