As a beginner, I have a basic question about a BAM alignment. After mapping my fastq reads (from a single individual) to a reference (bwa), I can see the variations, which I guess include sequencing errors, misalignments, errors in library preparation and real SNPs. In a haploid organism, I suppose there is only one possible correct result for each position, so there is only one correct consensus. After doing variant calling with samtools:
samtools mpileup -uf params bam | bcftools call -mv -Oz -o vcf
and with lofreq:
lofreq call -f ref -o outvcf mybam
I obtain two VCF files, one with SNPs and one with SNVs (the SNV file is bigger than the SNP file, as expected). How do these programs mark a variation as a SNP or an SNV if I am working with only one sample? The definition of SNP, from Wikipedia, is:
A variation in a single nucleotide that occurs at a specific position in the genome, where each variation is present to some appreciable degree within a population (e.g. > 1%)
Thanks!
Why do you think you have one file with "SNP" and one with "SNV"?
"SNP" stands for "Single Nucleotide Polymorphism", and "SNV" for "Single Nucleotide Variation".
The term SNP is more often used in talks. The problem is that "polymorphism" implies that the change in sequence is fairly common and has little or no impact on gene function. But people started to use this term for almost every change in sequence, even for those that do have an effect.
So to avoid confusion about whether there is an impact or not, SNV is a much better word.
fin swimmer
Which organism are you working with? If you look at this page, http://csb5.github.io/lofreq/commands/#call, it seems they use dbSNP to call SNPs for human by default:
@finswimmer OK so leaving aside the definition of SNP or SNV, they just extract the variations against a reference (identifying and discarding the possible errors, misalignments, etc). I was confused with the SNP's definition. Thanks!
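To make that concrete, here is a minimal Python sketch of the idea that a VCF record counts as a single-nucleotide variant purely because of its REF and ALT columns, with no population frequency involved. The function name `classify_vcf_record` and the example records are hypothetical and made up for illustration; this is not how bcftools or lofreq classify variants internally.

```python
def classify_vcf_record(line):
    """Classify one VCF line as 'SNV' or 'other'; return None for header lines.

    Hypothetical helper for illustration only: it looks solely at the REF and
    ALT columns, so it knows nothing about how common the allele is in a
    population.
    """
    if line.startswith("#"):
        return None  # meta-information and column header lines
    fields = line.rstrip("\n").split("\t")
    ref = fields[3].upper()
    alts = [a.upper() for a in fields[4].split(",")]
    bases = "ACGT"
    # A single-nucleotide change: one reference base replaced by one base.
    if len(ref) == 1 and ref in bases and all(len(a) == 1 and a in bases for a in alts):
        return "SNV"
    return "other"


# Made-up records, not output from any real caller.
example_lines = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t101\t.\tA\tG\t50\tPASS\t.",    # single base substitution
    "chr1\t250\t.\tAT\tA\t40\tPASS\t.",   # deletion
    "chr1\t300\t.\tC\tT,G\t60\tPASS\t.",  # two alternative single bases
]

counts = {}
for line in example_lines:
    kind = classify_vcf_record(line)
    if kind is not None:
        counts[kind] = counts.get(kind, 0) + 1

print(counts)
```

Whether such a site would also qualify as a "SNP" in the population sense cannot be decided from a single sample; that would need frequency data from many individuals.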
It is an insect but thanks anyway @Titus!
Cockroach? - Periplaneta spp.?
Just be wary of dbSNP - it is a grand mix of 'common' and 'rare' variants, many of which have clinical relevance and are even listed in ClinVar as pathogenic alleles.