Question

How can I detect low-frequency polymorphisms?

Asked 5.8 years ago by apl00028 (90)

Hi everybody, I am trying to detect polymorphisms (and their proportions) in Sanger sequences of the gene that encodes the Watermelon Mosaic Virus coat protein.

I used:

    To index my reference:
        bowtie2-build REFERENCIA.fas REFERENCIA_index

    To align my sequences to it (-f: the input is FASTA):
        bowtie2 -f -x REFERENCIA_index -U PROBLEMA.fas -S PROBLEMA.sam

    To convert the SAM file to a coordinate-sorted, indexed BAM (bcftools mpileup needs sorted input):
        samtools sort -o PROBLEMA.bam PROBLEMA.sam
        samtools index PROBLEMA.bam

    To call the variants:
        bcftools mpileup --redo-BAQ --min-BQ 30 --annotate FORMAT/AD,FORMAT/ADF,FORMAT/ADR,FORMAT/DP,FORMAT/SP,INFO/AD,INFO/ADF,INFO/ADR -f REFERENCIA.fas PROBLEMA.bam | bcftools call --multiallelic-caller --variants-only --pval-threshold 1.0 -Ov > variants_problema.vcf

However, some polymorphisms appear in only one sequence, and I cannot detect them with this pipeline. What should I change in these commands?

Thanks in advance.
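A likely reason, offered as an assumption rather than a diagnosis: all Sanger sequences end up in one BAM without read groups, so `bcftools mpileup` treats them as a single sample, and `bcftools call` reports the most likely genotype for that one sample (diploid by default). A base carried by only one of many sequences is then usually absorbed into a reference call and dropped by `--variants-only`. (The `--pval-threshold` option in the posted command applies only to the consensus caller, `-c`, so it has no effect together with `--multiallelic-caller`.) One way around this is to stop after `bcftools mpileup` and keep every covered position with its per-allele read depths. The function below is a minimal sketch of that; the name `pileup_all_sites` and the output file names are hypothetical, and the BAM is assumed to be coordinate-sorted and indexed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: pile up every covered position WITHOUT running bcftools call,
# so that sites where only one Sanger sequence differs from the reference are kept.
# Usage: pileup_all_sites REFERENCIA.fas PROBLEMA.bam PROBLEMA
pileup_all_sites() (
  set -euo pipefail
  if [ "$#" -ne 3 ]; then
    echo "usage: pileup_all_sites REF.fas SORTED.bam PREFIX" >&2
    exit 2
  fi
  ref=$1 bam=$2 prefix=$3

  # INFO/AD = read depth for REF followed by each ALT allele; -Ob writes compressed BCF
  bcftools mpileup --min-BQ 30 --annotate FORMAT/AD,INFO/AD \
      -f "$ref" -Ob -o "${prefix}_all_sites.bcf" "$bam"

  # Plain-text copy for inspection (bcftools view converts BCF to VCF)
  bcftools view "${prefix}_all_sites.bcf" > "${prefix}_all_sites.vcf"
)
```

The body runs in a subshell, so `set -euo pipefail` stops the function at the first failing command without affecting the calling shell.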

snp • 1.4k views

Why is your file called PROBLEMA? Is it a problem? (I speak Spanish and Portuguese.) Which sequencer did you use to sequence your samples, and what is the expected read length? If you view your BAM file in a viewer such as IGV, can you see the variant/polymorphism in the read pileup?

Reply by Kevin Blighe (89k), 5.8 years ago

I called it PROBLEMA because it was a test run on a small sample. To avoid bias in polymorphism detection, I did the following: I built a phylogenetic tree of my samples together with one reference sequence from each group of this virus (2 classic strains and 3 emergent strains), and saw that most of my samples cluster very close to one of the classic strains. So I took, as the reference, the one of my sequences closest to that classic reference sequence. My read length is 843 bp. And yes, I checked in IGV and my polymorphisms are there.

Reply by apl00028 (90), 5.8 years ago

I see. For reads of that length, I am not sure that bowtie2 is the best choice. Bowtie2 is, by default, tuned for short reads. Although it will attempt to align anything you give it, there may be better choices, such as BWA-SW or BWA-MEM.

Reply by Kevin Blighe (89k), 5.8 years ago
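If you want to try BWA-MEM, as suggested above, a minimal sketch might look like the function below. It assumes `bwa` and `samtools` are on the PATH; the function name `align_bwa_mem` and the output file names are hypothetical. It produces a coordinate-sorted, indexed BAM that can go straight into the `bcftools mpileup` step.

```shell
#!/usr/bin/env bash
# Hypothetical helper: align full-length Sanger sequences with BWA-MEM instead of bowtie2.
# Usage: align_bwa_mem REFERENCIA.fas PROBLEMA.fas PROBLEMA
align_bwa_mem() (
  set -euo pipefail
  if [ "$#" -ne 3 ]; then
    echo "usage: align_bwa_mem REF.fas READS.fas PREFIX" >&2
    exit 2
  fi
  ref=$1 reads=$2 prefix=$3

  bwa index "$ref"    # builds the BWA index files next to the reference FASTA
  # bwa mem accepts FASTA as well as FASTQ; its SAM output is sorted straight into a BAM
  bwa mem "$ref" "$reads" | samtools sort -o "${prefix}.sorted.bam" -
  samtools index "${prefix}.sorted.bam"
)
```

FASTA input carries no base qualities, so it may be worth checking (for example with `samtools view`) how qualities appear in the resulting BAM before relying on a `--min-BQ` filter.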

Well, I think I found the solution.

If I use bcftools view to look at how the polymorphisms are distributed along the reference genome, I get a report in which every SNP is recorded:

        bcftools view variants_Amaranthus_sp.bcf > variants_Amaranthus_sp.vcf

After that, I will select my polymorphisms in RStudio.

Reply by apl00028 (90), 5.8 years ago

Okay, that command does not fit anywhere in your previously posted code (you never mentioned a BCF file), but I am glad that you found your own solution. Based on your original wording, though, I assumed that you already had a valid VCF that simply did not include the SNPs you were expecting.

Regards.

Reply by Kevin Blighe (89k), 5.8 years ago

It has the SNPs that I expected, but if you can suggest an alternative way to extract them without using R, I would be glad to hear new ideas.

Thanks!

Reply by apl00028 (90), 5.8 years ago
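Without R, the per-site read counts can be pulled out of the all-sites BCF with `bcftools query` and a few lines of awk. This is a minimal sketch, assuming the BCF was produced by `bcftools mpileup` with `--annotate INFO/AD` (as in the posted command) and without `bcftools call`; the function name `alt_fractions` is hypothetical. Since each Sanger sequence is aligned as a single read, the proportion it prints approximates the fraction of sequences covering that position that carry a non-reference base.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads tab-separated CHROM, POS, REF, ALT, INFO/AD lines from stdin
# (INFO/AD = depth of REF followed by each ALT allele, comma-separated) and prints only
# the positions where at least one read carries a non-reference base, adding
# the number of such reads, the total depth and their proportion.
alt_fractions() {
  awk -F'\t' '
    {
      n = split($5, ad, ",")        # ad[1] = REF depth, ad[2..n] = ALT depths ("." counts as 0)
      total = 0
      for (i = 1; i <= n; i++) total += ad[i]
      alt = total - ad[1]           # reads supporting any non-reference allele
      if (alt > 0)
        printf "%s\t%s\t%s\t%s\t%d\t%d\t%.3f\n", $1, $2, $3, $4, alt, total, alt / total
    }'
}
```

Usage, assuming `variants_Amaranthus_sp.bcf` carries INFO/AD: `bcftools query -f '%CHROM\t%POS\t%REF\t%ALT\t%INFO/AD\n' variants_Amaranthus_sp.bcf | alt_fractions > polymorphisms.tsv`. Sites where every read matches the reference (for example ALT `<*>` with AD `20,0`) are skipped, so a base seen in a single sequence still shows up with its count and proportion.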