How can I detect low-frequency polymorphisms?
5.4 years ago
apl00028 ▴ 90

Hi everybody, I am trying to detect polymorphisms (and their proportions) in Sanger sequences of the gene that encodes the Watermelon mosaic virus coat protein.

I used:

        To index my reference:
         bowtie2-build REFERENCIA.fas REFERENCIA_index

         To make the alignment:
         bowtie2 -f -x REFERENCIA_index PROBLEMA.fas -S PROBLEMA.sam

         To convert SAM to BAM format:
         samtools view -bS PROBLEMA.sam > PROBLEMA.bam

         To call the variants:
         bcftools mpileup --redo-BAQ --min-BQ 30 --annotate FORMAT/AD,FORMAT/ADF,FORMAT/ADR,FORMAT/DP,FORMAT/SP,INFO/AD,INFO/ADF,INFO/ADR -f REFERENCIA.fas PROBLEMA.bam | bcftools call --multiallelic-caller --variants-only --pval-threshold 1.0 -Ov > variants_problema.vcf

But some polymorphisms appear in only one sequence, and I cannot detect them with this command. What should I change in it?

Thanks in advance.

snp • 1.2k views
Why is your file called PROBLEMA? Is it a problem? (I speak Spanish and Portuguese.) Which sequencer did you use to sequence your samples, and what is the expected read length? If you open your BAM file in a viewer such as IGV, can you see the variant / polymorphism in the read pileup?

I called it Problem because it was a test that I ran with a small sample size. To avoid bias in polymorphism detection, I took the following steps: I built a phylogenetic tree of my samples together with a reference sequence from each group of this virus (2 classic strains and 3 emergent strains), and I saw that most of my samples cluster closely with one classic strain. So I chose one of my own sequences, the one closest to that classic reference sequence. My read length is 843 bp. Yes, I checked in IGV and my polymorphisms are there.

I see. For reads of that expected length, I am not sure that bowtie2 is the best choice. Bowtie2 is, by default, tailored for short reads (~50 bp). Although it will attempt to align anything that you provide, there may be better choices, like BWA-SW or BWA-MEM.
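As a rough illustration of the BWA-MEM suggestion, the alignment step might look like the sketch below. It assumes `bwa` and `samtools` are installed; the function name `align_long_reads` and the output file names are hypothetical. It also coordinate-sorts and indexes the BAM, which viewers such as IGV generally expect.

```shell
# Hypothetical wrapper; the function name and output file names are illustrative.
# Assumes bwa and samtools are available on PATH.
align_long_reads() {
    ref=$1      # reference FASTA, e.g. REFERENCIA.fas
    reads=$2    # query sequences in FASTA format, e.g. PROBLEMA.fas
    out=$3      # output prefix, e.g. PROBLEMA

    # Index the reference, align with BWA-MEM, sort by coordinate, then index the BAM.
    bwa index "$ref" &&
    bwa mem "$ref" "$reads" |
        samtools sort -o "$out.sorted.bam" - &&
    samtools index "$out.sorted.bam"
}
```

It could be called as `align_long_reads REFERENCIA.fas PROBLEMA.fas PROBLEMA`, after which `PROBLEMA.sorted.bam` would take the place of `PROBLEMA.bam` in the variant-calling step.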

Well, I think I found the solution.

If I use bcftools view to see how the polymorphisms are distributed along the reference genome, I get a report in which every SNP is recorded.

                        bcftools view variants_Amaranthus_sp.bcf > variants_Amaranthus_sp.vcf

After that I will select my polymorphisms using RStudio.

Okay, that command fits nowhere in your previously posted code (you never mentioned a BCF file), but, nevertheless, I am glad that you found your own solution. Based on your original wording, though, I had assumed that you already had a valid VCF but that it did not include the SNPs that you were expecting.

Regards.

It has the SNPs that I expected, but if you can suggest an alternative way to get them without using R, I would be glad to hear new ideas.

Thanks!
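One possible way to do the selection without R is to read the per-allele depths that `bcftools mpileup` writes to the `INFO/AD` tag (already requested in the original command) and compute proportions with `awk`. This is only a sketch: the function name `ad_fractions` and the output columns are hypothetical, and it assumes an uncompressed VCF whose `AD` values list the REF count first and then one count per ALT allele. The counts reflect only the bases that pass mpileup's own filters.

```shell
# Print per-site ALT allele counts and proportions from the INFO/AD tag of a VCF.
# Hypothetical helper: the name ad_fractions and the column layout are illustrative.
# Output columns: CHROM, POS, REF, ALT, ALT count, total depth (sum of AD), ALT proportion.
ad_fractions() {
    awk -F'\t' '
        /^#/ { next }                          # skip VCF header lines
        {
            ad = ""
            n = split($8, info, ";")           # column 8 is INFO
            for (i = 1; i <= n; i++)
                if (info[i] ~ /^AD=/) ad = substr(info[i], 4)
            if (ad == "") next                 # record carries no INFO/AD
            m = split(ad, cnt, ",")            # cnt[1] = REF, cnt[2..m] = ALT alleles
            split($5, alt, ",")                # column 5 is ALT
            total = 0
            for (i = 1; i <= m; i++) total += cnt[i]
            if (total == 0) next
            for (i = 2; i <= m; i++) {
                # ignore the symbolic <*> allele, missing ALT, and unobserved alleles
                if (alt[i-1] == "<*>" || alt[i-1] == "." || cnt[i] == 0) continue
                printf "%s\t%s\t%s\t%s\t%d\t%d\t%.4f\n", $1, $2, $4, alt[i-1], cnt[i], total, cnt[i] / total
            }
        }
    ' "$@"
}
```

For example, `bcftools mpileup --annotate INFO/AD -f REFERENCIA.fas PROBLEMA.bam -Ov | ad_fractions` would list every site where at least one sequence carries a non-reference base, including alleles seen only once, since the counts are taken before `bcftools call` decides what to report.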
