5.4 years ago
apl00028
Hi everybody, I am trying to detect polymorphisms (and their proportions) in Sanger sequences of the gene that encodes the Watermelon mosaic virus coat protein.
I used:
To index my reference:
bowtie2-build REFERENCIA.fas REFERENCIA_index
To make the alignment:
bowtie2 -f -x REFERENCIA_index PROBLEMA.fas -S PROBLEMA.sam
To convert SAM to BAM format:
samtools view -bS PROBLEMA.sam > PROBLEMA.bam
To call the variants:
bcftools mpileup --redo-BAQ --min-BQ 30 --annotate FORMAT/AD,FORMAT/ADF,FORMAT/ADR,FORMAT/DP,FORMAT/SP,INFO/AD,INFO/ADF,INFO/ADR -f REFERENCIA.fas PROBLEMA.bam | bcftools call --multiallelic-caller --variants-only --pval-threshold 1.0 -Ov > variants_problema.vcf
But some polymorphisms appear only once, and I cannot detect them with this command. What should I change in this command?
Thanks in advance.
Why is your file called PROBLEMA? Is it a problem? (I speak Spanish and Portuguese.) Which sequencer did you use to sequence your samples, and what is the anticipated read length? If you view your BAM file in a viewer, for example IGV, can you see the variant / polymorphism in the read pileup?
I called it PROBLEMA because it was a test run on a small sample. To avoid bias in polymorphism detection, I took the following steps: I built a phylogenetic tree of my samples together with a reference sequence for each group of this virus (2 classic strains and 3 emergent strains), and I saw that most of them cluster very close to one classic strain. So I took the one of my sequences closest to the classic reference sequence. My read length is 843 bp. Yes, I checked in IGV and my polymorphisms are there.
I see. For reads of that expected length, I am not sure that bowtie2 is the best choice. Bowtie2 is, by default, tailored to short reads (~50 bp). Although it will attempt to align anything that you provide, there may be better choices, such as BWA-SW or BWA-MEM.
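A minimal sketch of the BWA-MEM route, assuming `bwa` and `samtools` (1.3 or later) are installed. The function name `align_with_bwa_mem` and the `.sorted.bam` output name are illustrative assumptions, not something from the original post; the input file names are the ones used above. It also sorts and indexes the BAM, because `bcftools mpileup` expects coordinate-sorted input.

```shell
# Sketch only: align long (Sanger-length) reads with BWA-MEM, then sort and
# index the result. Function name and output naming are assumptions.
align_with_bwa_mem() {
    ref=$1      # reference FASTA, e.g. REFERENCIA.fas
    reads=$2    # query sequences (FASTA or FASTQ), e.g. PROBLEMA.fas
    prefix=$3   # prefix for the output BAM, e.g. PROBLEMA

    # Build the BWA index next to the reference FASTA
    bwa index "$ref"

    # Align and coordinate-sort in one pipe (SAM is read from stdin via "-")
    bwa mem "$ref" "$reads" | samtools sort -o "$prefix.sorted.bam" -

    # Index the sorted BAM so that IGV and other tools can open it
    samtools index "$prefix.sorted.bam"
}

# Example call with the file names from this thread:
# align_with_bwa_mem REFERENCIA.fas PROBLEMA.fas PROBLEMA
```

The resulting `PROBLEMA.sorted.bam` could then be passed to the same `bcftools mpileup` command in place of `PROBLEMA.bam`.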
Well, I think I have found the solution.
If I use bcftools view to see how the polymorphisms are distributed along the reference genome, I obtain a report in which every SNP is recorded.
After that, I will select my polymorphisms using RStudio.
Okay, that command fits nowhere in your previously posted code (you never mentioned a BCF file), but, nevertheless, I am glad that you found your own solution. Based on your original wording, though, it was assumed that you already had a valid VCF but that it did not include the SNPs that you were expecting.
Regards.
It has the SNPs that I expected, but if you can suggest an alternative way to get them without using R, I would be glad to hear new ideas.
Thank you!
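One possible way to do this without R is a small awk filter over the VCF. This is only a sketch: it assumes an uncompressed, single-sample VCF such as `variants_problema.vcf` with the FORMAT/AD field requested in the mpileup command above, and the function name `vcf_alt_fraction` is hypothetical. For each record it prints CHROM, POS, REF, ALT, the total allelic depth, and the proportion of reads that support a non-reference allele.

```shell
# Sketch only: per-site ALT proportion from FORMAT/AD in a single-sample VCF.
# The function name is an assumption made for illustration.
vcf_alt_fraction() {
    awk -F'\t' '
        /^#/ { next }                        # skip header lines
        {
            n = split($9, keys, ":")         # FORMAT keys, e.g. GT:PL:DP:AD
            split($10, vals, ":")            # values for the first sample
            ad = ""
            for (i = 1; i <= n; i++)
                if (keys[i] == "AD") ad = vals[i]
            if (ad == "") next               # record has no AD field

            m = split(ad, depth, ",")        # depth[1] = REF, depth[2..m] = ALT
            total = 0
            for (i = 1; i <= m; i++) total += depth[i]
            if (total == 0) next             # no usable depth at this site

            printf "%s\t%s\t%s\t%s\t%d\t%.3f\n",
                   $1, $2, $4, $5, total, (total - depth[1]) / total
        }
    ' "$1"
}

# Example call:
# vcf_alt_fraction variants_problema.vcf > alt_proportions.tsv
```

The output is a plain tab-separated table, so it can be sorted or filtered with standard tools, for example keeping only sites above a chosen proportion.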