Hello All,
I'm losing my mind, and I'm sure it's because I'm misunderstanding something or not doing something correctly. The project background: I'm trying to get a consensus sequence for a variant virus. It's a small coronavirus, only ~27.6 kb, and we generated the data on an Ion Torrent.
The steps I have done so far (commands sketched out below the list):
1. Aligned the reads to the host genome and kept only the reads that did not map (host depletion).
2. Used the FASTX-Toolkit to keep only the high-quality reads (average quality of Q33 across the whole read).
3. Used TMAP mapall to align to the RefSeq genome from GenBank, at the lowest stringency level. This gave me a complete genome map.
4. Ran samtools mpileup -uf ref.fa aln.sorted.bam | bcftools call -mv -Oz -o calls.vcf.gz to make the VCF file.
5. Indexed with tabix (tabix -p vcf calls.vcf.gz).
6. Ran bcftools consensus -f ref.fa calls.vcf.gz > viral_cns.fa.
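For completeness, here is the whole pipeline spelled out the way I ran it. Filenames (host_aln.bam, ref.fa, nonhost.fastq, etc.) are placeholders, and the FASTX and TMAP lines are reconstructions from memory, so treat this as a sketch rather than exact commands:

# 1. host depletion: keep only the reads that did NOT map to the host
samtools view -b -f 4 host_aln.bam | samtools fastq - > nonhost.fastq
# 2. quality filter (note: fastq_quality_filter thresholds per-base
#    quality rather than a true per-read average)
fastq_quality_filter -q 33 -p 100 -i nonhost.fastq -o nonhost.hq.fastq
# 3. map to the viral RefSeq with TMAP (flags from memory - check tmap mapall --help)
tmap index -f ref.fa
tmap mapall -f ref.fa -r nonhost.hq.fastq stage1 map1 map2 map3 > aln.sam
samtools sort -o aln.sorted.bam aln.sam
samtools index aln.sorted.bam
# 4-6. call variants, index the VCF, and build the consensus
samtools faidx ref.fa
samtools mpileup -uf ref.fa aln.sorted.bam | bcftools call -mv -Oz -o calls.vcf.gz
tabix -p vcf calls.vcf.gz
bcftools consensus -f ref.fa calls.vcf.gz > viral_cns.fa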
Then, to double-check my work, I used the generated consensus as the new reference, with stringency set to the default. The idea being that if everything had been done correctly I would get a few SNPs, but not many.
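Concretely, the re-check was along these lines, where remap.sorted.bam is the reads remapped against the consensus (counting the non-header lines of the VCF gives a quick tally of how many variants get called):

samtools faidx viral_cns.fa
samtools mpileup -uf viral_cns.fa remap.sorted.bam | bcftools call -mv | grep -vc '^#'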
The output I get from this is the same as the low-stringency output when I look at it in IGV: LOADS of SNPs.
My question is: why is this happening? What am I not understanding about this process? Most of these variants are within 5% of each other over the whole genome, so this shouldn't be that hard.
Thank you in advance.
Sean
Align the consensus sequence against the original sequence and investigate that - bwa mem or lastz can do it.
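A minimal sketch with bwa mem, using the filenames from your steps (load ref.fa plus the resulting BAM in IGV afterwards to see exactly where the consensus differs from the reference):

bwa index ref.fa
bwa mem ref.fa viral_cns.fa | samtools sort -o cns_vs_ref.bam -
samtools index cns_vs_ref.bam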
That does show the SNPs being where they were called. Thank you :)