Question

SNP and Indel Counting without Raw Sequence Data

10.9 years ago

gabriel.jabud ▴ 40

Is there an easy way to call SNPs across multiple sequences of a gene (from different members of a population) without the raw sequence data? Most online tutorials start from raw NGS reads, align them with BWA, and call SNPs with samtools/GATK. However, suppose I assume the sequences are already assembled and treat them as (mostly) correct. I don't think treating these sequences as raw reads and running them through the software I mentioned would give me what I want, since SNPs that are present in only a couple of members might be dismissed as "errors".

Specifically, I have about 85 different sequences and I want a count of SNPs and indels relative to a reference sequence. These are in-house sequences, and I don't have access to the raw reads used to generate them. I've already generated a multiple sequence alignment with MUSCLE but don't know where to go from there. Short of writing my own custom script to read through the multiple sequence alignment, are there any tools for the job?
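For reference, the custom-script route I have in mind would look roughly like the sketch below. It is only a minimal illustration under several assumptions: the alignment is in aligned FASTA format with `-` as the gap character, the reference is one of the records (the name `ref` is hypothetical), a run of consecutive gap columns counts as a single indel event, and any mismatching character (including ambiguity codes such as `N`) counts as a SNP. Leading and trailing gaps are counted as indels too, which may not be desirable for sequences of unequal coverage. The function names `read_aligned_fasta` and `count_variants` are made up for this example.

```python
def read_aligned_fasta(lines):
    """Parse aligned FASTA lines into a dict of {name: aligned sequence}."""
    records = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token of the header as the name.
            name = line[1:].split()[0]
            records[name] = []
        elif name is not None:
            records[name].append(line.upper())
    return {n: "".join(parts) for n, parts in records.items()}


def count_variants(ref, seq, gap="-"):
    """Count SNPs and indel events in one aligned sequence versus the reference.

    A run of consecutive gap columns is counted as one insertion or deletion.
    """
    if len(ref) != len(seq):
        raise ValueError("aligned sequences must have equal length")
    counts = {"snps": 0, "insertions": 0, "deletions": 0}
    state = None  # "ins" or "del" while inside a gap run, otherwise None
    for r, s in zip(ref, seq):
        if r == gap and s == gap:
            continue  # gap in both: column belongs to some other sequence
        if r == gap:
            # Sequence has bases where the reference has none: insertion.
            if state != "ins":
                counts["insertions"] += 1
            state = "ins"
        elif s == gap:
            # Reference has bases where the sequence has none: deletion.
            if state != "del":
                counts["deletions"] += 1
            state = "del"
        else:
            state = None
            if r != s:
                counts["snps"] += 1
    return counts


# Toy alignment standing in for the MUSCLE output (hypothetical data).
example = [
    ">ref", "ACGT--ACGTAC",
    ">sample1", "ACCTGGAC--AC",
    ">sample2", "ACGT--ACGTAC",
]
alignment = read_aligned_fasta(example)
reference = alignment["ref"]
for name, seq in alignment.items():
    if name != "ref":
        print(name, count_variants(reference, seq))
```

For the real data, the list of lines would be replaced by an open file handle on the MUSCLE output, since the parser only needs an iterable of lines.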

sequence gene indel snp alignment • 2.2k views

updated 3.5 years ago by Ram 45k • written 10.9 years ago by gabriel.jabud ▴ 40