I've been using samtools/GATK to call SNPs/indels, and would like to filter my calls against known common variants so that only rare events remain.
Some of the samtools results look like this:
#CHROM POS ID REF ALT
1 10177 . ACCT ACCCT
There can be several alternatives for both the REF and ALT sequences. So when I compare my SNPs/indels with a common SNP database, should I just compare positions using bedtools, or should I also look at the base change? For example, if the database has a G-to-C SNP at position 1, while my results show a G-to-T change at the same position, should I regard my call as a common SNP or a rare SNP?
Thanks!
So basically you mean we should consider the base change, rather than simply the position, right? Thanks.
Yes. In downstream analysis, also look at what kind of mutation it is: missense, nonsense, frameshift, etc.
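To illustrate the difference between matching on position and matching on the base change, here is a minimal Python sketch. It assumes both the calls and the database are available as VCF-style text lines with CHROM, POS, ID, REF, ALT as the first five columns; the function names, the `rs_hypothetical` ID, and the position-1000 example records are made up for illustration. It also assumes indels are already represented the same way in both files (for example after normalizing both with something like `bcftools norm`), otherwise identical indels may fail to match.

```python
def variant_keys(vcf_lines):
    """Yield one (chrom, pos, ref, alt) key per ALT allele in VCF text lines."""
    for line in vcf_lines:
        # Skip blank lines and header lines such as "#CHROM POS ID REF ALT".
        if not line.strip() or line.startswith("#"):
            continue
        chrom, pos, _id, ref, alt = line.split()[:5]
        # Multi-allelic records list ALT alleles separated by commas.
        for allele in alt.split(","):
            yield (chrom, int(pos), ref, allele)


def split_by_known(call_lines, db_lines):
    """Sort calls into allele matches, position-only matches, and novel calls."""
    known = set(variant_keys(db_lines))
    known_positions = {(chrom, pos) for chrom, pos, _, _ in known}
    matched, position_only, novel = [], [], []
    for key in variant_keys(call_lines):
        if key in known:
            matched.append(key)          # same position and same base change
        elif key[:2] in known_positions:
            position_only.append(key)    # same position, different base change
        else:
            novel.append(key)            # nothing known at this position
    return matched, position_only, novel


# Hypothetical database entry: G>C at chr1:1000.
db = ["#CHROM POS ID REF ALT", "1 1000 rs_hypothetical G C"]
# Calls: G>T and G>C at the same position, plus the indel from the question.
calls = [
    "#CHROM POS ID REF ALT",
    "1 1000 . G T",
    "1 1000 . G C",
    "1 10177 . ACCT ACCCT",
]
matched, position_only, novel = split_by_known(calls, db)
print("allele match:", matched)
print("position only:", position_only)
print("novel:", novel)
```

A position-only comparison (as with a plain bedtools intersect) would treat both calls at position 1000 as known, whereas the allele-aware comparison keeps the G>T call separate so it can still be considered a candidate rare variant.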
Also, by dbSNP, do you mean all SNPs or common SNPs? There are two different databases on UCSC: one "all SNP" and one "common SNP".
See my edit. Select accordingly.