7.2 years ago
spromanos
I have a list of mutations and I want to check whether they are SNPs. I don't have a matched normal control to identify germline variants, but I can filter out mutations that have been reported as SNPs. Is there a way to do that with an R package? Perhaps biomaRt?
What type of files do you have? Could you post a sample of your data format?
It's a MAF file with gene name, chromosome, position, protein change, reference allele, alternative allele, read counts, tumor fraction, etc. I want to merge it with a table that lists mutations and has a "SNP or not" column. But where do I get that table from? I'm lucky this time and there are only a few mutations, so I can check them manually, but there should be a way to do this in R.
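A minimal Python sketch of the merge described above, assuming the standard MAF column names (`Chromosome`, `Start_Position`, `Reference_Allele`, and `Tumor_Seq_Allele2` for the alternative allele, as in TCGA/GDC MAFs) and a hypothetical table of known SNP sites given as (chromosome, position, ref, alt) tuples. Matching on all four fields rather than on position alone avoids flagging a mutation because a different allele at the same site is a known SNP. The leading `chr` is stripped because resources name chromosomes differently (`chr7` vs `7`).

```python
# Sketch: flag MAF rows that match a (hypothetical) table of known germline SNPs.
# Column names follow the standard MAF spec; the alt allele is assumed to be in
# Tumor_Seq_Allele2.

def variant_key(chrom, pos, ref, alt):
    """Build a comparable key; strips a leading 'chr' so '7' and 'chr7' match."""
    chrom = str(chrom)
    if chrom.lower().startswith("chr"):
        chrom = chrom[3:]
    return (chrom, int(pos), ref.upper(), alt.upper())


def annotate_known_snps(maf_rows, known_snps):
    """Return copies of maf_rows with a boolean 'known_snp' column (a left join)."""
    known = {variant_key(*site) for site in known_snps}
    annotated = []
    for row in maf_rows:
        key = variant_key(row["Chromosome"], row["Start_Position"],
                          row["Reference_Allele"], row["Tumor_Seq_Allele2"])
        annotated.append({**row, "known_snp": key in known})
    return annotated


# Made-up rows and coordinates, for illustration only.
maf_rows = [
    {"Hugo_Symbol": "GENE1", "Chromosome": "chr1", "Start_Position": "1000",
     "Reference_Allele": "A", "Tumor_Seq_Allele2": "G"},
    {"Hugo_Symbol": "GENE2", "Chromosome": "2", "Start_Position": "2000",
     "Reference_Allele": "C", "Tumor_Seq_Allele2": "T"},
]
known_snps = [("1", 1000, "A", "G")]

result = annotate_known_snps(maf_rows, known_snps)
for row in result:
    print(row["Hugo_Symbol"], row["known_snp"])
```

In R, the same left join could be done with `merge(..., all.x = TRUE)` on the four key columns.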
Some questions, so that people can help you better:
1) Isn't the type of variant specified in field 10 (Variant_Type) of a MAF file?
2) Can you post the first lines of the file?
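For reference, `Variant_Type` describes the shape of the change (e.g. `SNP`, `DNP`, `INS`, `DEL`), not whether it is germline or somatic. A minimal Python sketch that tallies that column, assuming a tab-separated MAF whose optional header comment lines start with `#`; the excerpt below is made up and shows only a few columns.

```python
import csv
from collections import Counter


def count_variant_types(maf_text):
    """Count values in the Variant_Type column, skipping '#' comment lines."""
    lines = [ln for ln in maf_text.splitlines() if not ln.startswith("#")]
    reader = csv.DictReader(lines, delimiter="\t")
    return Counter(row["Variant_Type"] for row in reader)


# Tiny made-up MAF excerpt (only a few of the standard columns).
maf_text = (
    "#version 2.4\n"
    "Hugo_Symbol\tChromosome\tStart_Position\tVariant_Type\n"
    "GENE1\t1\t1000\tSNP\n"
    "GENE2\t2\t2000\tDEL\n"
    "GENE3\t3\t3000\tSNP\n"
)
print(count_variant_types(maf_text))
```

Every single-base substitution comes out as `SNP` here, germline or not, which is why this column alone cannot answer the question.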
Unfortunately, I can't post it, but the issue is fairly straightforward. I think the confusion comes from using "SNP" for both a germline variant and a single-nucleotide somatic mutation. Let me try to explain it better.
I have a table with the following columns: gene name, chromosome, position of the mutation, protein change, reference allele, alternative allele, reads supporting the alternative allele, reads supporting the reference allele, variant allele fraction, etc. Each row is a single-nucleotide mutation.
Since I don't have data for the normal sample, I can't tell which of these mutations are germline variants and which are somatic. What I want to do is check every mutation against Ensembl or another database to see whether it has been reported as a SNP, i.e. a "normal" germline variant, and exclude those from further analyses.
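One way to check each mutation against Ensembl from a script is Ensembl's REST API. A minimal Python sketch, assuming the `overlap/region` endpoint with `feature=variation` and that each returned JSON record carries `id` and `alleles` keys (worth confirming against the REST documentation). The default server uses GRCh38 coordinates; GRCh37 data would need the GRCh37 server (https://grch37.rest.ensembl.org). The allele check matters because the endpoint returns every known variant overlapping the position, including ones with a different alternative allele. One request per mutation is fine for a short list; for longer lists, mind the service's rate limits.

```python
import json
import urllib.parse
import urllib.request

# Default Ensembl REST server (GRCh38 coordinates).
ENSEMBL_REST = "https://rest.ensembl.org"


def overlap_url(chrom, pos, server=ENSEMBL_REST):
    """URL for the overlap/region endpoint restricted to known variants."""
    chrom = str(chrom)
    if chrom.startswith("chr"):  # Ensembl uses '7', not 'chr7'
        chrom = chrom[3:]
    region = f"{chrom}:{pos}-{pos}"
    query = urllib.parse.urlencode({"feature": "variation",
                                    "content-type": "application/json"})
    return f"{server}/overlap/region/human/{region}?{query}"


def matching_variant_ids(records, ref, alt):
    """Keep IDs of returned variants whose alleles include both ref and alt.

    Assumes each record is a dict with 'id' and 'alleles' keys."""
    return [r["id"] for r in records
            if ref in r.get("alleles", []) and alt in r.get("alleles", [])]


def lookup(chrom, pos, ref, alt):
    """Query Ensembl (requires network access) and return matching variant IDs."""
    with urllib.request.urlopen(overlap_url(chrom, pos), timeout=30) as resp:
        return matching_variant_ids(json.load(resp), ref, alt)
```

A mutation for which `lookup(...)` returns a non-empty list has been reported before; that alone says nothing about its population frequency.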
Strictly speaking, any single-base change relative to the reference is a single-nucleotide variant (SNV), and such variants are often loosely called SNPs; single-base indels are a separate case. Much larger changes are typically referred to as "structural variants". You can look in dbSNP to see whether a variant has been described before or reported in the literature. ExAC is useful for assessing variant frequencies, especially for less common variants, and ClinVar for checking whether a variant is associated with any phenotype. But no database alone will tell you whether a given variant is somatic or germline in your sample; for that you need to compare the test sample against a control sample.
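Both dbSNP and ExAC distribute their sites as VCF files, so a local lookup table can be built from them. A minimal Python sketch that turns VCF data lines into a dictionary keyed by (chromosome, position, ref, alt), splitting multi-allelic `ALT` fields so each allele gets its own key. It assumes, as an illustration, an `AF` INFO field with one comma-separated value per alternative allele (ExAC-style); other resources encode frequency in other INFO fields, so adapt that part. The keys match directly only for single-base substitutions, since MAF and VCF encode indels differently.

```python
def parse_vcf_sites(lines):
    """Map (chrom, pos, ref, alt) -> (id, AF or None) from VCF data lines."""
    sites = {}
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip meta-information, header and blank lines
        fields = line.rstrip("\n").split("\t")
        chrom, pos, vid, ref, alts = fields[:5]
        info = fields[7]
        # INFO is ';'-separated; flag entries without '=' are ignored.
        info_fields = dict(f.split("=", 1) for f in info.split(";") if "=" in f)
        afs = info_fields["AF"].split(",") if "AF" in info_fields else []
        for i, alt in enumerate(alts.split(",")):
            af = None
            if i < len(afs) and afs[i] != ".":
                af = float(afs[i])
            sites[(chrom, int(pos), ref, alt)] = (vid, af)
    return sites


# Made-up records and IDs, for illustration only.
vcf_lines = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "1\t1000\trs0000001\tA\tG,T\t.\tPASS\tAF=0.25,0.002",
    "2\t2000\trs0000002\tC\tT\t.\tPASS\tDB;DP=10",
]
sites = parse_vcf_sites(vcf_lines)
print(sites[("1", 1000, "A", "T")])
```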
Thanks! I'm looking to do it in an automated way, though.
As I said, there are no germline data to compare against, so I have to make do with what I have. If I exclude variants that have been reported at a frequency above 1% in the population, I should end up with a list of mutations that are at least more likely to be somatic.
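A minimal Python sketch of the 1% filter, assuming population allele frequencies have already been collected into a dictionary keyed by (chromosome, position, ref, alt). Mutations not found in the dictionary are treated as unreported and kept; the keys and frequencies below are made up.

```python
def likely_somatic(mutations, population_af, max_af=0.01):
    """Keep mutations whose population allele frequency is at most max_af.

    Mutations absent from population_af (or with unknown AF) are kept."""
    kept = []
    for m in mutations:
        af = population_af.get(m)
        if af is None or af <= max_af:
            kept.append(m)
    return kept


mutations = [("1", 1000, "A", "G"), ("1", 1000, "A", "T"), ("3", 3000, "G", "A")]
population_af = {("1", 1000, "A", "G"): 0.25, ("1", 1000, "A", "T"): 0.002}
print(likely_somatic(mutations, population_af))
```

This only enriches for somatic calls: rare or private germline variants pass the filter. Filtering on frequency rather than on mere presence in dbSNP also matters, because dbSNP contains well-known somatic mutations too (BRAF V600E, for instance, has the rsID rs113488022).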