Which of my mutations are SNPs? Is there an R package to annotate them?
Asked 7.2 years ago by spromanos

I have a list of mutations and I want to check whether they are SNPs. I don't have a matched normal control to identify germline variants, but I can filter out mutations that have already been reported as SNPs. Is there a way to do that with an R package? Perhaps biomaRt?

Tags: SNP, R

What type of files do you have? Could you give a sample of your data format?

It's a MAF file with columns for gene name, chromosome, position, protein change, reference allele, alternative allele, read counts, tumor fraction, etc. I want to merge it with a table that lists mutations alongside a "SNP or not" column. But where do I get such a table? This time I'm lucky and there are only a few mutations, so I can check them manually, but there should be a way to do this in R.
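The merge described above is a left join on a variant key. A minimal sketch in Python (the same join maps onto `merge()` in R), assuming a tab-delimited MAF with the standard `Chromosome`, `Start_Position`, `Reference_Allele` and `Tumor_Seq_Allele2` columns, and a hypothetical set of known-variant keys that you would build from whatever database export you use (e.g. dbSNP). The function names are illustrative, and the rows are made up:

```python
import csv
import io
from typing import Iterable, Iterator

# Join key for a variant: (chromosome, start position, reference allele, alternative allele).
VariantKey = tuple[str, int, str, str]


def variant_key(row: dict[str, str]) -> VariantKey:
    """Build a join key from standard MAF columns (alt allele taken from Tumor_Seq_Allele2)."""
    chrom = row["Chromosome"].removeprefix("chr")  # treat "chr7" and "7" as the same contig
    return (chrom, int(row["Start_Position"]), row["Reference_Allele"], row["Tumor_Seq_Allele2"])


def read_maf(handle: Iterable[str]) -> Iterator[dict[str, str]]:
    """Yield MAF rows as dicts, skipping '#' comment lines such as '#version 2.4'."""
    data_lines = (line for line in handle if not line.startswith("#"))
    yield from csv.DictReader(data_lines, delimiter="\t")


def annotate_known(rows: Iterable[dict[str, str]], known: set[VariantKey]) -> list[dict[str, str]]:
    """Left-join style: every row is kept and gains an 'is_known_snp' column ("yes"/"no")."""
    annotated = []
    for row in rows:
        out = dict(row)
        out["is_known_snp"] = "yes" if variant_key(row) in known else "no"
        annotated.append(out)
    return annotated


# Toy input with made-up rows, for illustration only.
toy_maf = io.StringIO(
    "#version 2.4\n"
    "Hugo_Symbol\tChromosome\tStart_Position\tReference_Allele\tTumor_Seq_Allele2\n"
    "GENE_A\tchr1\t1000\tA\tG\n"
    "GENE_B\t2\t2000\tC\tT\n"
)
known_variants = {("1", 1000, "A", "G")}  # hypothetical set built from a known-variant database
for r in annotate_known(read_maf(toy_maf), known_variants):
    print(r["Hugo_Symbol"], r["is_known_snp"])  # prints "GENE_A yes", then "GENE_B no"
```

Keys only match when both sources use the same genome build and allele representation. For single-base substitutions MAF and VCF agree, but indels differ (MAF writes "-" for the empty allele, VCF adds a padding base), so they would need normalising first.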

Some questions, so that people can help you better:

1) Isn't the type of variant specified in field 10 (Variant_Type) of a MAF file?

2) Can you post the first lines of the file?
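On question 1: in a MAF, Variant_Type describes only the kind and length of the change (values include SNP, DNP, INS and DEL), so "SNP" there means a single-nucleotide change and says nothing about germline versus somatic origin. A quick way to see what a file contains is to tally that column, looked up by header name rather than by position in case a tool has reordered or dropped columns. A minimal Python sketch on made-up rows; the function name is illustrative:

```python
import csv
import io
from collections import Counter
from typing import Iterable


def count_variant_types(handle: Iterable[str]) -> Counter:
    """Tally the Variant_Type column of a tab-delimited MAF, found by header name."""
    data_lines = (line for line in handle if not line.startswith("#"))  # skip '#version' lines
    reader = csv.DictReader(data_lines, delimiter="\t")
    return Counter(row["Variant_Type"] for row in reader)


# Toy input with made-up rows, for illustration only.
toy_maf = io.StringIO(
    "Hugo_Symbol\tVariant_Type\n"
    "GENE_A\tSNP\n"
    "GENE_B\tDEL\n"
    "GENE_C\tSNP\n"
)
print(count_variant_types(toy_maf))  # Counter({'SNP': 2, 'DEL': 1})
```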

Unfortunately I can't post it, but the issue is fairly straightforward. I think the confusion comes from using "SNP" to denote both a germline variant and a single-nucleotide somatic mutation. Let me try to explain it better.

I have a table with the following columns: gene name, chromosome, position of the mutation, protein change, reference allele, alternative allele, reads supporting the alternative allele, reads supporting the reference allele, variant allele fraction, etc. Each row describes one single-nucleotide mutation.

Since I don't have data for the normal sample, I can't tell which of these mutations are germline variants and which are somatic. What I want to do is check every mutation against Ensembl or another database to see whether it has been reported as a SNP (i.e., a "normal" germline variant), and exclude those from further analyses.

Any single-base change relative to the reference is technically a SNP (more precisely, a single-nucleotide variant); a single-base indel is the exception. Larger changes are typically referred to as "structural variants". You can look in a database such as dbSNP to see whether a variant has been described before or reported in the literature. ExAC is useful for assessing population frequencies, especially of less common variants, and ClinVar can tell you whether a variant is associated with any phenotypes. But no database alone will tell you whether a variant is somatic or germline; to determine that, you need to compare a test sample against a control sample.
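To make the distinction above concrete, a minimal Python sketch that labels a change by its reference and alternative alleles. It assumes MAF-style alleles, where "-" marks the empty side of an insertion or deletion; the function name and the 50 bp cut-off for structural variants are illustrative choices, not a standard API. Like any annotation of this kind, it says nothing about whether a variant is germline or somatic:

```python
def classify_change(ref: str, alt: str, sv_min_len: int = 50) -> str:
    """Classify a change from its alleles; "-" marks an empty allele, as in MAF files.

    The 50 bp structural-variant cut-off is a common convention, not a fixed rule.
    """
    ref_seq = "" if ref == "-" else ref
    alt_seq = "" if alt == "-" else alt
    if not ref_seq and not alt_seq:
        raise ValueError("both alleles are empty")
    if len(ref_seq) == len(alt_seq):
        # Same length: a substitution of one base (SNV) or several (DNP/TNP/ONP in MAF terms).
        return "SNV" if len(ref_seq) == 1 else "MNV"
    if abs(len(ref_seq) - len(alt_seq)) >= sv_min_len:
        return "structural variant"
    return "indel"


# Made-up allele pairs, for illustration only.
for ref, alt in [("A", "G"), ("-", "T"), ("AC", "GT"), ("A" * 60, "-")]:
    print(ref[:5], alt, classify_change(ref, alt))
```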

Thanks! I'm looking to do this in an automated way, though.
As I said, there is no germline data to compare against, so I have to make do with what I have. If I exclude variants that have been reported in >1% of the population, I should end up with a list of mutations that are at least more likely to be somatic.
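The >1% filter described above is a simple threshold once each variant has a population allele frequency attached. A minimal Python sketch, assuming those frequencies were already retrieved from a resource such as ExAC and stored in a dict keyed by (chromosome, position, ref, alt); the function name and key layout are hypothetical, and the frequencies below are made up. Variants missing from the resource are kept:

```python
from typing import Mapping, Optional

# Hypothetical variant key: (chromosome, position, reference allele, alternative allele).
VariantKey = tuple[str, int, str, str]


def likely_somatic(
    variants: list[VariantKey],
    population_af: Mapping[VariantKey, Optional[float]],
    max_af: float = 0.01,
) -> list[VariantKey]:
    """Drop variants reported above max_af in the population; keep unreported ones."""
    kept = []
    for variant in variants:
        af = population_af.get(variant)  # None if the variant is absent from the resource
        if af is None or af <= max_af:
            kept.append(variant)
    return kept


# Toy frequencies, made up for illustration.
toy_variants = [("1", 1000, "A", "G"), ("2", 2000, "C", "T"), ("3", 3000, "G", "A")]
toy_af = {("1", 1000, "A", "G"): 0.23, ("2", 2000, "C", "T"): 0.0004}
print(likely_somatic(toy_variants, toy_af))  # [('2', 2000, 'C', 'T'), ('3', 3000, 'G', 'A')]
```

Rare germline variants that the resource has never seen will still pass this filter, so the output is enriched for somatic mutations rather than guaranteed to contain only them.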

Answer (7.2 years ago):

Use Ensembl Variation. You can query it with the Perl API; check the tutorial to get started. The biomaRt Bioconductor package also gives you access to some of the variation data.
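The answer above points to the Perl API and biomaRt. As a swapped-in alternative that is not part of the answer, the same Ensembl Variation data can also be reached over the Ensembl REST API from any language. A minimal Python sketch, assuming the `overlap/region` endpoint with `feature=variation` and that each returned record has `id` and `alleles` fields; verify the endpoint and field names against the REST documentation before relying on them. The helper names are hypothetical:

```python
import json
import urllib.request
from typing import Any

# Public Ensembl REST server (GRCh38); GRCh37 coordinates need https://grch37.rest.ensembl.org instead.
REST_SERVER = "https://rest.ensembl.org"


def overlap_url(chrom: str, pos: int, species: str = "human", server: str = REST_SERVER) -> str:
    """URL listing known variants that overlap a single 1-based position."""
    chrom = chrom.removeprefix("chr")  # Ensembl uses "7", not "chr7"
    return f"{server}/overlap/region/{species}/{chrom}:{pos}-{pos}?feature=variation"


def fetch_overlapping(chrom: str, pos: int) -> list[dict[str, Any]]:
    """Run the query and return the decoded JSON records (needs network access)."""
    request = urllib.request.Request(overlap_url(chrom, pos), headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def matching_ids(records: list[dict[str, Any]], ref: str, alt: str) -> list[str]:
    """IDs of records whose allele list contains both ref and alt (strand is not checked)."""
    return [
        rec["id"]
        for rec in records
        if ref in rec.get("alleles", []) and alt in rec.get("alleles", [])
    ]


# Usage per MAF row (makes a live request, so it is left commented out):
# ids = matching_ids(fetch_overlapping(chrom, pos), ref, alt)
```

A non-empty result only says the position and alleles have been reported before, not how common they are, so it pairs with a frequency threshold rather than replacing one.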
