Tool for calculating variant frequency
Posted by nkabo, 8.3 years ago

Hi,

I have a list of variants (SNPs and indels), each with an rs ID, and the rs IDs are stored in a CSV file. Could you suggest a web tool or program that takes the rs ID file as input and calculates the frequency of each variant, as well as the distance between the expected disease allele frequency and the calculated frequency of the variant? Thanks in advance.

Tags: SNP, genome

Answer by DG, 8.3 years ago

Is your data from a population, and can you go back to the original VCF files? If so, you could keep it as a population VCF and annotate the dataset with a tool like GEMINI or ANNOVAR. These tools typically report the allele frequency within your own dataset as one of the annotation columns (at least GEMINI does), as well as frequencies from the population databases (dbSNP, 1000 Genomes, EVS, ExAC). You can then use something like VCFtools or PyVCF to write a small script that goes through the variants and calculates the difference in frequencies.
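A minimal sketch of that last step, assuming a multi-sample VCF with biallelic sites and a GT field, plus population frequencies already exported (for example from a GEMINI or ANNOVAR annotation) into a dict keyed by rs ID. It parses the VCF text directly rather than through PyVCF or VCFtools, so it doesn't rely on either library's API. The IDs `rsA`/`rsB` and their frequencies are made-up placeholders.

```python
import io

# Hypothetical demo data: placeholder IDs and frequencies, not real variants.
DEMO_VCF = "\n".join([
    "##fileformat=VCFv4.2",
    "\t".join(["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER",
               "INFO", "FORMAT", "S1", "S2"]),
    "\t".join(["1", "1000", "rsA", "A", "G", ".", "PASS", ".", "GT", "0/1", "1/1"]),
    "\t".join(["1", "2000", "rsB", "C", "T", ".", "PASS", ".", "GT", "0/0", "0/1"]),
]) + "\n"
DEMO_POPULATION_AF = {"rsA": 0.30, "rsB": 0.25}


def alt_allele_frequency(gt_fields):
    """Fraction of called alleles that are non-reference (allele index > 0)."""
    called = alt = 0
    for gt in gt_fields:
        for allele in gt.replace("|", "/").split("/"):
            if allele == ".":
                continue  # missing call: not counted
            called += 1
            if int(allele) > 0:
                alt += 1
    return alt / called if called else None


def frequency_differences(vcf_handle, population_af):
    """Yield (id, dataset_af, population_af, dataset_af - population_af)."""
    for line in vcf_handle:
        if line.startswith("#"):
            continue  # meta-information and header lines
        cols = line.rstrip("\n").split("\t")
        variant_id, fmt_keys = cols[2], cols[8].split(":")
        gt_pos = fmt_keys.index("GT")
        genotypes = [sample.split(":")[gt_pos] for sample in cols[9:]]
        dataset_af = alt_allele_frequency(genotypes)
        pop_af = population_af.get(variant_id)
        if dataset_af is None or pop_af is None:
            continue  # no calls, or no reference frequency to compare with
        yield variant_id, dataset_af, pop_af, dataset_af - pop_af


if __name__ == "__main__":
    for vid, ds_af, pop_af, diff in frequency_differences(
            io.StringIO(DEMO_VCF), DEMO_POPULATION_AF):
        print(f"{vid}\tdataset={ds_af:.2f}\tpopulation={pop_af:.2f}\tdiff={diff:+.2f}")
```

The signed difference keeps the direction (enriched vs. depleted in your cohort); take `abs()` if you only want the distance.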

Thank you for your response. The rs IDs come from genome sequencing data of a single person, and I want to find how far each specified variant deviates from the expected disease allele frequency. I will try those tools.

For germline mutations, if you are sequencing a single individual you don't really have an allele frequency. You have reads, and the proportion of reads carrying the mutant allele, but that proportion only tells you whether the variant is heterozygous (around 50% of reads) or homozygous (close to 100%); it isn't comparable to population frequencies.
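To make that concrete, here is a small sketch of what the read proportion (variant allele fraction, VAF) means in one diploid sample. The genotype cut-offs (0.2 and 0.8) are hypothetical and for illustration only; real variant callers use genotype likelihood models rather than fixed thresholds.

```python
"""Sketch: read-level variant allele fraction (VAF) in one diploid sample.

The het_range cut-offs are hypothetical, illustration-only values.
"""


def variant_allele_fraction(ref_reads, alt_reads):
    """Proportion of reads at a site that carry the alternate allele."""
    depth = ref_reads + alt_reads
    if depth == 0:
        raise ValueError("no reads covering this site")
    return alt_reads / depth


def rough_genotype(vaf, het_range=(0.2, 0.8)):
    """Map a VAF to a coarse diploid genotype label."""
    low, high = het_range
    if vaf < low:
        return "hom_ref"  # few alt reads: likely reference (or noise)
    if vaf <= high:
        return "het"      # roughly half the reads: one copy of the allele
    return "hom_alt"      # nearly all reads: both copies


if __name__ == "__main__":
    for ref, alt in [(48, 52), (2, 60), (55, 1)]:
        vaf = variant_allele_fraction(ref, alt)
        print(ref, alt, round(vaf, 3), rough_genotype(vaf))
```

The output is a genotype per site, not a frequency across people, which is why it can't be set against a population allele frequency.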

I see. In this case I am trying to analyse somatic mutations; is their frequency still not comparable with the population frequency? Thanks in advance.

What question are you trying to answer? If you are looking at a tumour in a single individual, the somatic allele frequency is meaningful only in a few senses:

1) Setting a minimum threshold for calling a somatic variant confidently. Variants below this threshold have a higher probability of being false positives or sequencing artefacts.

2) Using the somatic variant as a biomarker, particularly for things like tumour content/load, residual disease, or estimating whether the mutation was acquired early or late during tumour evolution.
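A rough sketch of both uses, under loudly stated assumptions: the calling thresholds (VAF of at least 0.05 and at least 4 alt reads) are hypothetical values, and the tumour-content estimate uses the common simplification that a clonal, heterozygous SNV in a diploid, copy-neutral region has VAF of about purity / 2. Copy-number changes, subclonality, and normal contamination all break that simple relationship.

```python
"""Sketch: two uses of somatic VAF from a single tumour sample.

Assumptions (illustrative only): min VAF 0.05 and min 4 alt reads as a
calling threshold; purity estimated from a clonal heterozygous SNV in a
copy-neutral diploid region. Variant names below are hypothetical.
"""
from dataclasses import dataclass


@dataclass
class SomaticCall:
    name: str
    ref_reads: int
    alt_reads: int

    @property
    def vaf(self) -> float:
        depth = self.ref_reads + self.alt_reads
        return self.alt_reads / depth if depth else 0.0


def passes_threshold(call, min_vaf=0.05, min_alt_reads=4):
    """Use 1: drop low-VAF calls that are more likely to be artefacts."""
    return call.vaf >= min_vaf and call.alt_reads >= min_alt_reads


def purity_from_clonal_het(call):
    """Use 2: VAF ~ purity / 2 under the stated assumptions,
    so purity ~ 2 * VAF (capped at 1.0)."""
    return min(1.0, 2 * call.vaf)


if __name__ == "__main__":
    calls = [
        SomaticCall("clonal_driver", ref_reads=70, alt_reads=30),
        SomaticCall("low_level", ref_reads=198, alt_reads=2),
    ]
    kept = [c for c in calls if passes_threshold(c)]
    for c in kept:
        print(c.name, round(c.vaf, 2),
              "estimated purity:", round(purity_from_clonal_het(c), 2))
```

Neither calculation involves a population frequency; both are statements about this one tumour sample.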

But no, somatic allele frequency is not comparable to population allele frequencies at all. In fact, when we aren't doing paired sequencing (tumour plus a germline sample such as blood), we typically use population databases to identify rare and common germline variants so that we can filter them out of the tumour sample.
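A minimal sketch of that tumour-only filtering step. It assumes a hypothetical `population_af` dict mapping variant keys (here `chrom:pos:ref>alt`) to an allele frequency taken from a database such as 1000 Genomes or ExAC; the 1% cut-off and the example variants are illustrative, not recommended values.

```python
"""Sketch: tumour-only germline filtering with a population AF lookup.

Assumptions (hypothetical): population_af maps variant keys to a database
allele frequency; variants absent from the database are kept; the 1%
cut-off is illustrative only.
"""


def split_somatic_candidates(variants, population_af, max_pop_af=0.01):
    """Return (somatic_candidates, likely_germline) as two lists."""
    somatic, germline = [], []
    for key in variants:
        af = population_af.get(key)
        if af is not None and af >= max_pop_af:
            germline.append(key)  # common in the population: likely germline
        else:
            somatic.append(key)   # rare or unseen: keep as a somatic candidate
    return somatic, germline


if __name__ == "__main__":
    tumour_calls = ["1:1000:A>G", "1:2000:C>T", "2:500:G>A"]
    population_af = {"1:1000:A>G": 0.35, "1:2000:C>T": 0.0004}
    somatic, germline = split_somatic_candidates(tumour_calls, population_af)
    print("somatic candidates:", somatic)
    print("likely germline:", germline)
```

A rare germline variant can still pass this filter, which is one reason a matched normal sample is preferred when it is available.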

Thank you for your answer. I am trying to understand the effect of somatic variants using data from blood samples, so I guess I should revise my point of view.
