VCF SNP analysis
6.5 years ago
marion.ryan ▴ 50

Hello

Any advice on the best (easiest) way to handle VCF files in order to find SNPs, e.g. using R or VCFtools on Linux?

Sorry if this is basic, but I just need to get started.

Regards Marion

SNP R VCFtools • 2.1k views

define: way to handle


To clarify (in case OP was not aware), VCF files already contain the SNPs.


Just to chime in: please read up on the structure of the VCF format, on how VCFtools works and on what it can do. Then read about SNPs and which file formats represent them. Once you have done that, you may well be able to answer your own question. The VCFtools man page is pretty descriptive. Read it, then formulate your query around the point where you get stuck, and we will be happy to help. And yes, a VCF already contains the SNPs: check the ID column, where known variants carry an rsID if the file has been annotated against dbSNP (unannotated or novel variants have "." there). Also make sure you understand the difference between SNPs and SNVs. ;)
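To make the column layout concrete, here is a minimal sketch in plain Python (no VCF library) that reads the fixed columns, flags records whose REF and ALT alleles are all single bases, and notes whether the ID column holds an rsID. The three-record `EXAMPLE_VCF` and the function names are made up for illustration; for real files VCFtools or a similar dedicated tool is the better choice.

```python
import io

# Hypothetical three-record VCF, used only for illustration.
EXAMPLE_VCF = """\
##fileformat=VCFv4.2
#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO
chr1\t1000\trs123\tA\tG\t50\tPASS\tDP=30
chr1\t2000\t.\tC\tT\t45\tPASS\tDP=12
chr1\t3000\trs456\tAT\tA\t60\tPASS\tDP=25
"""

BASES = {"A", "C", "G", "T"}


def iter_records(handle):
    """Yield the first five fixed VCF columns of each data line as a dict."""
    for line in handle:
        # Skip meta-information lines, the header line and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        chrom, pos, vid, ref, alt = line.rstrip("\n").split("\t")[:5]
        yield {
            "chrom": chrom,
            "pos": int(pos),
            "ids": vid.split(";"),   # the ID column may hold several IDs
            "ref": ref,
            "alts": alt.split(","),  # multi-allelic sites list several ALTs
        }


def is_snv(record):
    """True if REF and every ALT allele is a single A/C/G/T base."""
    return record["ref"].upper() in BASES and all(
        alt.upper() in BASES for alt in record["alts"]
    )


def summarise(handle):
    """Return (chrom, pos, is_snv, has_rsid) for every record."""
    return [
        (r["chrom"], r["pos"], is_snv(r), any(i.startswith("rs") for i in r["ids"]))
        for r in iter_records(handle)
    ]


rows = summarise(io.StringIO(EXAMPLE_VCF))
for row in rows:
    print(row)
```

For a file on disk, pass an open file handle instead of the `io.StringIO` wrapper; compressed files can be opened with `gzip.open(path, "rt")`.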

Good luck!


Thanks for the quick answers. I am looking for the best tool for navigating and exploring VCF files derived from an RNA-seq experiment, in order to obtain specific SNPs relating to particular genes and to compare the samples at those SNPs. Any tips on the best tools would be great, but I will also read up myself. Sorry, I should have been a bit clearer. Regards, Marion


Please use ADD COMMENT/ADD REPLY when responding to existing posts to keep threads logically organized.


If you are trying to find mutations or variants from RNA-seq data, then a samtools workflow or a STAR/GATK workflow should be fine. I personally like STAR/GATK owing to its statistical model and the robustness you can add to it. Having said that, once you have a VCF you can always plot some stats to see how many of your variants have the PASS flag and what their DP and AF values are. Then again, are you looking at somatic or germline variants? Once you have run the GATK workflow you should have a set of confident calls. Some of these variants will be known SNPs, meaning they carry an rsID; the rest should be novel.
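As a rough illustration of the "how many PASS, and what depth" check, the sketch below tallies the FILTER column and takes the median DP of the PASS records. It assumes DP is stored as a key in the INFO column, which depends on the caller; AF could be read in the same way if it is present. The example records and the helper names are invented for the demonstration.

```python
import io
import statistics
from collections import Counter

# Hypothetical records; FILTER is column 7 and INFO is column 8.
EXAMPLE_VCF = """\
##fileformat=VCFv4.2
#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO
chr1\t1000\trs123\tA\tG\t50\tPASS\tDP=30;AF=0.5
chr1\t2000\t.\tC\tT\t45\tPASS\tDP=12;AF=1.0
chr1\t2500\t.\tG\tA\t8\tLowQual\tDP=3;AF=0.5
chr1\t3000\trs456\tAT\tA\t60\tPASS\tDP=25;AF=0.5
"""


def parse_info(info):
    """Turn 'DP=30;AF=0.5;DB' into {'DP': '30', 'AF': '0.5', 'DB': True}."""
    parsed = {}
    for item in info.split(";"):
        if not item or item == ".":
            continue
        key, sep, value = item.partition("=")
        parsed[key] = value if sep else True  # flag keys carry no value
    return parsed


def filter_and_depth_stats(handle):
    """Count FILTER values and return the median INFO/DP of PASS records."""
    filter_counts = Counter()
    pass_depths = []
    for line in handle:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        filter_counts[fields[6]] += 1
        info = parse_info(fields[7])
        if fields[6] == "PASS" and "DP" in info:
            pass_depths.append(int(info["DP"]))
    median_depth = statistics.median(pass_depths) if pass_depths else None
    return filter_counts, median_depth


counts, median_dp = filter_and_depth_stats(io.StringIO(EXAMPLE_VCF))
print(dict(counts), median_dp)
```

The list of depths could equally be handed to a plotting library to draw a histogram; the tally is just the quickest first look.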

If you already have the IDs of the SNPs you are interested in, for example a list of rsIDs or another VCF containing them, you can always overlap them with your VCF file and pull out the scores to summarise the calls. Whatever you do, you will need to annotate the variants and their predicted consequences in order to find any biological relevance.
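The overlap step can be sketched as a simple lookup on the ID column. This assumes the VCF has already been annotated with rsIDs; `WANTED_IDS`, the example records and `select_by_id` are hypothetical names used only for illustration.

```python
import io

# Hypothetical annotated VCF and list of rsIDs of interest.
EXAMPLE_VCF = """\
##fileformat=VCFv4.2
#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO
chr1\t1000\trs123\tA\tG\t50\tPASS\tDP=30
chr1\t2000\t.\tC\tT\t45\tPASS\tDP=12
chr1\t3000\trs456\tAT\tA\t60\tPASS\tDP=25
"""
WANTED_IDS = {"rs123", "rs999"}


def select_by_id(handle, wanted):
    """Return (chrom, pos, id, qual, filter) for records whose ID is wanted."""
    hits = []
    for line in handle:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        # The ID column can contain several semicolon-separated identifiers.
        if set(fields[2].split(";")) & wanted:
            hits.append((fields[0], int(fields[1]), fields[2], fields[5], fields[6]))
    return hits


hits = select_by_id(io.StringIO(EXAMPLE_VCF), WANTED_IDS)
for hit in hits:
    print(hit)
```

IDs from the list that never show up in `hits` (here `rs999`) were either not called in the sample or not annotated, so it is worth reporting them separately rather than silently dropping them.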
