How to come up with results from many VCF files?
8.6 years ago
morovatunc ▴ 560

Hi,

I would like to identify ICGC WGS data for my academic project. Our aim is to check specific locations across the WGS variant calls and (hopefully) prove that the chance of an INDEL is higher for these specific mutations.

As we all know, there are VCF files currently available to work with. However, I don't know which information I should extract from them. Could you guide me a little bit? (I know about vcftools.)

For example, I am aiming to produce a data frame with one column per patient and one row per location. But, as they stand, the VCF files each contain different locations.
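A minimal sketch of the kind of table described above, assuming one uncompressed, tab-separated VCF per patient that has already been read into a list of lines. The function names (`read_variants`, `build_matrix`, `to_rows`) are hypothetical and not part of vcftools or any other library, and the INDEL rule is a simple length comparison of REF and ALT, so symbolic alleles such as `<DEL>` are not handled. Locations missing from a patient's file are shown as `.`.

```python
from collections import defaultdict


def read_variants(vcf_lines):
    """Yield (chrom, pos, ref, alt) per record, one tuple per ALT allele."""
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and header lines
        fields = line.split("\t")
        chrom, pos, ref, alts = fields[0], int(fields[1]), fields[3], fields[4]
        for alt in alts.split(","):
            yield chrom, pos, ref, alt


def is_indel(ref, alt):
    # Assumption: an INDEL is any allele whose length differs from REF.
    return len(ref) != len(alt)


def build_matrix(vcfs_by_patient):
    """Map (chrom, pos) -> {patient: "INDEL" or "SUB"}.

    vcfs_by_patient: dict of patient id -> iterable of VCF lines.
    If a patient has several alleles at one location, the last one wins.
    """
    matrix = defaultdict(dict)
    for patient, lines in vcfs_by_patient.items():
        for chrom, pos, ref, alt in read_variants(lines):
            matrix[(chrom, pos)][patient] = "INDEL" if is_indel(ref, alt) else "SUB"
    return sorted(vcfs_by_patient), dict(matrix)


def to_rows(patients, matrix):
    """One row per location: [chrom, pos, call for each patient or "."]."""
    return [
        [chrom, pos] + [matrix[(chrom, pos)].get(p, ".") for p in patients]
        for chrom, pos in sorted(matrix)
    ]
```

The rows returned by `to_rows` can be passed to a data frame library, with `["chrom", "pos"] + patients` as the column names. Taking the union of locations across all files is one way around the problem that each VCF lists different locations.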

Could you point me to methods, or maybe papers, on ways to evaluate that much information? Any ideas? I am feeling overwhelmed by the idea of having too much information and zero results.

Thank you very much for your help,

Best,

T.

PS: I apologise for being so general; my PI insisted on keeping the details secret. SAD :(

VCF variant-calling mutation-calling • 1.6k views
8.5 years ago
Denise CS ★ 5.2k

If you've got VCF files, one of the things you could do is annotate the variants in them. Find out where they map on the human genome and what their effect is on genes, transcripts and proteins. Check whether they map to regulatory regions (e.g. where transcription factors bind) and whether they are known to have any clinical significance (from ClinVar), somatic status, association with phenotype/disease, etc. All of this can be checked with the VEP, the Variant Effect Predictor. Check whether your VCFs are based on the current assembly of the human genome (i.e. GRCh38) or the previous one (GRCh37 = hg19). The VEP is available for both. For the older assembly, start from the GRCh37 archive site in Ensembl.
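As a rough way to do the assembly check, the sketch below scans a VCF header for the `##reference`, `##contig` and `##assembly` meta lines and looks for a recognisable assembly name. The function names and the list of name tags are assumptions made for illustration; many VCFs carry no such hint, in which case the function returns `None` and the data provider's documentation is the place to look.

```python
def assembly_hints(vcf_lines):
    """Collect header meta lines that may name the reference assembly."""
    prefixes = ("##reference=", "##contig=", "##assembly=")
    hints = []
    for line in vcf_lines:
        if not line.startswith("##"):
            break  # meta lines end at the #CHROM line
        if line.lower().startswith(prefixes):
            hints.append(line.strip())
    return hints


def guess_assembly(hints):
    """Return "GRCh37", "GRCh38", or None if unclear (heuristic only)."""
    # Assumption: these substrings are common ways of naming each assembly.
    tags = {
        "GRCh38": ("grch38", "hg38"),
        "GRCh37": ("grch37", "hg19", "b37", "hs37d5"),
    }
    text = " ".join(hints).lower()
    found = {name for name, keys in tags.items() if any(k in text for k in keys)}
    # Only answer when exactly one assembly is mentioned.
    return found.pop() if len(found) == 1 else None
```

For a file on disk, something like `guess_assembly(assembly_hints(open(path)))` would apply the check, with `path` pointing to an uncompressed VCF.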
