VCF filtration problem: no genotype information
8.4 years ago
morovatunc ▴ 560

Hi,

I will try to cover everything in a single post, so apologies if the title is not accurate. First of all, I have read the following previously asked questions, so I am not asking without checking the background.

How to interpret and extract from a Vcf file Genotype informations as values
Vcf Format And Filtering

My aim is to filter known dbSNP variants and homozygous mutations out of my VCF files to eliminate false positives. My data comes from ICGC, so I believe its quality is pretty good. I will run the filtering on VCFs for the same patients from multiple centers, and then merge them.

  1. While I was working on a Pindel indel VCF, the genotype was given as "./.". The first link I referred to says that "./." means there is no genotype information. When I checked all the mutation rows, GT was "./." in every one. What should I do in order to filter out homozygous mutations? Shouldn't there be a value such as "1/1", like I had in all of my SNV VCFs?

  2. For filtering the SNV VCF, I used snpEff. Will snpEff change the called mutations in a default run? It seems it did not change anything. Should I use SnpSift instead? Could you also tell me whether there is any difference between vcftools and SnpSift? (They seem to serve the same purpose.)

  3. I may need some help understanding some of the terminology in the VCF header.

For example:

In my indel VCF, one of the filters is shown below. Could you tell me the difference between the two mutations given? What does it mean if F012 appears in the FILTER column? Does it mean that the mutation has that specific filter property, or that it does not? Could you clarify this terminology for me?

##FILTER=<ID=F012,Description="Germline: When length < 11 and depth > 9, fail if the variant is seen in both 20% of normal reads AND 20% of tumour reads in either pindel or bwa"
A
1       13656   ef9c648a-206e-11e5-a9d9-adff273a0828    CAG     C       78      F016;F010;F015  PC=D;RS=13656;RE=13659;LEN=2;S1=12;S2=270.339;REP=1;F017        GT:PP:NP:PB:NB:PD:ND:PR:NR:PU:NU:TG:VG  ./.:0:1:3:2:28:19:28:19:3:2:3:2 ./.:2:2:9:4:51:48:51:48:9:4:6:5
B
1       15903   5ba2b202-2078-11e5-a9d9-adff273a0828    G       GC      207     F012;F010;F015  PC=I;RS=15903;RE=15906;LEN=1;S1=32;S2=838.836;REP=1;F017        GT:PP:NP:PB:NB:PD:ND:PR:NR:PU:NU:TG:VG  ./.:2:0:4:0:12:2:12:2:4:0:3:2   ./.:13:1:22:1:39:19:39:19:22:1:6:7
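
As a rough way to compare the two records above, the FILTER column (the 7th field) can be split on ";" and the two sets compared. This is a minimal sketch, not part of any tool mentioned in this post: the function name `failed_filters` is made up for illustration, and it splits on whitespace only because the records were pasted here with spaces (a real VCF is tab-delimited). By the VCF specification, FILTER holds either PASS or the list of filters a record did not pass, so an ID appearing there would mean the record failed that filter.

```python
# Records A and B copied from the post above (whitespace-separated as pasted).
RECORD_A = (
    "1 13656 ef9c648a-206e-11e5-a9d9-adff273a0828 CAG C 78 F016;F010;F015 "
    "PC=D;RS=13656;RE=13659;LEN=2;S1=12;S2=270.339;REP=1;F017 "
    "GT:PP:NP:PB:NB:PD:ND:PR:NR:PU:NU:TG:VG "
    "./.:0:1:3:2:28:19:28:19:3:2:3:2 ./.:2:2:9:4:51:48:51:48:9:4:6:5"
)
RECORD_B = (
    "1 15903 5ba2b202-2078-11e5-a9d9-adff273a0828 G GC 207 F012;F010;F015 "
    "PC=I;RS=15903;RE=15906;LEN=1;S1=32;S2=838.836;REP=1;F017 "
    "GT:PP:NP:PB:NB:PD:ND:PR:NR:PU:NU:TG:VG "
    "./.:2:0:4:0:12:2:12:2:4:0:3:2 ./.:13:1:22:1:39:19:39:19:22:1:6:7"
)


def failed_filters(vcf_line):
    """Return the set of filter IDs listed in the FILTER column of one record.

    Hypothetical helper for illustration. PASS or "." (no filtering applied)
    gives an empty set.
    """
    fields = vcf_line.split()  # assumption: no field contains spaces
    status = fields[6]         # FILTER is the 7th column
    if status in ("PASS", "."):
        return set()
    return set(status.split(";"))


# Filters listed for B but not for A, and the other way round.
print(sorted(failed_filters(RECORD_B) - failed_filters(RECORD_A)))  # ['F012']
print(sorted(failed_filters(RECORD_A) - failed_filters(RECORD_B)))  # ['F016']
```

Read this way, both records list F010 and F015, and they differ only in F016 (record A) versus F012 (record B).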

Thank you for your help,

Best regards,

Tunc.

Vcftools snpEff • 2.8k views

Hi! I am also interested in this subject and looking for an answer. Thank you.


This should be a comment, not an answer.


Moved, but I don't think posting this as an answer was unreasonable. It does not semantically belong as an answer, but it is essentially the only way to bump the question to the front page.

  1. If you are confused by "./.", it simply means the SNP region has no coverage to call the genotype.
  2. VCF tags are well described here http://www.1000genomes.org/wiki/Analysis/vcf4.0
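
To check whether a file carries any real genotype calls at all, one option is to tally the GT values across every sample column. The sketch below is only an illustration under some assumptions: `genotype_counts` is a made-up helper name, the two input lines are the records quoted in the question, and fields are split on whitespace because that is how they were pasted. In the VCF specification "." in GT denotes a missing call, so if "./." is the only value in the tally, it may simply be that the caller does not emit genotypes, which is a different situation from lack of coverage at individual sites.

```python
from collections import Counter

# The two indel records quoted in the question (whitespace-separated as pasted).
VCF_LINES = [
    "1 13656 ef9c648a-206e-11e5-a9d9-adff273a0828 CAG C 78 F016;F010;F015 "
    "PC=D;RS=13656;RE=13659;LEN=2;S1=12;S2=270.339;REP=1;F017 "
    "GT:PP:NP:PB:NB:PD:ND:PR:NR:PU:NU:TG:VG "
    "./.:0:1:3:2:28:19:28:19:3:2:3:2 ./.:2:2:9:4:51:48:51:48:9:4:6:5",
    "1 15903 5ba2b202-2078-11e5-a9d9-adff273a0828 G GC 207 F012;F010;F015 "
    "PC=I;RS=15903;RE=15906;LEN=1;S1=32;S2=838.836;REP=1;F017 "
    "GT:PP:NP:PB:NB:PD:ND:PR:NR:PU:NU:TG:VG "
    "./.:2:0:4:0:12:2:12:2:4:0:3:2 ./.:13:1:22:1:39:19:39:19:22:1:6:7",
]


def genotype_counts(vcf_lines):
    """Count each GT string over all samples of all records.

    Hypothetical helper for illustration. Header lines (starting with '#')
    and records without a GT key in FORMAT are skipped.
    """
    counts = Counter()
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.split()            # assumption: no field contains spaces
        format_keys = fields[8].split(":")  # FORMAT is the 9th column
        if "GT" not in format_keys:
            continue
        gt_index = format_keys.index("GT")
        for sample in fields[9:]:        # sample columns follow FORMAT
            counts[sample.split(":")[gt_index]] += 1
    return counts


COUNTS = genotype_counts(VCF_LINES)
print(dict(COUNTS))  # {'./.': 4}
```

For a whole file, the same function could be fed the open file handle line by line instead of the small list used here.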

Thank you for the answer. I had actually checked this documentation before, but it seems strange to me that there would be no coverage in ICGC data. I was asking about other possibilities.
