How to interpret genotype information in a VCF file and extract it as numeric values
10.6 years ago
fusion.slope ▴ 250

Hi!

I have to perform an eQTL analysis and first need to manipulate the VCF file. Is there a way to convert the genotype fields into numeric values, like this: 1/1 = 2, ./. = 0, 0/1 = 1?

Example:

#CHROM  POS   REF  ALT  DGRP-038  DGRP-040  DGRP-045
2L      2262  T    TTC  1/1:7:0   ./.:0:0   ./.:0:0

In this case I would like to obtain the following:

#CHROM  POS   REF  ALT  sample-1  sample-2  sample-3
2L      2262  T    TTC  2         0         0

Is there any software that does this?

My second question: do these values (0, 1, 2) have the following meanings?

  • reference/reference (no mutation) = 1, not present in my example
  • reference/alternative (one mutated copy) = 2, in my case 1/1:7:0
  • alternative/alternative (both copies mutated) = 0, in my case ./.

Thanks in advance!!
Tommi

eQTL VCF_file_Manipulation Genotype • 6.6k views

The correct interpretation for a diploid genome is:

0/0 - 0 (Homozygous Reference)
0/1 - 1 (Heterozygous)
1/1 - 2 (Homozygous Alternate)
./. - No data (the genotype was not called), so you'd likely skip these sites.
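The mapping above can be sketched as a small Python function. This is only an illustration, not a standard tool: the name `gt_to_dosage` is made up for this example, it assumes the genotype is the first colon-separated sub-field of the sample column (as in `1/1:7:0`), and it returns `None` for missing calls instead of coding them as 0.

```python
import re


def gt_to_dosage(sample_field):
    """Count non-reference alleles in a VCF sample field.

    Returns 0, 1 or 2 for a diploid call, or None if any allele is missing.
    Hypothetical helper for illustration only.
    """
    # Assumption: GT is the first sub-field, e.g. "1/1:7:0" -> "1/1"
    gt = sample_field.split(":")[0]
    # "/" separates unphased alleles, "|" separates phased alleles
    alleles = re.split(r"[/|]", gt)
    if "." in alleles:
        return None  # no genotype call at this site
    # Every allele other than "0" (the reference) counts as one alternate copy
    return sum(1 for allele in alleles if allele != "0")


for field in ["0/0:5:0", "0/1:7:3", "1/1:7:0", "./.:0:0"]:
    print(field, "->", gt_to_dosage(field))
```

Keeping missing calls separate from 0 matters for eQTL work, because 0 would otherwise be read as homozygous reference.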

Only variants can be bi-allelic; you probably meant diploid.


You're right, my bad on the terminology.


Thanks Vivek. I am working with Drosophila and I think the variants are biallelic. If you know of any tools to convert this information, please let me know.

Tommi


I don't understand what these genotypes mean:

0/0 - (Homozygous Reference)
0/1 - (Heterozygous)
1/1 - (Homozygous Alternate)

  1. Is there a clear explanation?
  2. In this example:

#CHROM  POS   REF  ALT  DGRP-038  DGRP-040  DGRP-045
2L      2262  T    TTC  1/1:7:0   ./.:0:0   ./.:0:0

DGRP-038 is homozygous alternate. What does that mean?


It means both copies of the chromosome carry the insertion T>TTC.


How can a mutation be detected on both chromosomes and not on just one?

10.6 years ago

With PLINK 1.9,

plink --vcf [vcf filename] --allow-extra-chr --recode A

will write 0/1/2-coded genotypes to plink.raw.


Thanks chrchang523! However, this is not quite what I want to obtain.


What exactly do you want to obtain, then? The easiest way to do exactly what you describe would be to convert the VCF file into a table, using vcf-to-tab for instance, and then recode the genotypes with some simple scripting.
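As a rough idea of what that "simple scripting" could look like, here is a minimal Python sketch that reads VCF text directly and produces the table from the question. Several things here are assumptions: the input is a standard tab-separated VCF with the nine fixed columns (CHROM to FORMAT) before the samples, the function name `vcf_to_dosage_table` is made up, the `GT:A:B` FORMAT string in the toy data is a placeholder, and missing calls are written as `NA` rather than 0.

```python
import io


def vcf_to_dosage_table(vcf_handle, missing="NA"):
    """Return rows of CHROM, POS, REF, ALT plus one 0/1/2 value per sample.

    Hypothetical helper; assumes standard VCF columns and a GT entry in FORMAT.
    """
    rows = []
    for line in vcf_handle:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        cols = line.split("\t")
        if line.startswith("#CHROM"):
            # Header line: sample names start at the 10th column
            rows.append(["#CHROM", "POS", "REF", "ALT"] + cols[9:])
            continue
        keys = cols[8].split(":")
        if "GT" not in keys:
            continue  # no genotypes recorded for this site
        gt_idx = keys.index("GT")
        out = [cols[0], cols[1], cols[3], cols[4]]
        for sample in cols[9:]:
            gt = sample.split(":")[gt_idx]
            alleles = gt.replace("|", "/").split("/")
            if "." in alleles:
                out.append(missing)
            else:
                # Count every allele that is not the reference allele "0"
                out.append(str(sum(a != "0" for a in alleles)))
        rows.append(out)
    return rows


# Toy input built from the example line in the question
demo = (
    "##fileformat=VCFv4.1\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tDGRP-038\tDGRP-040\tDGRP-045\n"
    "2L\t2262\t.\tT\tTTC\t.\tPASS\t.\tGT:A:B\t1/1:7:0\t./.:0:0\t./.:0:0\n"
)
for row in vcf_to_dosage_table(io.StringIO(demo)):
    print("\t".join(row))
```

For a real file, pass an open file handle instead of the `io.StringIO` object. At multi-allelic sites this sketch counts any non-reference allele, which may or may not be what you want.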

10.6 years ago
vlaufer ▴ 290

vcf-query from VCFtools might also work quite well for this purpose. I mention it just in case you have not already used it!
