Multiple reference alleles in vcf file
5.4 years ago

I am interested in performing splicing QTL (sQTL) analysis. In my VCF files, some reference positions have more than one allele. Should I keep them, or remove the rows containing those SNPs? For example, positions 187 and 194 contain more than one allele; should I remove these rows?

#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  108 139
1   73  .   C   A   .   PASS    .   GT  0   0
1   83  .   T   C,A .   PASS    .   GT  1   1
1   187 .   TG  T   .   PASS    .   GT  1   1
1   188 .   G   T   .   PASS    .   GT  0   0
1   189 .   T   C,G .   PASS    .   GT  0   0
1   190 .   G   A   .   PASS    .   GT  0   0
1   194 .   ATT A   .   PASS    .   GT  1   1
1   209 .   C   T   .   PASS    .   GT  0   0

I don't see anywhere that you have multiple REF alleles. There are multi-allelic sites (with multiple ALT alleles), sure, but no multiple REF alleles. Maybe you're looking at the wrong column header? Here's your data formatted for eyeballing:

#CHROM  POS  ID  REF  ALT  QUAL  FILTER  INFO  FORMAT  108  139
1       73   .   C    A    .     PASS    .     GT      0    0
1       83   .   T    C,A  .     PASS    .     GT      1    1
1       187  .   TG   T    .     PASS    .     GT      1    1
1       188  .   G    T    .     PASS    .     GT      0    0
1       189  .   T    C,G  .     PASS    .     GT      0    0
1       190  .   G    A    .     PASS    .     GT      0    0
1       194  .   ATT  A    .     PASS    .     GT      1    1
1       209  .   C    T    .     PASS    .     GT      0    0
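To check the full file rather than eyeballing it, one option is to count the ALT alleles on each record. The function below is a hypothetical sketch (the name `count_alt_alleles` is made up for illustration) that assumes a tab-delimited VCF. It labels records whose ALT column holds more than one comma-separated allele as multi-allelic, and labels any REF containing a comma as malformed, since REF holds a single allele in a valid VCF.

```shell
# Hypothetical helper (name is illustrative): for each data line of a
# tab-delimited VCF, print POS, REF, ALT, the number of ALT alleles and a label.
count_alt_alleles() {
  awk -F'\t' -v OFS='\t' '
    /^#/ { next }                            # skip meta-information and header lines
    {
      n_alt = split($5, alts, ",")           # ALT alleles are comma-separated
      label = (n_alt > 1) ? "multiallelic" : "biallelic"
      if ($4 ~ /,/) label = "malformed_REF"  # REF must be a single allele
      print $2, $4, $5, n_alt, label
    }
  ' "$@"
}

# Usage: list only the records that are not simple biallelic sites
# count_alt_alleles vcf_file.vcf | awk -F'\t' '$5 != "biallelic"'
```

On the records above, position 83 comes out as multiallelic (2 ALT alleles), while the deletion at 187 (TG to T) comes out as biallelic.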

This is indeed new to me. I have checked the original file again, and it does contain multiple reference alleles. I downloaded the VCF file from here.


Can you please paste a few sample lines? This command prints up to 10 records whose REF column contains a comma:

awk -F"\t" -v OFS="\t" '!/^#/ && $4 ~ /,/ { print; if (++cntr == 10) exit }' vcf_file.vcf | column -ts $'\t'
5.4 years ago
Fabio Marroni ★ 3.0k

Positions 187 and 194 are deletions. At position 187 the reference allele is TG and the alternative allele is T (a deletion of G). The same is true for position 194, where ATT is the reference allele and the alternative allele is A (meaning that TT is deleted).
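The convention can be made explicit with a small helper: in these records the first base is shared by REF and ALT, so for a deletion ALT is a prefix of REF and the deleted bases are whatever follows it. The function below is a hypothetical sketch (`describe_variant` is not part of any tool) that assumes a single biallelic REF/ALT pair written this way; anything that does not fit the pattern is reported as complex.

```shell
# Hypothetical helper: describe one biallelic REF/ALT pair.
# Assumes indels share their first base(s) with the other allele,
# so a deletion has ALT as a prefix of REF and an insertion the reverse.
describe_variant() {
  local ref=$1 alt=$2
  if [ "${#ref}" -eq 1 ] && [ "${#alt}" -eq 1 ]; then
    echo "SNP $ref>$alt"
  elif [ "${#ref}" -gt "${#alt}" ] && [ "${ref#"$alt"}" != "$ref" ]; then
    # strip the shared leading bases; what remains of REF was deleted
    echo "deletion of ${ref#"$alt"}"
  elif [ "${#alt}" -gt "${#ref}" ] && [ "${alt#"$ref"}" != "$alt" ]; then
    # strip the shared leading bases; what remains of ALT was inserted
    echo "insertion of ${alt#"$ref"}"
  else
    echo "complex $ref>$alt"
  fi
}
```

For the two positions in question, `describe_variant TG T` prints `deletion of G` and `describe_variant ATT A` prints `deletion of TT`; each is still one REF allele and one ALT allele.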


Neither of those positions fits the description of "multiple alleles". Each is a single multi-base allele, the standard way of denoting a deletion.


But that notation might explain OP's confusion.


That makes sense. I just noticed OP referring to these positions specifically, so they should probably read the VCF specification and make sure they understand the regular representation versus multi-allelic sites.
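To see how a multi-allelic record relates to the regular biallelic representation, it can be split into one record per ALT allele. Dedicated tools such as `bcftools norm -m -any` do this properly; the awk function below is only a hypothetical sketch (`split_multiallelic` is a made-up name) that assumes a tab-delimited VCF with GT as the only FORMAT field. It recodes each genotype so that the kept ALT becomes 1 and calls of any other ALT allele become missing (`.`); that choice, and leaving per-allele INFO fields untouched, are simplifications of this sketch.

```shell
# Hypothetical helper: split multi-allelic records of a tab-delimited VCF
# (GT as the only FORMAT field) into one biallelic record per ALT allele.
split_multiallelic() {
  awk -F'\t' -v OFS='\t' '
    function recode_allele(a, k) {
      if (a == "." || a == "0") return a     # missing and REF calls stay as they are
      return (a + 0 == k) ? "1" : "."        # the kept ALT becomes 1, other ALTs missing
    }
    function recode_gt(gt, k,    out, num, c, i) {
      out = ""; num = ""
      for (i = 1; i <= length(gt); i++) {    # walk the genotype, keeping / and | separators
        c = substr(gt, i, 1)
        if (c == "/" || c == "|") { out = out recode_allele(num, k) c; num = "" }
        else num = num c
      }
      return out recode_allele(num, k)
    }
    /^#/ { print; next }                     # pass header lines through unchanged
    {
      n = split($5, alts, ",")
      if (n == 1) { print; next }            # biallelic records are printed as they are
      for (k = 1; k <= n; k++) {             # one output record per ALT allele
        line = $1 OFS $2 OFS $3 OFS $4 OFS alts[k]
        for (i = 6; i <= NF; i++)            # columns 10 onwards hold sample genotypes
          line = line OFS (i > 9 ? recode_gt($i, k) : $i)
        print line
      }
    }
  ' "$@"
}
```

For the record at position 83 (T with ALT C,A), this yields one T to C record and one T to A record, while the deletions at 187 and 194 are printed unchanged because they already carry a single ALT allele.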
