Tool to check which alternate allele is dominant across samples for each line of a VCF file
4.8 years ago
halo22 ▴ 300

Hello All,

I am very new to WGS analysis. I have a multi-sample VCF file that I have annotated with SnpEff. I want to find out which alternate alleles are shared between samples at each genomic position. For example, at Chr1 position 1001 the reference allele is A, the alternate alleles seen are T, AAT, TTT and AA, and there are 10 samples. I want to count how many samples have TTT, how many have AA, and so on. That way I can see which allele is dominant across samples at each position.

All help is appreciated

Thanks.

next-gen WGS • 1.3k views
4.8 years ago
Carambakaracho ★ 3.3k

I've had very good experiences with the vcfR package for R (it is distributed on CRAN rather than Bioconductor).

In case you're new to R too and this is a one-off project, there's nothing wrong with just using Excel and multiple text-to-column operations to split the data (provided your machine is powerful enough to handle it). It's a bit tedious, but the learning curve is less steep.


there's nothing wrong with just using Excel

[image]


:-D damn it, I got the Excel shame AND didn't realise the thread was 20 days old.

@halo22 you'd better not use my Excel advice; try to hire Pierre instead, I guess.

@pierre or anyone, as this is most likely irrelevant for the OP anyway: does Biostars support strikethrough in Markdown?

Cheers


Thank you, guys! I wrote my own to get this done.

4.8 years ago

With plink 2.0,

plink2 --vcf <VCF path> --freq counts

gets this information for you. (Remove 'counts' if you want proportions instead.)
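
If you also need to know which samples carry each allele, and not only the totals, a few lines of plain Python can do it. This is only a rough sketch and not what the OP ended up writing: the function name `alt_allele_samples` and the demo record are made up for illustration, and it assumes an uncompressed, tab-delimited VCF where every record has a GT entry in FORMAT.

```python
from collections import defaultdict


def alt_allele_samples(vcf_lines):
    """Yield (chrom, pos, {alt_allele: [samples carrying it]}) for each record.

    Hypothetical helper: assumes tab-delimited VCF text with GT in FORMAT.
    """
    samples = []
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample names start at column 10
            continue
        chrom, pos, ref, alt = fields[0], fields[1], fields[3], fields[4]
        alleles = [ref] + alt.split(",")  # index 0 = REF, 1.. = ALT alleles
        gt_index = fields[8].split(":").index("GT")
        carriers = defaultdict(list)
        for sample, call in zip(samples, fields[9:]):
            genotype = call.split(":")[gt_index]
            seen = set()
            for idx in genotype.replace("|", "/").split("/"):
                # ignore missing calls (".") and the reference allele (0)
                if idx.isdigit() and 0 < int(idx) < len(alleles):
                    seen.add(int(idx))
            for i in sorted(seen):
                carriers[alleles[i]].append(sample)
        yield chrom, int(pos), dict(carriers)


# Made-up demo record with three samples
demo = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\tS3",
    "Chr1\t1001\t.\tA\tT,AAT,TTT\t.\t.\t.\tGT\t0/1\t3/3\t1|3",
]
for chrom, pos, carriers in alt_allele_samples(demo):
    for allele, names in carriers.items():
        print(chrom, pos, allele, len(names), ",".join(names))
```

For a real file, pass an open file handle instead of the demo list, e.g. `alt_allele_samples(open("my.vcf"))`. A sample is counted once per allele here, so a homozygous 3/3 call adds one sample to TTT, not two; count allele copies instead if that is what you mean by dominant.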
