VCF file extraction
3.8 years ago
C4 ▴ 30

Hi, I have a VCF file from a single-cell DNA dataset and would like to extract the GT and PL tags, as well as the allele frequency, for each barcode/sample, in this order:

Sample_name, GT, PL, AF

I tried bcftools query -f ' %CHROM %POS[\t%GT\t%PL]\n', but it does not give me per-sample information. Any help would be appreciated. Thanks!

SNP next-gen • 1.1k views

It works here. Are those fields defined in your header, and does the file contain any genotypes?

$ bcftools query -f ' %CHROM %POS[\t%GT\t%PL]\n' ~/src/jvarkit/src/test/resources/rotavirus_rf.vcf.gz | head
 RF01 970   0/0 0,9,47  0/0 0,18,73 0/0 0,18,73 0/0 0,33,116    1/1 95,24,0
 RF02 251   0/0 0,15,57 0/1 31,0,5  0/1 31,0,5  0/0 0,9,42  0/0 0,24,69
 RF02 578   0/0 0,33,122    0/0 0,39,135    0/0 0,39,135    1/1 100,30,0    0/0 0,27,109

Thanks for your response.

It does give me a warning: Contig 'chrX' is not defined in the header. Should I bgzip the VCF file and tabix-index it?

It does run, though, and gives output like this:

chrX    .   .   .   .   .   .   .   .   .   .   .   .   .   .   0/0 0,3,42  .   .   .   .   .   .   .   .   .   .   .   .   .   0/0 0,3,50  .   .   .   .   .   .   .   .   .   .   .   .   .   0/0 0,3,50  .   .   .   .   .   .   .   .   .   .   .   .   .   0/0 0,3,42  .   .   .   .   .   .   0/0 0,3,30  .   .   .   .   .   0

I wanted output corresponding to each sample_name.
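For readers with the same goal, here is a minimal sketch of one way to get a row per sample. In a bcftools query format string, everything inside the square brackets is repeated for each sample, so putting %SAMPLE and the newline inside the brackets should print one line per site and sample instead of one wide line per site. The function name per_sample_table and the file name input.vcf.gz are hypothetical. AF is left out because the thread does not say which tag holds a per-sample allele frequency in this file.

```shell
# Sketch only: assumes bcftools is installed and the VCF defines the
# GT and PL FORMAT fields. `per_sample_table` is a hypothetical helper name.
#
# Everything inside [...] is looped over the samples, so each output line is:
#   CHROM <tab> POS <tab> SAMPLE <tab> GT <tab> PL
per_sample_table() {
    bcftools query -f '[%CHROM\t%POS\t%SAMPLE\t%GT\t%PL\n]' "$1"
}

# Usage (input.vcf.gz is a placeholder file name):
#   per_sample_table input.vcf.gz > per_sample.tsv
```

The resulting file has one record per barcode and site, which tends to be easier to load into R or a spreadsheet than the wide layout.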


OK, I tabix-indexed it. I still get this output:

chrX 251549 .   .   .   .   .   .   .   .   .   .   .   .   .   0/0 0,3,42  .   .   .   .   .   .   .   .   .   .   .   .   .   0/0 0,3,42  .   .   .   .   .   .   .   .   .   .   .   .   .   0/0 0,3,50  .   .   .   .   .   .   .   .   .   .   .   .   .   0/0 0,3,42  .   .   .   .   .   .   .   .   .   .   .   .   .   1/1 42,3,0  .   .   .   .   .   .   .   .   .   .   .   .   .   .

Don't see sample info anywhere.


Don't see sample info anywhere.

It's here! " . . . . . . . . . . . . . 0/0 0,3,42 . . . . . . . . . . . . . 0/0 0,3,42 . . . . . . . . . . . . . 0/0 0,3,50 . . . . . . . . . . . . . 0/0 0,3,42 . . . . . . . . . . . . . 1/1 42,3,0 . . . . . . . . . . . . . ."


I think I have it reversed: the columns hold the sample names, hence all the dots. How could I make this more user-readable, i.e., convert it into a TSV file? Thanks a lot!!


Your output is already TSV.


Yes, but when I import it into R with read.csv(file, sep="\t"), it isn't actually in the format I need, with sample and tags. This is the output of my command with the PL tag and samples in columns. How could I get it into R in a CSV or table format for further analysis? Thank you for your help!!

[1]AAACAACGACAGTCTA:PL[2]AAACAACGATGATGAA:PL[3]AAACAACGATTCGCCT:PL[4]AAACATGGACCGTTAA:PL[5]AAACATGGACGTTAGT:PL[6>
...................0,3,42.........................................................................................>
..................................................................................................................>
..................................................................................................................>
............................................................0,3,42................................................>
............................................................0,3,42................................................>
............................................................42,3,0................................................>
..0,3,30.0,3,50...................................................................................................>
..0,3,42....0,3,50.............................................................................................

Also, do dots mean that there is missing data for those samples?
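On the dots: in VCF, "." is the missing-value marker, so a dot in a sample column means no value was recorded for that barcode at that site. With sparse single-cell data, one option is to drop those entries once the table is in long format. The sketch below assumes tab-separated input with columns CHROM, POS, SAMPLE, GT, PL (for example from a bcftools query format such as '[%CHROM\t%POS\t%SAMPLE\t%GT\t%PL\n]'). The helper name drop_missing and the file names are hypothetical.

```shell
# Sketch only: `drop_missing` is a hypothetical helper name.
# Reads tab-separated rows (CHROM, POS, SAMPLE, GT, PL) on stdin and keeps
# only rows whose genotype (column 4) is not a missing value.
drop_missing() {
    awk -F '\t' '$4 != "." && $4 != "./." && $4 != ".|."'
}

# Usage (file names are placeholders):
#   drop_missing < per_sample.tsv > per_sample.called.tsv
```

Only the GT column is checked here; a row with a called genotype but a missing PL would be kept.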


I actually figured it out, in case anyone is looking to do something similar: the R package vcfR (loaded with library(vcfR)) can make VCF files quite user-readable!
