3.8 years ago
C4
Hi, I have a VCF file from a single-cell DNA dataset and would like to extract the GT and PL tags, as well as the allele frequency, for each barcode/sample, in this order:
Sample_name, GT, PL, AF
I tried bcftools query -f ' %CHROM %POS[\t%GT\t%PL]\n', but it does not give me per-sample information. Any help would be appreciated. Thanks!
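For reference, bcftools repeats whatever sits inside the square brackets once per sample, so moving %SAMPLE and the newline inside them, as in bcftools query -f '[%SAMPLE\t%CHROM\t%POS\t%GT\t%PL\n]' file.vcf, should print one line per sample per site. The same long layout can be sketched in plain Python to make the logic explicit. This is a minimal sketch, not a bcftools replacement: it assumes an uncompressed, tab-separated VCF, and the function names are hypothetical. AF is looked up first among the sample's FORMAT tags and then in INFO, because files differ in where they store it (an INFO/AF value is per site, not per cell). CHROM and POS are kept so each row can be tied back to its variant.

```python
import io


def parse_info(info):
    """Map INFO keys to values; flags (entries without '=') are ignored here."""
    pairs = (item.split("=", 1) for item in info.split(";") if "=" in item)
    return {key: value for key, value in pairs}


def vcf_to_long(lines, fields=("GT", "PL", "AF")):
    """Yield (sample, chrom, pos, *fields) once per sample per site.

    Each tag is taken from the sample's FORMAT data first, then from INFO;
    anything absent is reported as "." (the VCF missing value).
    """
    samples = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip meta-information lines
        cols = line.split("\t")
        if line.startswith("#CHROM"):
            samples = cols[9:]  # barcodes/sample names follow the FORMAT column
            continue
        chrom, pos = cols[0], cols[1]
        info = parse_info(cols[7])
        fmt_keys = cols[8].split(":")
        for sample, raw in zip(samples, cols[9:]):
            # Trailing FORMAT fields may be dropped, so zip stops early.
            values = dict(zip(fmt_keys, raw.split(":")))
            yield (sample, chrom, pos) + tuple(
                values.get(tag, info.get(tag, ".")) for tag in fields
            )


example = io.StringIO(
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tcellA\tcellB\n"
    "chrX\t100\t.\tA\tG\t.\tPASS\tAF=0.25\tGT:PL\t0/0:0,3,42\t./.\n"
)
print("Sample\tCHROM\tPOS\tGT\tPL\tAF")
for row in vcf_to_long(example):
    print("\t".join(row))
```

Here cellB has only a missing genotype, so its PL is reported as "." while AF still comes from the site-level INFO field.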
It works here. Do you have those fields defined in the header? Do you have any genotypes?
Thanks for your response.
It does give me a warning: Contig 'chrX' is not defined in the header. Should I bgzip the VCF file and index it with tabix?
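The warning means the header has no ##contig line for chrX. bgzip and tabix compress and index the file without editing its header, so on their own they should not add the missing line. A quick way to list every contig that records use but the header does not declare is sketched below; it assumes uncompressed VCF text, and undeclared_contigs is a hypothetical helper name.

```python
import io
import re


def undeclared_contigs(lines):
    """Return contigs used by records but missing a ##contig header line."""
    declared, missing = set(), []
    for line in lines:
        if line.startswith("##contig="):
            match = re.search(r"ID=([^,>]+)", line)  # e.g. ##contig=<ID=chr1,...>
            if match:
                declared.add(match.group(1))
        elif line.startswith("#") or not line.strip():
            continue  # other meta lines and the #CHROM column header
        else:
            chrom = line.split("\t", 1)[0]
            if chrom not in declared and chrom not in missing:
                missing.append(chrom)  # keep first-seen order
    return missing


vcf_text = io.StringIO(
    "##fileformat=VCFv4.2\n"
    "##contig=<ID=chr1>\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "chr1\t10\t.\tA\tC\t.\t.\t.\n"
    "chrX\t100\t.\tA\tG\t.\t.\t.\n"
)
print(undeclared_contigs(vcf_text))  # ['chrX']
```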
It runs anyway and gives output like this:
I want output that corresponds to each sample name.
OK, I indexed it with tabix. I still get this output:
I don't see the sample information anywhere.
It's here! ". . . . . . . . . . . . . 0/0 0,3,42 . . . . . . . . . . . . . 0/0 0,3,42 . . . . . . . . . . . . . 0/0 0,3,50 . . . . . . . . . . . . . 0/0 0,3,42 . . . . . . . . . . . . . 1/1 42,3,0 . . . . . . . . . . . . . ."
I think I have it reversed: the columns are the sample names, hence all the dots. How could I make this more readable, i.e. convert it into a TSV file? Thanks a lot!!
Your output is already TSV.
Yes, but when I import it into R with read.csv(file, sep="\t"), it isn't in the format I need, with samples and tags. This is the output of my command with the PL tag and the samples as columns. How could I import it into R as a CSV or table for further analysis? Thank you for your help!!
[1]AAACAACGACAGTCTA:PL[2]AAACAACGATGATGAA:PL[3]AAACAACGATTCGCCT:PL[4]AAACATGGACCGTTAA:PL[5]AAACATGGACGTTAGT:PL[6>
...................0,3,42.........................................................................................>
..................................................................................................................>
..................................................................................................................>
............................................................0,3,42................................................>
............................................................0,3,42................................................>
............................................................42,3,0................................................>
..0,3,30.0,3,50...................................................................................................>
..0,3,42....0,3,50.............................................................................................
Also, do dots mean that there is missing data for those samples?
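In VCF, and therefore in bcftools query output, "." is the missing-value marker, so each dot is a sample with no value for that tag at that site. With the samples spread across columns, a long table with one row per (site, sample, tag) is often easier to analyse. The sketch below assumes the query was run with -H, so the first line names the columns as [n]SAMPLE:TAG (as in the header above, possibly preceded by "#"), and that fields are tab-separated. query_to_long is a hypothetical name, and it drops missing entries (".", "./.", ".|.") rather than keeping them.

```python
import csv
import io
import re

MISSING = {".", "./.", ".|."}


def query_to_long(handle):
    """Reshape wide `bcftools query -H` output into long rows.

    Yields (site fields..., sample, tag, value); columns whose name has no
    colon (e.g. CHROM, POS) are treated as per-site fields.
    """
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    # Strip a leading "#"/space and the "[n]" column index from each name.
    header = [re.sub(r"^\[\d+\]", "", name.lstrip("# ")) for name in next(reader, [])]
    for row in reader:
        site = [value for name, value in zip(header, row) if ":" not in name]
        for name, value in zip(header, row):
            if ":" not in name or value in MISSING:
                continue
            sample, tag = name.rsplit(":", 1)
            yield (*site, sample, tag, value)


wide = io.StringIO(
    "# [1]CHROM\t[2]POS\t[3]cellA:PL\t[4]cellB:PL\n"
    "chrX\t100\t0,3,42\t.\n"
)
for row in query_to_long(wide):
    print("\t".join(row))  # only cellA is printed; cellB's "." is skipped
```

Written out as a TSV, the long rows can then be loaded in R with read.delim() and filtered or pivoted by sample.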
I actually figured it out, in case anyone is looking to do something similar: the R package vcfR makes VCF files quite user-readable!