2.0 years ago
Nai
I separated a list of SNPs from a VEP text file. I would like to extract those SNPs from the VEP-annotated file (file format: #Uploaded_variation, Location, Allele) and create a VCF output file (CHROM, POS, REF, ALT, FILTER, with genotypes) to use for allele frequency counts.
I generated the annotation with --vcf, but I am not able to apply filtering criteria such as replacing . with NA, selecting only the NA values in the frequency column, or filtering on CADD. Can you guide me on how to filter these INFO columns in the VCF file and create new VCF files?
X.Uploaded_variation chr1_728076_-/TTC rs148146441 rs148146441 chr1_9122138_T/- rs139855605
You can parse the CSQ field with the bcftools +split-vep plugin and write your filters as bcftools expressions. To select records where the value is a dot (missing), you can use
MAX_AF="."
To get low-frequency and novel variants, you can combine these expressions with logical operators:
MAX_AF <= 0.001 || MAX_AF="."
Here is an example command that would get you variants with deleterious CADD scores that are novel or low frequency. We use the -c option to set the type of each extracted column: in this case MAX_AF is a Float and CADD_Pred is an Integer; otherwise bcftools will turn them all into strings. The -f option sets the output format, -d puts each transcript's consequences on its own line, and -A sets the delimiter between the CSQ columns.
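The command itself did not survive in the post, so the following is only a sketch of what such a call could look like, not the original. The file names, the CADD cutoff of 20, and the exact -f format string are assumptions. The field names MAX_AF and CADD_Pred are taken from the text above; the VEP CADD plugin usually names its field CADD_PHRED, so it is worth checking what your header actually contains with `bcftools +split-vep -l`.

```shell
# Hypothetical file names: replace with your own.
in_vcf="annotated.vcf.gz"
out_tsv="filtered.tsv"

# Field types for -c; without them the values are treated as strings.
cols='MAX_AF:Float,CADD_Pred:Int'

# Assumed cutoff: CADD >= 20, combined with "rare or missing frequency".
filter='CADD_Pred>=20 && (MAX_AF<=0.001 || MAX_AF=".")'

# Only run when bcftools and the input file are actually available.
if command -v bcftools >/dev/null 2>&1 && [ -f "$in_vcf" ]; then
  bcftools +split-vep "$in_vcf" -c "$cols" -i "$filter" \
    -d -A tab -f '%CHROM\t%POS\t%REF\t%ALT\t%CSQ\n' > "$out_tsv"
else
  echo "bcftools or $in_vcf not found, nothing was run"
fi
```

If you want a VCF back instead of a table, dropping -f, -d and -A and adding something like `-Oz -o filtered.vcf.gz` should write the filtered records as VCF with the extracted fields added as INFO tags.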
You can read more about it here:
https://samtools.github.io/bcftools/howtos/plugin.split-vep.html
and here:
https://samtools.github.io/bcftools/bcftools.html#expressions
Thank you.