Question

Generating Positional List from VCF

3.4 years ago

Ared445 ▴ 60

I'd like to generate a 1-based position list from a VCF file for all variants. As I understand the VCF convention, the POS column gives the position of the substituted base itself for a single-nucleotide substitution, but the position of the preceding (anchor) base for both insertions and deletions.

So, to express the position of each variant as a start and end pair, I thought a script could take the position N given in the VCF and convert it as follows:

Insertion: start = N, end = N+1
SNP: start = N, end = N
Deletion: start = N+1, end = N+length(REF)-1

So for the following sample:

CHROM   POS             REF     ALT
11      66091886        T       TTTC
11      66108375        T       G
11      67180763        GTATT   G

It becomes:

CHROM   START           END 
11      66091886        66091887
11      66108375        66108375
11      67180764        67180767
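
The three rules above can be sketched as a small Python function. This is only an illustration under stated assumptions: the name `variant_span` is hypothetical, each record is biallelic, and indels carry a single shared anchor base as in the sample rows. Anything else (multi-allelic ALT, complex substitutions) is rejected rather than guessed at.

```python
def variant_span(pos, ref, alt):
    """Return a 1-based, inclusive (start, end) pair for one simple VCF record.

    Hypothetical helper: assumes a single ALT allele and, for indels,
    one anchor base shared by REF and ALT.
    """
    if len(ref) == 1 and len(alt) == 1:
        # SNP: POS is the substituted base itself.
        return pos, pos
    if len(ref) == 1 and len(alt) > 1 and alt[0] == ref:
        # Insertion: report the two reference bases flanking the inserted sequence.
        return pos, pos + 1
    if len(alt) == 1 and len(ref) > 1 and ref[0] == alt:
        # Deletion: skip the anchor base, span only the deleted bases.
        return pos + 1, pos + len(ref) - 1
    raise ValueError(f"unsupported record: POS={pos} REF={ref} ALT={alt}")


# The three sample rows from the question.
records = [
    ("11", 66091886, "T", "TTTC"),
    ("11", 66108375, "T", "G"),
    ("11", 67180763, "GTATT", "G"),
]
spans = [(chrom, *variant_span(pos, ref, alt)) for chrom, pos, ref, alt in records]
for chrom, start, end in spans:
    print(chrom, start, end, sep="\t")
```

Run on the sample rows, this prints the same START and END values as the table above.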

I'm just wondering whether I have gone about this correctly, and whether this method does in fact specify where in my alignment each variant occurs.

updated 13 months ago by Thind amarinder ▴ 340 • written 3.4 years ago by Ared445 ▴ 60

score 1 · Answer 1 · 2021-07-01

One way to achieve your goal is to convert your VCF file to a MAF (Mutation Annotation Format) file. To this end, you may want to check out the fuc package I wrote:

Python API solution (the `pymaf.MafFrame.from_vcf` method):

>>> from fuc import pyvcf, pymaf
>>> data = {
...     'CHROM': ['chr1', 'chr1', 'chr1'],
...     'POS': [100, 200, 300],
...     'ID': ['.', '.', '.'],
...     'REF': ['G', 'C', 'TTC'],
...     'ALT': ['A', 'CAG', 'T'],
...     'QUAL': ['.', '.', '.'],
...     'FILTER': ['.', '.', '.'],
...     'INFO': ['.', '.', '.'],
...     'FORMAT': ['GT', 'GT', 'GT'],
...     'Steven': ['0/1', '0/1', '0/1']
... }
>>> vf = pyvcf.VcfFrame.from_dict([], data)
>>> vf.df
  CHROM  POS ID  REF  ALT QUAL FILTER INFO FORMAT Steven
0  chr1  100  .    G    A    .      .    .     GT    0/1
1  chr1  200  .    C  CAG    .      .    .     GT    0/1
2  chr1  300  .  TTC    T    .      .    .     GT    0/1
>>> mf = pymaf.MafFrame.from_vcf(vf)
>>> # mf = pymaf.MafFrame.from_vcf('your_file.vcf') # Above is just an example, you can directly import your VCF file
>>> mf.df
  Hugo_Symbol Entrez_Gene_Id Center NCBI_Build Chromosome  Start_Position  End_Position Strand Variant_Classification Variant_Type Reference_Allele Tumor_Seq_Allele1 Tumor_Seq_Allele2 Protein_Change Tumor_Sample_Barcode
0           .              .      .          .       chr1             100           100      .                      .          SNP                G                 A                 A              .               Steven
1           .              .      .          .       chr1             200           201      .                      .          INS                -                AG                AG              .               Steven
2           .              .      .          .       chr1             301           302      .                      .          DEL               TC                 -                 -              .               Steven
>>> # mf.to_file('your_file.maf')

CLI solution (the `maf-vcf2maf` command):

$ fuc maf-vcf2maf -h
usage: fuc maf-vcf2maf [-h] vcf

This command will convert an annotated VCF file to a MAF file.

Usage examples:
  $ fuc maf-vcf2maf in.vcf > out.maf

Positional arguments:
  vcf         VCF file.

Optional arguments:
  -h, --help  Show this help message and exit.
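
Once a MAF file exists, the positional list can be pulled out of it with the standard library alone. The following is a minimal sketch, assuming the file is tab-delimited, uses the column names shown in the `mf.df` output above, and may begin with `#` comment lines; `read_positions` is a hypothetical helper, not part of fuc. It reads from an in-memory example here, so replace the `io.StringIO` object with `open('out.maf')` for a real file.

```python
import csv
import io


def read_positions(handle):
    """Return (chromosome, start, end) tuples from a tab-delimited MAF stream."""
    # Skip optional comment lines such as a '#version' header (an assumption).
    rows = (line for line in handle if not line.startswith("#"))
    reader = csv.DictReader(rows, delimiter="\t")
    return [
        (r["Chromosome"], int(r["Start_Position"]), int(r["End_Position"]))
        for r in reader
    ]


# In-memory stand-in for a MAF file, reduced to the columns used here.
example = io.StringIO(
    "Chromosome\tStart_Position\tEnd_Position\tVariant_Type\n"
    "chr1\t100\t100\tSNP\n"
    "chr1\t200\t201\tINS\n"
    "chr1\t301\t302\tDEL\n"
)
positions = read_positions(example)
for chrom, start, end in positions:
    print(chrom, start, end, sep="\t")
```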
