Indels statistics
3.3 years ago
anna ▴ 20

Hi,

I have VCF statistics for heterozygous and homozygous cases, and I would like to find matches with my MAF file. The issue is that the reference field in the MAF file is different: it excludes the nucleotides shared between the reference and alternative alleles, e.g. if you have a ref CAA and the alternative variant is CAAAAA, in the MAF file your ref would be AAA.

So I need code to change the ref and alt fields in my statistics file (maybe by adding separate columns, ref2 and alt2).

Here is a snippet of my file:

CHR POS ID REF ALT
chr11 71579744 rs71049992 A ACAGCAGCTGGACTGGGAGCAGCAGGACCTG (insertion case)

chr11 124880551 rs71859853 CCGGAGT C (deletion case)

I think I should first count the number of nucleotides in columns 4 and 5. Then, if the count in column 4 is greater than the count in column 5 (meaning a deletion), my ref2 will start from the first nucleotide after the part shared with the alternative allele.

For an insertion, the alt allele is changed instead: alt2 skips the nucleotides it shares with the ref.
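The length comparison described above could be sketched in Python roughly as follows. This is only an illustration: the function name `classify_indel` and the label strings are made up for this example, and it assumes REF and ALT are plain nucleotide strings with a single ALT allele per row.

```python
def classify_indel(ref, alt):
    """Classify a variant by comparing allele lengths (columns 4 and 5)."""
    if len(ref) > len(alt):
        return "deletion"      # REF is longer: bases were removed
    if len(ref) < len(alt):
        return "insertion"     # ALT is longer: bases were added
    return "same_length"       # SNV or MNV, not an indel


print(classify_indel("CCGGAGT", "C"))   # deletion
print(classify_indel("A", "ACAGCAG"))   # insertion
```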

As a result, I would like to have this:

CHR POS ID REF ALT REF2 ALT2
chr11 71579744 rs71049992 A ACAGCAGCTGGACTGGGAGCAGCAGGACCTG A CAGCAGCTGGACTGGGAGCAGCAGGACCTG

chr11 124880551 rs71859853 CCGGAGT C CGGAGT C
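One possible way to produce these two extra columns is sketched below in Python. It is a hedged sketch under several assumptions, not a tested pipeline: the names `trim_indel`, `add_ref2_alt2` and the `maf_dash` option are hypothetical, the input is assumed to be whitespace-separated with REF and ALT in columns 4 and 5, each row is assumed to carry a single ALT allele, and the shorter allele is assumed to be a prefix of the longer one (rows where that is not true are left unchanged).

```python
def trim_indel(ref, alt, maf_dash=False):
    """Return (ref2, alt2): the shared leading bases are removed from the
    longer allele; the shorter allele is kept (or written as "-")."""
    if len(ref) == len(alt):
        return ref, alt                      # not an indel, nothing to trim
    is_insertion = len(ref) < len(alt)
    short, long_ = (ref, alt) if is_insertion else (alt, ref)
    if not long_.startswith(short):
        return ref, alt                      # complex case, leave untouched
    trimmed = long_[len(short):]             # bases actually inserted/deleted
    kept = "-" if maf_dash else short
    return (kept, trimmed) if is_insertion else (trimmed, kept)


def add_ref2_alt2(lines):
    """Append REF2 and ALT2 columns to each non-empty line."""
    out = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue                         # skip blank lines
        if fields[0] == "CHR":               # header row
            out.append("\t".join(fields + ["REF2", "ALT2"]))
            continue
        ref2, alt2 = trim_indel(fields[3], fields[4])
        out.append("\t".join(fields + [ref2, alt2]))
    return out


rows = [
    "CHR POS ID REF ALT",
    "chr11 71579744 rs71049992 A ACAGCAGCTGGACTGGGAGCAGCAGGACCTG",
    "chr11 124880551 rs71859853 CCGGAGT C",
]
print("\n".join(add_ref2_alt2(rows)))
```

With the default settings this reproduces the REF2/ALT2 values shown above. As far as I know, MAF files usually write the empty allele of an indel as "-", which is what `maf_dash=True` would do; the start position may also be shifted in the MAF file, so it is worth checking a few rows by hand before matching.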

Thank you very much in advance!

statistics vcf indel variants • 599 views