Parse allele database
5.6 years ago
oghzzang ▴ 50

Dear Biostars users.

I have variants in this format.

Example:

CHROM  POS  REF  ALT
1      150  CAC  CAAC

Can I convert this to the following format using Python?

CHROM  POS  REF  ALT
1      150  C    CA
1      151  A    AA

Please use the formatting bar (especially the code option) to present your post better. You can use backticks for inline code (`text` becomes text), or select a chunk of text and use the highlighted button to format it as a code block. I've done it for you this time.

Thanks. From now on, I'll use this button. :)

5.6 years ago
Ram 44k

What you're asking for is called "left-aligned normalization". It represents each variant in its most parsimonious notation, and it is one of the best practices I've encountered; I use it all the time.

If you have the VCF file this data comes from and the reference sequence used in the analysis, you can run either `bcftools norm` (from bcftools) or `vt decompose | vt normalize` (from vt) on the VCF file to get what you need. I'd recommend the latter, as it makes tracking changes easier by adding the OLD_MULTIALLELIC and OLD_VARIANT INFO fields.
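
Since the question asks about Python, here is a rough sketch of calling the bcftools route from a script. The helper names `build_norm_command` and `normalize_vcf` and the file names are placeholders, not part of bcftools; it assumes bcftools is installed and on your PATH, and it only uses the `-f` (reference FASTA) and `-o` (output file) options of `bcftools norm`.

```python
import subprocess


def build_norm_command(vcf_in, ref_fasta, vcf_out):
    """Build a `bcftools norm` command line (hypothetical helper)."""
    # -f: reference FASTA used for left-alignment; -o: output file.
    return ["bcftools", "norm", "-f", ref_fasta, "-o", vcf_out, vcf_in]


def normalize_vcf(vcf_in, ref_fasta, vcf_out):
    """Run bcftools norm; raises CalledProcessError if bcftools fails."""
    subprocess.run(build_norm_command(vcf_in, ref_fasta, vcf_out), check=True)


# Placeholder file names, only to show the resulting command line.
print(" ".join(build_norm_command("input.vcf", "reference.fa", "normalized.vcf")))
```

Calling `normalize_vcf` with your own paths runs the command; the `print` line only shows what would be executed.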

If not, the task becomes much more challenging, because you're going to need to compare the reference sequence and the ALT alleles manually to get to your solution.
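
For the simple case in the question, where REF and ALT are already available as strings, a minimal Python sketch of the trimming part might look like the following. The function name `trim_alleles` is hypothetical, and the sketch assumes a single ALT allele per record and 1-based positions. It only removes bases shared by REF and ALT; it cannot shift an indel further left through a repeat, because that needs the reference sequence.

```python
def trim_alleles(pos, ref, alt):
    """Trim bases shared by REF and ALT (hypothetical helper, single ALT only).

    Returns (pos, ref, alt) with at least one base kept in each allele.
    """
    # Drop identical trailing bases while both alleles keep at least one base.
    while len(ref) > 1 and len(alt) > 1 and ref[-1] == alt[-1]:
        ref, alt = ref[:-1], alt[:-1]
    # Drop identical leading bases, moving POS right by one each time.
    while len(ref) > 1 and len(alt) > 1 and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1
    return pos, ref, alt


print(trim_alleles(150, "CAC", "CAAC"))
```

For the record in the question this gives position 150 with REF C and ALT CA, which matches the first line of the desired output.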


Thank you for your help.
