separate_rows, but keep related ones together
3.9 years ago
Ram 44k

Hi,

This appears to be a simple problem that I am unable to solve. I have some data that looks like this:

CHROM    POS    REF    ALT    TYPE       AF
chr1     1      A      T      MISSENSE   0.23
chr2     1      A      T,G    MISSENSE   0.17, 0.09

The above is dummy meaningless data, but it is representative of the problem at hand.

I'd like to use separate_rows so that ALT and AF are split in a coupled manner, keeping each allele with its own frequency. Running separate_rows on the two columns would give me 4 rows for the chr2 variant, not 2. I'd like my output to be:

CHROM    POS    REF    ALT    TYPE       AF
chr1     1      A      T      MISSENSE   0.23
chr2     1      A      T      MISSENSE   0.17
chr2     1      A      G      MISSENSE   0.09

Is there any way I can preserve this pairing while separating the values out? I am too far downstream of the VCF to go back and split the multi-allelics there.

r variants multiallelic • 1.4k views
3.9 years ago

If I run this:

separate_rows(data, ALT, AF, convert = TRUE)

where data is your first data frame, I obtain the second data frame.

How did you run the function separate_rows()?
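
To make the difference between a paired split and the crossed result feared in the question concrete, here is a minimal Python sketch. It is only an illustration and not how tidyr implements separate_rows; the function names split_paired and split_crossed are hypothetical.

```python
from itertools import product


def _pieces(value, sep=","):
    """Split a delimited string and trim stray whitespace around each piece."""
    return [piece.strip() for piece in value.split(sep)]


def split_paired(alt, af, sep=","):
    """Pair the i-th ALT allele with the i-th AF value (position by position)."""
    alts, afs = _pieces(alt, sep), _pieces(af, sep)
    if len(alts) != len(afs):
        raise ValueError("ALT and AF must contain the same number of values")
    return list(zip(alts, afs))


def split_crossed(alt, af, sep=","):
    """Combine every ALT allele with every AF value (the unwanted result)."""
    return list(product(_pieces(alt, sep), _pieces(af, sep)))


print(split_paired("T,G", "0.17, 0.09"))   # [('T', '0.17'), ('G', '0.09')]
print(split_crossed("T,G", "0.17, 0.09"))  # four combinations
```

The paired version yields two rows for the chr2 variant, which matches the desired output in the question.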


I ran it with the defaults, but I'll try toggling the convert parameter. Thanks, Antonio!


You're welcome. Actually, I was lucky: I first tested on the example from the function documentation, which sets convert = TRUE. Since the outcome was similar to what you wanted, I just kept it.


OK, moment of truth - I did not run separate_rows, I assumed how it would work based on my experience. It looks like separate_rows does exactly what I need, not a random combination like I thought it would. I really should have tested it before asking here. Sorry about that.

3.9 years ago

You could use a Python script to do this easily:

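#!/usr/bin/env python

import sys

headers = None
for idx, line in enumerate(sys.stdin):
    # Strip only the line ending so empty trailing fields are preserved.
    elems = line.rstrip('\r\n').split('\t')
    if idx == 0:
        # The first line is the header; pass it through unchanged.
        headers = elems
        sys.stdout.write(line)
    else:
        items = dict(zip(headers, elems))
        # ALT and AF are parallel comma-separated lists; trim stray spaces.
        alleles = [a.strip() for a in items['ALT'].split(',')]
        afs = [f.strip() for f in items['AF'].split(',')]
        # Emit one row per allele, paired with the AF at the same position.
        for ai in range(len(alleles)):
            items['ALT'] = alleles[ai]
            items['AF'] = afs[ai]
            sys.stdout.write('\t'.join(items[x] for x in headers) + '\n')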

For example:

$ ./split.py < variants.txt
CHROM   POS REF ALT TYPE    AF
chr1    1   A   T   MISSENSE    0.23
chr2    1   A   T   MISSENSE    0.17
chr2    1   A   G   MISSENSE    0.09

Write it out to a file and bring that back into R:

$ ./split.py < variants.txt > variants.split.txt

Turns out, I'm an idiot who should really test something and make sure it doesn't work before saying it doesn't work. separate_rows works exactly the way I need my solution to, not the way I thought it would.

3.9 years ago
zx8754 12k

Using data.table:

library(data.table)

x <- fread("CHROM POS REF ALT TYPE AF
chr1 1 A T MISSENSE 0.23
chr2 1 A T,G MISSENSE 0.17,0.09")

x[, lapply(.SD, function(x) unlist(tstrsplit(x, ",", fixed = TRUE))),
    by = .(CHROM, POS, REF, TYPE)
  ][, .(CHROM, POS, REF, ALT, TYPE, AF = as.numeric(AF))]
#    CHROM POS REF ALT     TYPE   AF
# 1:  chr1   1   A   T MISSENSE 0.23
# 2:  chr2   1   A   T MISSENSE 0.17
# 3:  chr2   1   A   G MISSENSE 0.09

The version below should work with automatic type conversion, but it fails. In the first group the lone value "T" is converted to logical TRUE, while in the second group "T,G" splits into character values, so binding the groups together errors out:

x[, lapply(.SD, function(x) unlist(tstrsplit(x, ",", fixed = TRUE, type.convert = TRUE))),
    by = .(CHROM, POS, REF, TYPE)]
# Error in `[.data.table`(x, , lapply(.SD, function(x) unlist(tstrsplit(x,  : 
#   Column 1 of result for group 2 is type 'character' but expecting type
#   'logical'. Column types must be consistent for each group.
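
The mismatch can be illustrated with a small Python sketch that guesses a type for each group separately. This is a simplified imitation written for illustration, not R's or data.table's actual conversion logic; infer_type and the LOGICAL set are hypothetical names, and the set of strings treated as logical is an assumption.

```python
# Strings that this sketch treats as logical values (an assumption).
LOGICAL = {"T", "TRUE", "true", "True", "F", "FALSE", "false", "False"}


def infer_type(values):
    """Guess one type for a whole vector: logical, numeric, else character."""
    if all(v in LOGICAL for v in values):
        return "logical"
    try:
        for v in values:
            float(v)
        return "numeric"
    except ValueError:
        return "character"


# ALT values per group, as in the example data.
alt_by_group = {"chr1": "T", "chr2": "T,G"}
types = {grp: infer_type(alt.split(",")) for grp, alt in alt_by_group.items()}
print(types)  # {'chr1': 'logical', 'chr2': 'character'}
```

Because the type is guessed per group, the same column ends up with two different types, which is why converting only AF explicitly after the split (as in the first snippet) avoids the problem.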

Related SO post with other alternative solutions:


Gotta love SO's benchmarked solution list :-)
