Question

All combinations of two columns R


18 months ago

Fernando • 0

Hello, I have a dataframe with two columns in R, where each row represents a different genomic position. I am trying to obtain all the combinations of columns 1 and 2 that match with any of the reference alleles I have. Can someone help?

R • 2.1k views

ADD COMMENT • link updated 18 months ago by Ram 44k • written 18 months ago by Fernando • 0


Where are you stuck? You need to give a reproducible example if you want a precise answer.

ADD REPLY • link 18 months ago by Basti ★ 2.0k


To be honest, I do not know where to start with this. Here is a valid example:

This would be the test df with unordered columns:

df1 <- data.frame( Column1 = c("A", "B", "G", "D"), Column2 = c("E", "F", "C", "H") )

And this would be the second df with the reference positions:

df2 <- data.frame( Ref2 = c("A", "B", "C", "D"), Ref1 = c("E", "F", "G", "H") )

The idea is to generate all the combinations so that both columns of df1 match those of df2; if a 100% match is not possible, the option with the greatest homology should be chosen. Obviously this is a simple example; the goal is to apply it to ordering mutations and determining alleles against a reference.

ADD REPLY • link 18 months ago by Fernando • 0


What would be your expected output in the example you gave?

ADD REPLY • link 18 months ago by Basti ★ 2.0k


The expected output would be a list of data frames containing all possible combinations of df1 column 1 and column 2, and at least one combination should look exactly like df2.

ADD REPLY • link 18 months ago by Fernando • 0

score 0 · Answer 1 · 2023-06-02

Like others, I am not sure what you wish to achieve. If you are really talking about regular genomic positions and intervals / overlaps etc., have a look at GRanges and the respective set operations.

If you really just have strings (gene names or the like), the expand.grid() function might be useful. It should at least solve the "The idea is to be able to generate all the combinations" part of your problem.

expand.grid(data.frame( Column1 = c("A", "B", "G", "D"), Column2 = c("E", "F", "C", "H")))
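To make this a bit more concrete, here is a rough sketch using the example data from the comments. The first part shows what expand.grid() returns. The second part rests on my own reading of the question, which may not be what was meant: each row of df1 can either be kept as is or have its two values swapped, every keep/swap pattern gives one candidate data frame, and the candidate with the most rows agreeing with df2 is picked. The names orient, flips, candidates, score and best are made up for illustration, and df2 is compared by column position (first column against Column1, second against Column2), which is also an assumption.

```r
# Example data from the thread (stringsAsFactors = FALSE for R < 4.0)
df1 <- data.frame(Column1 = c("A", "B", "G", "D"),
                  Column2 = c("E", "F", "C", "H"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(Ref2 = c("A", "B", "C", "D"),
                  Ref1 = c("E", "F", "G", "H"),
                  stringsAsFactors = FALSE)

# Part 1: expand.grid() crosses every Column1 value with every Column2 value
all_pairs <- expand.grid(df1, stringsAsFactors = FALSE)
nrow(all_pairs)  # 4 x 4 = 16 rows

# Part 2 (assumed interpretation): enumerate all per-row keep/swap patterns
n <- nrow(df1)
flips <- as.matrix(expand.grid(rep(list(c(FALSE, TRUE)), n)))  # 2^n patterns

# Hypothetical helper: swap the two values in the rows where swap is TRUE
orient <- function(df, swap) {
  data.frame(Column1 = ifelse(swap, df$Column2, df$Column1),
             Column2 = ifelse(swap, df$Column1, df$Column2),
             stringsAsFactors = FALSE)
}

# One candidate data frame per keep/swap pattern
candidates <- lapply(seq_len(nrow(flips)),
                     function(i) orient(df1, unname(flips[i, ])))

# Score each candidate by the number of rows where both values match df2
score <- vapply(candidates,
                function(d) sum(d$Column1 == df2[[1]] & d$Column2 == df2[[2]]),
                integer(1))

# Candidate with the highest agreement (first one in case of ties)
best <- candidates[[which.max(score)]]
```

Note that enumerating every pattern grows as 2^n, so this is only practical for a handful of rows. If the rows are independent, as assumed here, the same best candidate can be found by deciding keep or swap one row at a time.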