Hello, I have a dataframe with two columns in R, where each row represents a different genomic position. I am trying to obtain all the combinations of columns 1 and 2 that match with any of the reference alleles I have. Can someone help?
Like others, I am lost as to what you wish to achieve. If you are really talking about regular genomic positions and intervals, overlaps, etc., have a look at GRanges and the respective set operations.
If you really just have strings (gene names or the like), the expand.grid() function might be useful. It should at least solve the "The idea is to be able to generate all the combinations" part of your problem.
expand.grid(data.frame( Column1 = c("A", "B", "G", "D"), Column2 = c("E", "F", "C", "H")))
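To see what that call returns, here is a small check on the example columns. It is only a sketch: `combos` is a name introduced here for illustration, and `stringsAsFactors = FALSE` is added so the result keeps plain character columns.

```r
# Example columns from the thread.
df1 <- data.frame(Column1 = c("A", "B", "G", "D"),
                  Column2 = c("E", "F", "C", "H"),
                  stringsAsFactors = FALSE)

# expand.grid() accepts a list (a data frame is one) and crosses its elements:
# every value of Column1 is paired with every value of Column2.
combos <- expand.grid(df1, stringsAsFactors = FALSE)

nrow(combos)  # number of value pairs (4 values x 4 values)
head(combos)  # first few pairs
```

Note that this pairs values across rows, so it produces more than just per-row swaps of the two columns. Whether that is what you want depends on what "combinations" means in your case.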
Where are you stuck? You need to give a reproducible example if you want a precise answer.
To be honest, I do not know where to start with this. This is a valid example:
This would be the test df with unordered columns:
df1 <- data.frame( Column1 = c("A", "B", "G", "D"), Column2 = c("E", "F", "C", "H") )
And this would be the second df with the reference positions:
df2 <- data.frame( Ref2 = c("A", "B", "C","D"), Ref1 = c("E", "F", "G", "H") )
The idea is to generate all the combinations so that both columns of df1 match those of df2; if 100% homology cannot be achieved, the option with the greatest homology should be chosen. Obviously this is a simple example; the idea is to apply it to order mutations and determine alleles against a reference.
What would be your expected output in the example you gave?
The expected output would be to obtain a list of dataframes of all possible combinations of df1 column 1 and column 2, and at least one combination should look exactly like df2.
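One possible reading of this, as a minimal sketch rather than a definitive solution: treat a "combination" as a choice, for each row, of whether to swap Column1 and Column2. That gives 2^n candidate dataframes for n rows, which only scales to small n. The sketch assumes df1 and df2 have the same number of rows in the same order, and that the first column of df1 is compared with the first column of df2 by position. The names `swap_patterns`, `apply_swaps`, `candidates`, `score` and `best` are made up for illustration.

```r
# Example data from the thread (character columns, not factors).
df1 <- data.frame(Column1 = c("A", "B", "G", "D"),
                  Column2 = c("E", "F", "C", "H"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(Ref2 = c("A", "B", "C", "D"),
                  Ref1 = c("E", "F", "G", "H"),
                  stringsAsFactors = FALSE)

# Every TRUE/FALSE swap pattern over the rows: 2^nrow(df1) patterns.
swap_patterns <- expand.grid(rep(list(c(FALSE, TRUE)), nrow(df1)))

# Hypothetical helper: swap the two columns in the rows flagged by `swap`.
apply_swaps <- function(df, swap) {
  out <- df
  out[[1]] <- ifelse(swap, df[[2]], df[[1]])
  out[[2]] <- ifelse(swap, df[[1]], df[[2]])
  out
}

# List of candidate dataframes, one per swap pattern.
candidates <- lapply(seq_len(nrow(swap_patterns)), function(i) {
  apply_swaps(df1, unlist(swap_patterns[i, ], use.names = FALSE))
})

# Score each candidate: number of cells equal to the reference, by position.
score <- vapply(candidates, function(cand) {
  sum(cand[[1]] == df2[[1]]) + sum(cand[[2]] == df2[[2]])
}, integer(1))

# Candidate with the most matching cells (first one in case of ties).
best <- candidates[[which.max(score)]]
```

With the example data, `candidates` holds 16 dataframes and `best` is the one where only row 3 is swapped, so its values agree with df2 in all 8 cells. If no candidate matches completely, `which.max(score)` still picks the one with the most matching cells.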