Question

R programming: match and rearrange

2

9.6 years ago

MAPK ★ 2.1k

Hi guys,

I have an R programming question. I have more than 1000 samples (1:1000), each with both a GT and an AD entry in (Genotype). I want to match the entries in (Genotype) against (Names): wherever Gene1.GT appears in (Names), I want to get both Gene1.GT and Gene1.AD from (Genotype), and so on for every sample, to produce the (Result) listed below. Thank you.

Names <- c("cebi", "pithe", "Gene1.GT", "sapiens", "Gene2.GT", "calli", "Gene3.GT")
Genotype <- c("Gene1.GT", "Gene1.AD", "Gene2.GT", "Gene2.AD", "Gene3.GT", "Gene3.AD")

Result:

-> "cebi", "pithe", "Gene1.GT", "Gene1.AD", "sapiens", "Gene2.GT", "Gene2.AD", "calli", "Gene3.GT", "Gene3.AD"

R • 2.3k views

ADD COMMENT • link updated 6.2 years ago by Ram 44k • written 9.6 years ago by MAPK ★ 2.1k

0

It's a little unclear how you are mapping between names and genotypes. Can you explain a bit more about how the result relates to the input?

ADD REPLY • link 9.6 years ago by Alex Reynolds 36k

0

Thank you for your reply. I want to match the part Gene1, Gene2, Gene3... and get both GTs and ADs for them. For example, I want to match "Gene1" common in both objects and get Gene1.GT and Gene1.AD from (Genotype) and get the (Result). So, I want to match Gene1:Gene1000 and get all the corresponding GTs and ADs in the same order it matches with the (Names).

ADD REPLY • link updated 6.2 years ago by Ram 44k • written 9.6 years ago by MAPK ★ 2.1k

Ram · Answer 1 · 2015-05-03

1

9.6 years ago

gtho123 ▴ 260

I don't think I quite understand what you are trying to do, but from your example it seems you want to insert the appropriate GeneX.AD value from the Genotype vector immediately after the corresponding GeneX.GT element in the Names vector.

If this is the case you could use regular expressions and a loop like this:

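ADs <- Genotype[grep("\\.AD$", Genotype)]

for (ad in ADs) {
  # Exact match on the GT name, so "Gene1" cannot also hit "Gene10" or "Gene100"
  GT_loc <- match(sub("\\.AD$", ".GT", ad), Names)
  if (is.na(GT_loc)) next  # no matching GT in Names: skip this AD
  Names <- append(Names, ad, after = GT_loc)
}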

Given your input vectors this creates your desired result. This will not be the most efficient way in R, especially if your sample size is large. However it does reproduce your example.

ADD COMMENT • link updated 6.2 years ago by Ram 44k • written 9.6 years ago by gtho123 ▴ 260

2

If you really want to insert the corresponding GeneX.AD immediately after each GeneX.GT, you can also use the following approach. It should be faster than a loop.

Names <- c("cebi", "pithe", "Gene1.GT", "sapiens", "Gene2.GT", "calli", "Gene3.GT")
# Get positions of the ".GT" elements (dot escaped, anchored at the end)
id <- grep("\\.GT$", Names)
# Create an index: old elements get their rank, ADs get a half-rank
Seq <- c(seq_along(Names), id + 0.5)
# Append the ADs (derived by renaming the ".GT" suffix to ".AD")
Names <- append(Names, sub("\\.GT$", ".AD", Names[id]))
# Order (ADs after GTs)
Names[order(Seq)]
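
The approach above derives each AD name by renaming the GT name, so it never consults Genotype. If the AD should be inserted only when it is actually present in Genotype, a minimal sketch could look like the following. The helper name `insert_ads` is hypothetical (not from this thread), and the sketch assumes Names contains no NA values and that every pair uses the `.GT` / `.AD` suffixes.

```r
Names <- c("cebi", "pithe", "Gene1.GT", "sapiens", "Gene2.GT", "calli", "Gene3.GT")
Genotype <- c("Gene1.GT", "Gene1.AD", "Gene2.GT", "Gene2.AD", "Gene3.GT", "Gene3.AD")

# Hypothetical helper: follow each "<gene>.GT" in names_vec with "<gene>.AD",
# but only when that AD name is present in genotype_vec.
insert_ads <- function(names_vec, genotype_vec) {
  is_gt <- grepl("\\.GT$", names_vec)
  # Candidate AD name for every element (non-GT elements are left unchanged by sub)
  ad <- sub("\\.GT$", ".AD", names_vec)
  # Blank out candidates that are not a GT, or whose AD is missing from genotype_vec
  ad[!(is_gt & ad %in% genotype_vec)] <- NA_character_
  # rbind() builds a 2-row matrix (names over ADs); as.vector() reads it
  # column by column, which interleaves each name with its AD
  interleaved <- as.vector(rbind(names_vec, ad))
  interleaved[!is.na(interleaved)]
}

result <- insert_ads(Names, Genotype)
print(result)
```

Because the suffix patterns are anchored (`\\.GT$`), a name such as Gene1.GT cannot be confused with Gene10.GT, which matters once there are 1000 genes.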

ADD REPLY • link updated 6.2 years ago by Ram 44k • written 9.6 years ago by Jimbou ▴ 960