Question

R for loop

0

8.7 years ago

Kritika ▴ 270

Hello all, I have a silly question about R. I have two files. file1:

V1     V2       V3
exon  Rv0001  CCP42723
exon  Rv0002  CCP42724

The total number of rows in file1 is 4110.

file2 :

id    Gene_name   
1     CCP42723 
2     CCP42724

The total number of rows in file2 is 4114.

I want to match column V3 of file1 against column Gene_name of file2, and my output file should contain only column V2 of file1. That is, if V3 in a row of file1 matches a Gene_name in file2, the V2 value of that file1 row should be written to a new file. The new file will contain only the V2 values of file1 that have a match.

I tried a for loop, but it seems I am going wrong:

file3$Gene_name <- 0
for (i in 1:4110) {
  file_new_id = file2[file2$V3 %in% file3[i,2],]$V2
}

Any help is highly appreciated. Thanks!

RNA-Seq R for-loop • 2.1k views

ADD COMMENT • link updated 8.7 years ago by PoGibas 5.1k • written 8.7 years ago by Kritika ▴ 270

1

Don't use a for loop. Use merge, then extract the required column:

merge(x = data1, y = data2, by.x = 'V3', by.y = 'Gene_name')[, 'V2']

Or, load dplyr and use filter (select picks columns, not rows, so it cannot take a row condition):

library(dplyr); filter(data1, V3 %in% data2$Gene_name)$V2
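A small self-contained sketch of the merge approach on toy data may help. The data frames `data1` and `data2` below are stand-ins built from the rows shown in the question, plus one invented non-matching row (`Rv0003` / `CCP99999`); they are assumptions for illustration, not the real files. Note that `merge` sorts its result by the key column by default, so the output order can differ from the row order of `data1`, and if a gene name occurs more than once in `data2`, `merge` returns one row per pairing.

```r
# Toy stand-ins for the two files; the third row of data1 is invented
# so that one V3 value has no partner in data2.
data1 <- data.frame(
  V1 = c("exon", "exon", "exon"),
  V2 = c("Rv0001", "Rv0002", "Rv0003"),
  V3 = c("CCP42723", "CCP42724", "CCP99999"),
  stringsAsFactors = FALSE
)
data2 <- data.frame(
  id = c(1, 2),
  Gene_name = c("CCP42723", "CCP42724"),
  stringsAsFactors = FALSE
)

# Inner join on V3 == Gene_name, then keep only the V2 column.
merged <- merge(x = data1, y = data2, by.x = "V3", by.y = "Gene_name")
matched_v2 <- merged[, "V2"]
print(matched_v2)
```

Only the rows of `data1` whose V3 appears in `data2$Gene_name` survive the join, so the invented `Rv0003` row is dropped.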

ADD REPLY • link 8.7 years ago by russhh 5.7k

score 1 · Answer 1 · 2016-03-10

1

8.7 years ago

Sam ★ 4.8k

The easiest way is:

file1[file1$V3 %in% file2$Gene_name, ]$V2
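Since the question asks for an output file, here is a minimal sketch of the same `%in%` subsetting followed by writing the result to disk. The toy data frames and the temporary path `out_path` are assumptions for illustration; the real data would be read from the two files instead.

```r
# Toy stand-ins for file1 and file2 (the third row of file1 is invented).
file1 <- data.frame(
  V1 = c("exon", "exon", "exon"),
  V2 = c("Rv0001", "Rv0002", "Rv0003"),
  V3 = c("CCP42723", "CCP42724", "CCP99999"),
  stringsAsFactors = FALSE
)
file2 <- data.frame(
  id = c(1, 2),
  Gene_name = c("CCP42723", "CCP42724"),
  stringsAsFactors = FALSE
)

# Logical vector, one entry per row of file1: TRUE where V3 occurs in file2.
keep <- file1$V3 %in% file2$Gene_name

# V2 values of the matching rows, in file1's original row order.
matched_v2 <- file1$V2[keep]

# Write one ID per line; out_path is a hypothetical temporary file.
out_path <- tempfile(fileext = ".txt")
writeLines(matched_v2, out_path)
```

Unlike `merge`, this keeps the rows in the order they appear in `file1` and never duplicates a row, because `%in%` only asks whether each value is present.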

ADD COMMENT • link 8.7 years ago by Sam ★ 4.8k

0

Yes, I tried this, but I want to store the output in a new vector. I tried this command:

file3$new_ID=file2[file2$V3 %in% file3$Gene_name,]$V2

but got an error:

Error in $<-.data.frame(*tmp*, "new_ID", value = c(79L, 80L, 81L, : replacement has 4110 rows, data has 4114

So it seems I need to run a for loop 4110 times.

ADD REPLY • link 8.7 years ago by Kritika ▴ 270

1

Well, if all you want is V2 from file1, there is no need to assign the result back into a data frame; just extract the vector. The problem with what you did here is that if not every item in file1$V3 is found in file2$Gene_name, the resulting vector will have a different length. If you then try to assign this vector of, say, N items as a column of a data frame that has M rows, R will complain. Also, it is rather confusing that you have file1, file2 and file3 but never mention what format you want file3 to have.
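The length mismatch can be reproduced on toy data. If the goal really is a new column aligned with the rows of the second data frame, one option (not suggested in the thread, so treat it as an assumption about what is wanted) is `match`, which returns one position or `NA` per looked-up value, so the result always has as many entries as there are rows. The data and the `new_id` column name below are illustrative.

```r
file1 <- data.frame(
  V2 = c("Rv0001", "Rv0002"),
  V3 = c("CCP42723", "CCP42724"),
  stringsAsFactors = FALSE
)
# file2 has one extra, invented gene with no partner in file1.
file2 <- data.frame(
  id = c(1, 2, 3),
  Gene_name = c("CCP42723", "CCP99999", "CCP42724"),
  stringsAsFactors = FALSE
)

# Subsetting yields only the matching rows: 2 values for a 3-row data frame,
# which is why assigning it as a column of file2 would fail.
subset_v2 <- file1[file1$V3 %in% file2$Gene_name, ]$V2
length(subset_v2) == nrow(file2)  # FALSE

# match() gives, for each Gene_name, the row of file1 with the same V3,
# or NA when there is none, so the result lines up with file2's rows.
file2$new_id <- file1$V2[match(file2$Gene_name, file1$V3)]
print(file2)
```

Rows of `file2` without a partner in `file1` get `NA` in `new_id` rather than being dropped, which is what keeps the lengths equal.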

ADD REPLY • link 8.7 years ago by Sam ★ 4.8k

0

file4$new_id <- 0
for (i in 1:length(file3$Gene_name)) {
  file4$new_id <- file2[file2$V3 %in% file3[i,2],]$V2
}

Output:

> file4$new_id
factor(0)
4109 Levels: EBG00000313313 EBG00000313314 EBG00000313315 EBG00000313316 ... Rv3924c

I should get the V2 column of file1 in my file4, under the column name new_id, for the rows where file1$V3 matches file2$Gene_name.

ADD REPLY • link 8.7 years ago by Kritika ▴ 270

2

Your file names are getting out of hand... So if I understand correctly, you have 3 files; let's call them A, B and C.

File A has the following format:

V1   | V2     | V3
exon | Rv0001 | CCP42723

File B:

id | Gene_name
1  | CCP42723

Then what is your file C's format? If you only want to output all V2 from A where the V3 of A is matched with Gene_name in B, you can just directly write it to a new file or data frame, so something like (assuming C is a new data structure, as you have not told us what it looks like)

C=data.frame(new_id=A[A$V3%in%B$Gene_name,]$V2)
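The `factor(0)` output with 4109 levels shown earlier in the thread suggests that the ID columns were read in as factors, which `read.table` did by default before R 4.0.0. Under that assumption, here is a sketch of the same subsetting on toy factor data, followed by a conversion to plain character strings so that unused levels do not linger. The data are invented for illustration.

```r
# Toy data with the ID columns stored as factors.
A <- data.frame(
  V2 = factor(c("Rv0001", "Rv0002", "Rv0003")),
  V3 = factor(c("CCP42723", "CCP42724", "CCP99999"))
)
B <- data.frame(
  id = c(1, 2),
  Gene_name = factor(c("CCP42723", "CCP42724"))
)

# Same subsetting as in the thread; %in% compares factors by their labels.
C <- data.frame(new_id = A[A$V3 %in% B$Gene_name, ]$V2)

# The factor still carries the unused level "Rv0003".
lv <- levels(C$new_id)
print(lv)

# Converting to character keeps only the values actually present.
C$new_id <- as.character(C$new_id)
print(C)
```

If factors are preferred, `droplevels(C)` would be an alternative that keeps the column as a factor while discarding unused levels.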

ADD REPLY • link 8.7 years ago by Sam ★ 4.8k

0

Thanks Sam, it worked!

ADD REPLY • link 8.7 years ago by Kritika ▴ 270

score 0 · Answer 2 · 2016-03-10

0

8.7 years ago

venu 7.1k

Not exactly a bioinformatics problem. Try this simple UNIX command:

awk 'FNR==NR {a[$2]; next} $3 in a {print $2}' file2.txt file1.txt

ADD COMMENT • link 8.7 years ago by venu 7.1k