Matching Ensembl Ids In Input And Library File Using R
11.6 years ago
Diana ▴ 930

Hi everyone!

I have 2 files and I want to compare the Ensembl Gene Ids in my input file to Ensembl Gene Ids in the 2nd file (library file) and write those Ids from my input file that match to the library into a third file (output file) using R. My files look like this:

INPUT FILE:

sample_1    sample_2        log2.fold_change.    test_stat    p_value    q_value    significant    EnsemblGeneId        GeneName
Plac8-9     Plac11-12        0.610342            -3.34003    0.00083    0.02840    yes            ENSGALG00000006409    PODXL

LIBRARY FILE:

Ensembl Gene ID    Ensembl Transcript ID    Ensembl Protein ID    Associated Gene Name    Associated Gene Name
ENSGALG00000000168    ENSGALT00000000224    ENSGALP00000000223    AA1R_CHICK                AA1R

I would like the entire row for each matched Ensembl ID in the input file to be written to the output file. I've looked at match() but it doesn't seem to do what I want. Many thanks!!!

r Ensembl • 3.3k views
11.6 years ago

I think match() would probably work just fine for this application, but I'll give a slightly more intuitive solution (untested). If your input file is in a data.frame called input_df and your library file is in a data.frame called lib_df, this should probably do it:

input_in_lib = input_df$EnsemblGeneId %in% lib_df[,1]
input_in_lib_df = input_df[input_in_lib,]

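Now, input_in_lib_df should contain the rows of input_df whose Ensembl Gene IDs also appear in the first column of lib_df. Note that the column name after the $ is case-sensitive, so it has to match the header in your file exactly. This is untested, but I think it will get you there.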
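For the "write to a third file" part of the question, here is a minimal sketch of the whole round trip. The two data frames are tiny made-up stand-ins for the real files (only the IDs and gene names quoted in the question are real; the pairing of rows, the trimmed column set, and the out_file path are assumptions for illustration). With real files you would probably load them with read.delim() instead, which by default turns a header like "Ensembl Gene ID" into Ensembl.Gene.ID.

```r
# Toy stand-ins for the input and library files (hypothetical rows).
input_df <- data.frame(
  EnsemblGeneId = c("ENSGALG00000006409", "ENSGALG00000000168"),
  GeneName      = c("PODXL", "AA1R"),
  stringsAsFactors = FALSE
)
lib_df <- data.frame(
  Ensembl.Gene.ID      = "ENSGALG00000000168",
  Associated.Gene.Name = "AA1R_CHICK",
  stringsAsFactors = FALSE
)

# TRUE for every input row whose ID occurs anywhere in the library.
in_lib <- input_df$EnsemblGeneId %in% lib_df$Ensembl.Gene.ID

# Keep the complete rows for the matching IDs.
matched_df <- input_df[in_lib, ]

# Write the matched rows as a tab-delimited file.
# out_file is a temporary path here; replace it with your own file name.
out_file <- tempfile(fileext = ".txt")
write.table(matched_df, file = out_file, sep = "\t",
            quote = FALSE, row.names = FALSE)
```

quote = FALSE and row.names = FALSE keep the output in the same plain tab-delimited layout as the input, so it can be read back with read.delim().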


Another variation:

input_df[which(input_df$EnsemblGeneId %in% lib_df$Ensembl.Gene.ID),]

Nicely shortened. You don't even need the which(), though, if you really want to go minimalist.
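To illustrate why which() is optional here: %in% always returns TRUE or FALSE (never NA), so the logical vector can be used as a row index directly. A small sketch, using made-up stand-in data (lib_ids and the second input row are assumptions for the example):

```r
# Hypothetical two-row input and a one-element library ID vector.
input_df <- data.frame(
  EnsemblGeneId = c("ENSGALG00000006409", "ENSGALG00000000168"),
  GeneName      = c("PODXL", "AA1R"),
  stringsAsFactors = FALSE
)
lib_ids <- "ENSGALG00000000168"

# Index by row positions (which) versus by the logical vector itself.
with_which    <- input_df[which(input_df$EnsemblGeneId %in% lib_ids), ]
without_which <- input_df[input_df$EnsemblGeneId %in% lib_ids, ]

# Both select the same rows.
same_rows <- identical(with_which$EnsemblGeneId, without_which$EnsemblGeneId)
```

which() mainly matters when the condition can yield NA (for example with ==), since it silently drops the NA positions.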


Thanks a lot!!! works like a charm!
