get calls in R
1
0
Entering edit mode
6.0 years ago
HG ▴ 30

Hi all, I have two dataframes: First data frame has three columns: ID, sire.ID, and Dam.ID

The second dataframe has the same three columns plus genotyping calls for 17 markers in 17 columns.

Here is what I need to be done in R: If sire.ID in the first dataframe exists in the ID column in the second dataframe, then return the sire.ID and its calls from the second dataframe in first dataframe for the relevant offspring, and same for Dam.ID. Simply, for each ID in the first dataframe I want to get the calls for its sire and dam in front of the ID and in two rows. Thanks in advance for your help

R • 1.6k views
ADD COMMENT
1
Entering edit mode

I have changed your post to "question", a tutorial is when you show how to do stuff.

Your question seems rather R instead of bioinformatics. It is likely someone will close your post for this reason...

ADD REPLY
1
Entering edit mode

Hi @HG, The community is not here to provide tailored programming solutions/ tutorials. Please amend your post to show some reproducible data to your question, along with code showing what you've tried.

ADD REPLY
1
Entering edit mode

Good description of data. But please post some example data and expected output. HG

ADD REPLY
0
Entering edit mode

Here is part of the first dataframe:

ID  Sire.ID Dam.ID
A       D              G
B        E          H
C        F              I

part of the Second dataframe:

ID  Sire.ID Dam.ID  marker1 marker2 marker3 marker4 marker5 marker6 marker7 marker8 marker9
D   j   K   12  21  11  11  12  22  11  22  11
G   l   m   12  21  11  11  12  22  11  22  11
H   n   o   12  21  11  11  12  22  11  22  11

I need this:

ID  Sire.ID Dam.ID                                              
A   D   G   D   j   K   12  21  11  11  12  22  11  22  11
A   D   G   G   l   m   12  21  11  11  12  22  11  22  11

Split each ID (each row) in the first dataframe into two rows and return the sire and dam calls from the second data frame in first and second splitted rows for each ID. If the information for the sire or dam doesn't exists in the second dataframe then write NA in the splitted rows.

Thanks

ADD REPLY
1
Entering edit mode

I added markup to your post for increased readability. You can do this by selecting the text and clicking the 101010 button. When you compose or edit a post that button is in your toolbar, see image below:

101010 Button

ADD REPLY
2
Entering edit mode
6.0 years ago

Your expected output seems to be incorrect. G and H from Dam.ID of data frame 1 matches with that from G and H from ID of Second frame. code:

> library(sqldf)
> a=read.csv("testa.txt", sep="\t", strip.white = T, stringsAsFactors = F)
> b=read.csv("testb.txt", sep="\t", strip.white = T, stringsAsFactors = F)
> sqldf ('SELECT * FROM a INNER JOIN b ON a."Sire.ID" = b.ID OR a."Dam.ID" = b.ID')

output:

ID Sire.ID Dam.ID ID..4 Sire.ID..5 Dam.ID..6 marker1 marker2 marker3 marker4 marker5 marker6 marker7
1  A       D      G     D          j         K      12      21      11      11      12      22      11
2  A       D      G     G          l         m      12      21      11      11      12      22      11
3  B       E      H     H          n         o      12      21      11      11      12      22      11
  marker8 marker9
1      22      11
2      22      11
3      22      11

input:

> a
  ID Sire.ID Dam.ID
1  A       D      G
2  B       E      H
3  C       F      I
> b
  ID Sire.ID Dam.ID marker1 marker2 marker3 marker4 marker5 marker6 marker7 marker8 marker9
1  D       j      K      12      21      11      11      12      22      11      22      11
2  G       l      m      12      21      11      11      12      22      11      22      11
3  H       n      o      12      21      11      11      12      22      11      22      11
ADD COMMENT
0
Entering edit mode

Thank you very much for your reply and the correction for the data. Your code worked perfectly.

ADD REPLY

Login before adding your answer.

Traffic: 1878 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6