Question

get calls in R

0

Entering edit mode

6.2 years ago

HG ▴ 30

Hi all, I have two dataframes: First data frame has three columns: ID, sire.ID, and Dam.ID

The second dataframe has the same three columns plus genotyping calls for 17 markers in 17 columns.

Here is what I need to be done in R: If sire.ID in the first dataframe exists in the ID column in the second dataframe, then return the sire.ID and its calls from the second dataframe in first dataframe for the relevant offspring, and same for Dam.ID. Simply, for each ID in the first dataframe I want to get the calls for its sire and dam in front of the ID and in two rows. Thanks in advance for your help

R • 1.7k views

ADD COMMENT • link 6.2 years ago by HG ▴ 30

1

Entering edit mode

I have changed your post to "question", a tutorial is when you show how to do stuff.

Your question seems rather R instead of bioinformatics. It is likely someone will close your post for this reason...

ADD REPLY • link 6.2 years ago by Benn 8.4k

1

Entering edit mode

Hi @HG, The community is not here to provide tailored programming solutions/ tutorials. Please amend your post to show some reproducible data to your question, along with code showing what you've tried.

ADD REPLY • link 6.2 years ago by andrew.j.skelton73 6.6k

1

Entering edit mode

Good description of data. But please post some example data and expected output. HG

ADD REPLY • link 6.2 years ago by cpad0112 21k

0

Entering edit mode

Here is part of the first dataframe:

ID  Sire.ID Dam.ID
A       D              G
B        E          H
C        F              I

part of the Second dataframe:

ID  Sire.ID Dam.ID  marker1 marker2 marker3 marker4 marker5 marker6 marker7 marker8 marker9
D   j   K   12  21  11  11  12  22  11  22  11
G   l   m   12  21  11  11  12  22  11  22  11
H   n   o   12  21  11  11  12  22  11  22  11

I need this:

ID  Sire.ID Dam.ID                                              
A   D   G   D   j   K   12  21  11  11  12  22  11  22  11
A   D   G   G   l   m   12  21  11  11  12  22  11  22  11

Split each ID (each row) in the first dataframe into two rows and return the sire and dam calls from the second data frame in first and second splitted rows for each ID. If the information for the sire or dam doesn't exists in the second dataframe then write NA in the splitted rows.

Thanks

ADD REPLY • link updated 6.2 years ago by WouterDeCoster 47k • written 6.2 years ago by HG ▴ 30

1

Entering edit mode

I added markup to your post for increased readability. You can do this by selecting the text and clicking the 101010 button. When you compose or edit a post that button is in your toolbar, see image below:

101010 Button

ADD REPLY • link 6.2 years ago by WouterDeCoster 47k

score 2 · Answer 1 · 2018-12-25

Your expected output seems to be incorrect. G and H from Dam.ID of data frame 1 matches with that from G and H from ID of Second frame. code:

> library(sqldf)
> a=read.csv("testa.txt", sep="\t", strip.white = T, stringsAsFactors = F)
> b=read.csv("testb.txt", sep="\t", strip.white = T, stringsAsFactors = F)
> sqldf ('SELECT * FROM a INNER JOIN b ON a."Sire.ID" = b.ID OR a."Dam.ID" = b.ID')

output:

ID Sire.ID Dam.ID ID..4 Sire.ID..5 Dam.ID..6 marker1 marker2 marker3 marker4 marker5 marker6 marker7
1  A       D      G     D          j         K      12      21      11      11      12      22      11
2  A       D      G     G          l         m      12      21      11      11      12      22      11
3  B       E      H     H          n         o      12      21      11      11      12      22      11
  marker8 marker9
1      22      11
2      22      11
3      22      11

input:

> a
  ID Sire.ID Dam.ID
1  A       D      G
2  B       E      H
3  C       F      I
> b
  ID Sire.ID Dam.ID marker1 marker2 marker3 marker4 marker5 marker6 marker7 marker8 marker9
1  D       j      K      12      21      11      11      12      22      11      22      11
2  G       l      m      12      21      11      11      12      22      11      22      11
3  H       n      o      12      21      11      11      12      22      11      22      11