Hi Guys,
I have two data frames, df1 and df2: df1 holds the alleles and df2 holds the genotypes. There are more than 50 samples (1:50). In df2 each sample has a genotype column and a depth-coverage column, interleaved one after the other: Geno1.GT, Geno1.AD, Geno2.GT, Geno2.AD, ..., Geno50.GT, Geno50.AD. How do I match columns on Geno plus the sample number only (i.e., ignoring the .GT or .AD extension), so that, e.g., Geno1, Geno1.GT and Geno1.AD end up together, and get the result table below? Thank you.
df1
Geno1 Geno2 Geno3
A A A
C G C
C A G
df2
Geno1.GT Geno1.AD Geno2.GT Geno2.AD Geno3.GT Geno3.AD
0/0 22,3 0/0 33,2 0/0 33,3
0/0 2,0 0/1 22,3 1/1 43,33
0/1 55,45 0/0 32,2 1/1 22,3
Result
Geno1 Geno1.GT Geno1.AD Geno2 Geno2.GT Geno2.AD Geno3 Geno3.GT Geno3.AD
A 0/0 22,3 A 0/0 33,2 A 0/0 33,3
C 0/0 2,0 G 0/1 22,3 C 1/1 43,33
C 0/1 55,45 A 0/0 32,2 G 1/1 22,3
Are the rows of df1 and df2 matching and in the same order?
Thank you, Sean, for your reply. df1 has more column names: Geno1:Geno100 or more. So all the columns in df2 are present in df1, but not the other way around. df1 has more samples than df2, and the column order is also different.
Sean asked about the rows. If the rows are in the same order, merging is going to be much easier.
Sorry, yes: the rows are in the same order and equal in length.
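Since the rows line up, one possible approach in base R is to strip the .GT/.AD suffix from the df2 column names, then bind each df1 allele column in front of the df2 columns that share its sample name. This is only a sketch on the toy data from the question; the helper names sample_of, samples, blocks and result are made up for illustration, and it assumes every sample in df2 has exactly one matching column in df1.

```r
# Toy data copied from the question.
df1 <- data.frame(
  Geno1 = c("A", "C", "C"),
  Geno2 = c("A", "G", "A"),
  Geno3 = c("A", "C", "G"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  Geno1.GT = c("0/0", "0/0", "0/1"),
  Geno1.AD = c("22,3", "2,0", "55,45"),
  Geno2.GT = c("0/0", "0/1", "0/0"),
  Geno2.AD = c("33,2", "22,3", "32,2"),
  Geno3.GT = c("0/0", "1/1", "1/1"),
  Geno3.AD = c("33,3", "43,33", "22,3"),
  stringsAsFactors = FALSE
)

# Assumption: rows are in the same order, so a column-wise bind is safe.
stopifnot(nrow(df1) == nrow(df2))

# Sample name for each df2 column: drop the trailing .GT / .AD suffix.
sample_of <- sub("\\.(GT|AD)$", "", names(df2))

# Samples in df2 order; df1 may hold extra samples in a different order.
samples <- unique(sample_of)
stopifnot(all(samples %in% names(df1)))

# For each sample, put the df1 allele column in front of its df2 columns.
blocks <- lapply(samples, function(s) {
  cbind(df1[s], df2[sample_of == s])
})
result <- do.call(cbind, blocks)

print(result)
```

Because the columns are looked up by name, the extra samples in df1 and the different column order should not matter; samples that exist only in df1 are simply left out. If the rows were not aligned, a key column and merge() would be needed instead of cbind().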