I have two data frames.
One is vcf. Its content is:
head(vcf)
X.CHROM POS ID CHROM_POS
1 chr1 100000421 rs1047982323 chr1_100000421
2 chr1 100000827 rs1375386196 chr1_100000827
3 chr1 100001753 rs866745787 chr1_100001753
4 chr1 100001904 rs1416462966 chr1_100001904
5 chr1 100002334 rs1220478954 chr1_100002334
6 chr1 100002490 rs181634796 chr1_100002490
The other is mashr. Its content is:
head(mashr)
RSID1 RSID2_
1 chr1_169894240 chr1_169894240
2 chr1_169894240 chr1_169891332
3 chr1_169891332 chr1_169891332
4 chr1_169661963 chr1_169661963
5 chr1_169661963 chr1_169697456
6 chr1_169697456 chr1_169697456
I want to count the number of matches between these two data frames in terms of chr_pos, and the number of chr_pos values in the vcf data frame that are missing from mashr. I wrote this command:
which(vcf$CHROM_POS == mashr$RSID1)
but it returns an empty result with a warning:
integer(0)
Warning message:
In vcf$CHROM_POS == mashr$RSID1 :
longer object length is not a multiple of shorter object length
I know this warning appears because the two columns have different lengths.
Can anyone tell me how to do this?
I want to find the number of chrom_pos values shared between the two data frames and the number of chrom_pos values missing between them.
Use merge or dplyr's join functions. Google should get you started on basic tutorials for these.
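As a rough illustration of that suggestion, here is a minimal base R sketch. The toy data below is made up to mimic the shape of the frames in the question, and it assumes the comparison should be between vcf$CHROM_POS and mashr$RSID1 (swap in RSID2_ if that is the column of interest). It shows both a membership test with %in% and an inner join with merge; the variable names in_mashr, n_matched, n_missed, missed_ids and merged are chosen here for illustration only.

```r
# Toy data shaped like the frames in the question (values are illustrative only)
vcf <- data.frame(
  CHROM_POS = c("chr1_100000421", "chr1_169894240",
                "chr1_169891332", "chr1_100002490"),
  stringsAsFactors = FALSE
)
mashr <- data.frame(
  RSID1  = c("chr1_169894240", "chr1_169894240",
             "chr1_169891332", "chr1_169661963"),
  RSID2_ = c("chr1_169894240", "chr1_169891332",
             "chr1_169891332", "chr1_169661963"),
  stringsAsFactors = FALSE
)

# %in% tests membership, so the two columns may have different lengths
# (unlike ==, which compares element by element with recycling)
in_mashr <- vcf$CHROM_POS %in% mashr$RSID1

n_matched  <- sum(in_mashr)               # vcf positions also present in mashr
n_missed   <- sum(!in_mashr)              # vcf positions absent from mashr
missed_ids <- vcf$CHROM_POS[!in_mashr]    # which positions are absent

# Same matches via an inner join; RSID1 is de-duplicated first so that
# repeated keys in mashr do not multiply the rows of the result
merged <- merge(vcf, unique(mashr["RSID1"]),
                by.x = "CHROM_POS", by.y = "RSID1")

print(n_matched)
print(n_missed)
print(missed_ids)
print(nrow(merged))
```

With dplyr, semi_join and anti_join should give the matched and unmatched rows of vcf in a similar way, if you prefer that route over base R.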