I am trying to map chromosome positions with all the gene IDs available.
The position file1 I have looks like this
#CHROM POSID REF ALT
NC_008596.1 41669 . CG C
NC_008596.1 78925 . GC G
NC_008596.1 101262 . GC G
NC_008596.1 115332 . GC G
NC_008596.1 224262 . A AC
and the genome position file2 looks like the following
ChrM Start Stop Strand GeneID
chr NC_008596.1 499 1692 + 4535615
chr NC_008596.1 1721 2614 + 4536178
chr NC_008596.1 2624 3778 + 4532650
chr NC_008596.1 3775 4359 + 4537198
chr NC_008596.1 4591 6618 + 4531510
chr NC_008596.1 6648 9176 + 4535988
chr NC_008596.1 9229 10011 + 4533293
chr NC_008596.1 10184 10276 + 4535541
chr NC_008596.1 50411 11211 + 4531499
Now I need to map the positions in file1 (POS ID) with the GeneID given in file2.
The POS ID in file 1 is the number which lies between start and stop codon number. For e.g.41669 lies between
chr NC_008596.1 50411 11211 + 4531499
and the output should be
chr NC_008596.1 50411 11211 + 4531499 41669
the last one is POSID
Please let me know how can I do it? I can work in R, so any solution in R or an excel trick would be helpful.
did you look at merge in R? https://stat.ethz.ch/R-manual/R-devel/library/base/html/merge.html
to use merge, I think we need to have common column in both the files. Here, POS ID in file1 is the number which lies between Start(499) and Stop(1692) in file 2.
So the POS ID of file 1 should lie in between the POS of file 2? What is the criteria of merge?
I just updated with an example
have a look to reduce in R?
data<-Reduce(function(...) merge(..., all=TRUE, by=c("POSID", "GeneID")), list(file1, file2))