Hi all,
I have two files (a.txt and b.txt) whose columns are separated by tabs:
a.txt:
id value1 value2 value3
g1 v1 v2 good
g2 v3 v4 better
b.txt:
id value1 value2
g1 v1 v2
g3 v3 v4
I want to use Perl to find the ids shared by the two files, using id as the key:
if an id in a.txt equals an id in b.txt (like g1),
then append value3 of that id from a.txt to the end of the matching line in b.txt.
I want to use this method to deal with my GO-gene association file and GFF files, and I am just beginning to use Perl. I know it is not good manners to ask for scripts, so just give me some hints on where to begin. Any suggestions are appreciated!
Thank you for your reply! I had a look at your profile and found, wow, that I have presented several of your papers on Verrucomicrobia in my lab's journal club!
Thank you. Those are not exactly mine but our team's effort; I just performed some of the data processing.
Another question: how could I keep the rows that are not included in the merged file, I mean the data in (a-c) and (b-c)? Thank you!
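The "(a-c)" and "(b-c)" rows asked about above can be read as set differences on the id column. Here is a minimal sketch in Python (not the Perl or R used elsewhere in this thread), assuming that c means the ids shared by both files, that ids are unique within each file, and that the first column is the id. The helper names `read_rows` and `only_in_first` are made up for illustration, and the sample data is the one from the question.

```python
import csv
import io

def read_rows(handle):
    """Return (header, {id: row}) from a tab-separated file-like object."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    # Assumption: the first column is the id and ids are unique per file.
    return header, {row[0]: row for row in reader if row}

def only_in_first(first, second):
    """Rows whose id is in `first` but not in `second`, in file order."""
    return [row for key, row in first.items() if key not in second]

# Sample data copied from the question; real use would pass open("a.txt") etc.
A_TEXT = "id\tvalue1\tvalue2\tvalue3\ng1\tv1\tv2\tgood\ng2\tv3\tv4\tbetter\n"
B_TEXT = "id\tvalue1\tvalue2\ng1\tv1\tv2\ng3\tv3\tv4\n"

_, a_rows = read_rows(io.StringIO(A_TEXT))
_, b_rows = read_rows(io.StringIO(B_TEXT))

a_only = only_in_first(a_rows, b_rows)  # rows of a.txt with no match in b.txt
b_only = only_in_first(b_rows, a_rows)  # rows of b.txt with no match in a.txt
```

With the sample data, `a_only` holds the g2 row and `b_only` holds the g3 row.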
I read Phil Spector's book Data Manipulation with R and found that 'merge' has several parameters I could make use of! Thanks!
RStudio is a nice way to keep things tidy. If in R you type
?merge
it will show you detailed help about the command.
Thank you for this tip!
Merging the two files and then parsing the results for this task is not a good solution--in any language. Also, your comments about Perl are rather puzzling.
Let me step back a bit. I think that intersection is one of the standard (basic) operations of set theory, and it can be handled in a variety of ways in any language. Not handling it would be "suicide" for any language. R is interactive, Perl is not. Edit: Biostars doesn't handle that Wikipedia link, sorry.
Add only what you need to add--and nothing more. Adding informational 'noise' by merging everything means you then have to remove that 'noise' afterwards. I still have no idea what you mean by "R is interactive, Perl is not."
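One way to read "add only what you need" is a lookup table keyed on id: store only value3 from a.txt, then walk b.txt once and append on a match. The sketch below is in Python for illustration, and the same idea maps onto a Perl hash. The function name `append_value3` is hypothetical, and appending a "value3" header to b's header line is an assumption, since the question does not say what should happen to the header.

```python
import io

def append_value3(a_handle, b_handle):
    """Yield the lines of b, with value3 from a appended where the id matches."""
    lookup = {}
    next(a_handle)  # skip the header line of a
    for line in a_handle:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 4:
            lookup[fields[0]] = fields[3]  # id -> value3, nothing else is kept

    # Assumption: label the new column in b's header as "value3".
    yield next(b_handle).rstrip("\n") + "\tvalue3"
    for line in b_handle:
        line = line.rstrip("\n")
        if not line:
            continue  # ignore blank lines
        key = line.split("\t")[0]
        if key in lookup:
            line += "\t" + lookup[key]
        yield line  # unmatched lines of b pass through unchanged

# Sample data copied from the question; real use would pass open("a.txt") etc.
a_file = io.StringIO("id\tvalue1\tvalue2\tvalue3\ng1\tv1\tv2\tgood\ng2\tv3\tv4\tbetter\n")
b_file = io.StringIO("id\tvalue1\tvalue2\ng1\tv1\tv2\ng3\tv3\tv4\n")
result = list(append_value3(a_file, b_file))
```

Only the g1 line gains the extra field; the g3 line is written back as it was, and g2 never enters the output because it is not in b.txt.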
By default, R starts interactively, while Perl is not really designed that way. I might be wrong, but merging (joining) two sets in a deterministic way requires some sort of indexing or sorting and is computationally expensive, probably no better than O(n log n), unless you are OK with some errors. In my solution I read everything into memory, but the merge operation took care of the sorting and selection.
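To make the indexing-versus-sorting point concrete, here is a small Python sketch of both routes to the shared ids, assuming ids are unique within each file. The function names are made up for illustration. The sort-based version costs O(n log n) for the sorts; the hash-based version does one pass over each list and runs in expected linear time. Both give exact results.

```python
def sorted_merge_common(ids_a, ids_b):
    """Shared ids via sort + two-pointer merge (O(n log n) from the sorts)."""
    a, b = sorted(ids_a), sorted(ids_b)
    i = j = 0
    common = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            common.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1  # a[i] cannot appear later in b, so skip it
        else:
            j += 1  # b[j] cannot appear later in a, so skip it
    return common

def hashed_common(ids_a, ids_b):
    """Shared ids via a hash set (expected O(n + m)), kept in the order of ids_b."""
    seen = set(ids_a)
    return [x for x in ids_b if x in seen]

ids_a = ["g2", "g1", "g5"]  # hypothetical ids, deliberately unsorted
ids_b = ["g5", "g3", "g1"]
by_sorting = sorted_merge_common(ids_a, ids_b)
by_hashing = hashed_common(ids_a, ids_b)
```

The two results contain the same ids; only the order differs, since the hash version follows the order of the second list.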