Dear friends, hi (sorry if this question is simple or a duplicate).
I have two lists of IDs, A.txt and B.txt, where B is larger than A (A = the IDs in the BLAST results, B = the IDs from the original EST database that BLAST was run against).
I want to compare them and collect the IDs that are present in list B (the bigger list) but absent from list A, i.e. the ESTs for which BLAST could not find any hit.
Please help me do this on the Linux command line (no Perl or Python, please; thanks).
NOTE1: I usually do it as below, but I need a more straightforward approach:
1- sort both lists
2- $ comm A.txt B.txt | cut -f2 > specific-to-B.txt
3- $ sed -i '/^\s*$/d' specific-to-B.txt (because this file contains many blank lines)
4- the list is ready
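The four steps above can probably be collapsed into a single command. A minimal sketch, assuming bash (the <( ) process substitution is a bash/zsh feature, not plain sh) and one ID per line; the demo lists are a made-up split of the example IDs from NOTE2, so with real files only the comm line is needed. Because -1 and -3 suppress the other two columns, the output has no tab prefixes and no blank lines, which makes the cut and sed steps unnecessary.

```shell
#!/usr/bin/env bash
# Sketch: IDs present in B.txt but absent from A.txt, in one step.
# Assumes bash (for <( ) process substitution) and one ID per line.
set -euo pipefail
export LC_ALL=C   # make sort and comm use the same (byte) collation

# Demo input only: a made-up split of the example IDs from NOTE2.
printf '%s\n' EV825573.1 EV825482.1 > A.txt
printf '%s\n' EV825616.1 EV825482.1 EV825573.1 EV825623.1 > B.txt

# comm needs sorted input, so sort both lists on the fly.
# -1 drops lines only in A.txt, -3 drops lines in both files,
# leaving only the lines unique to B.txt (no tabs, no blank lines).
comm -13 <(sort A.txt) <(sort B.txt) > specific-to-B.txt

cat specific-to-B.txt   # EV825616.1 and EV825623.1, one per line
```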
NOTE2: An example of the list:
EV825482.1
EV825573.1
EV825616.1
EV825623.1
EV825663.1
EV825667.1
EV825673.1
EV825677.1
EV825680.1
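If sorting is a nuisance, a different technique (grep with fixed-string, whole-line matching, not comm) should give the same set of IDs for lists like the one above, and it keeps B's original order. A sketch with made-up demo lists built from those IDs:

```shell
#!/usr/bin/env bash
# Sketch: sort-free alternative using grep instead of comm.
set -euo pipefail

# Demo input only: a made-up split of the example IDs (unsorted on purpose).
printf '%s\n' EV825616.1 EV825482.1 > A.txt
printf '%s\n' EV825623.1 EV825482.1 EV825573.1 EV825616.1 > B.txt

# -F  fixed strings, so the "." in an ID is not a regex wildcard
# -x  whole-line matches only, so EV825482.1 cannot match EV825482.11
# -v  keep the lines of B.txt that match none of the patterns
# -f  read the patterns (one ID per line) from A.txt
# grep exits with 1 when no lines are left; any other non-zero status is a real error.
grep -Fxvf A.txt B.txt > specific-to-B.txt || [ $? -eq 1 ]

cat specific-to-B.txt   # EV825623.1 then EV825573.1 (B.txt's own order)
```

Both files need the same line endings: a stray \r from a Windows editor would stop IDs from matching, with grep and with comm alike.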
From the comm manual: WHY??
Dear Pierre Lindenbaum, Hi
When I use "comm -1 fileA fileB | wc -l", it returns the number of lines in fileB (6088).
When I use "comm -2 fileA fileB | wc -l", it returns the number of lines in fileA (5699).
When I use "comm -3 fileA fileB | wc -l", it returns the number I want (389).
So I think the description "suppress column 3 (lines that appear in both files)" should be changed to "the lines that are in B and not in A".
Is that correct?
~Thank you so much
No. The correct way to get "present in list B but absent from list A" is comm -13 ('-1' suppresses the lines unique to list A and '-3' suppresses the lines that appear in both A and B).
Yes, if there are no lines unique to fileA.
But again, the correct way is 'comm -13'.
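To see why comm -3 and comm -13 can differ, here is a small sketch on toy sorted files (a, b, c, d, e and x are placeholders, not real IDs) in which A has one line, x, that B lacks. Column 1 of comm's output holds lines only in the first file, column 2 lines only in the second, and column 3 lines in both; each -N flag suppresses column N.

```shell
#!/usr/bin/env bash
# Sketch: what each comm flag keeps, on two tiny sorted files.
set -euo pipefail
export LC_ALL=C

# Toy input (already sorted): x occurs only in A, d and e only in B.
printf '%s\n' a b c x > A.txt
printf '%s\n' a b c d e > B.txt

# Column 1 = only in A.txt, column 2 = only in B.txt, column 3 = in both.
# -N suppresses column N, so each flag set keeps the remaining columns.
for flags in -1 -2 -3 -13; do
  n=$(comm "$flags" A.txt B.txt | wc -l)
  printf 'comm %-4s keeps %d lines\n' "$flags" "$n"
done
# Expected: -1 keeps 5 (all of B), -2 keeps 4 (all of A),
#           -3 keeps 3 (d, e and x), -13 keeps 2 (d and e only).
```

Here comm -3 also reports x, which is only in A, so its count matches comm -13 only when A has no lines of its own.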
OK Pierre,
that was the problem (or rather, the "reason"): every ID in fileA (the BLAST results) also exists in list B (the BLAST database)!
You are right as usual ;-)
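Whether comm -3 can stand in for comm -13 therefore hinges on A having no lines of its own. A sketch of that check, assuming bash and made-up demo lists; comm -23 keeps only column 1, i.e. the lines that occur in A but not in B:

```shell
#!/usr/bin/env bash
# Sketch: is every ID of A.txt also present in B.txt?
# Assumes bash (for <( ) process substitution) and one ID per line.
set -euo pipefail
export LC_ALL=C

# Demo input only: a made-up pair in which A is contained in B.
printf '%s\n' EV825482.1 EV825616.1 > A.txt
printf '%s\n' EV825680.1 EV825482.1 EV825616.1 > B.txt

# comm -23 suppresses columns 2 and 3, keeping lines found only in A.txt.
only_in_a=$(( $(comm -23 <(sort A.txt) <(sort B.txt) | wc -l) ))

if [ "$only_in_a" -eq 0 ]; then
  echo "every ID in A.txt is also in B.txt"
else
  echo "$only_in_a ID(s) occur only in A.txt"
fi
```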
~ Good luck
Thanks for your time and support.
When I run "comm -3 fileA fileB | wc -l" and "comm -13 fileA fileB", both results have 389 lines.
Is that normal?
Are those results identical?
Hi,
Good question, and the answer is yes.
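One nuance may be worth checking: the two commands list the same IDs here, but comm -3 still prints column 2 indented by a tab (at least GNU coreutils comm does, because column 1 is not suppressed), so the two outputs are not byte-for-byte identical. A sketch that compares them, with made-up demo lists in which A is a subset of B:

```shell
#!/usr/bin/env bash
# Sketch: do comm -3 and comm -13 produce the same output?
set -euo pipefail
export LC_ALL=C

# Demo input only (already sorted): every ID of A.txt is also in B.txt.
printf '%s\n' EV825482.1 EV825616.1 > A.txt
printf '%s\n' EV825482.1 EV825573.1 EV825616.1 EV825623.1 > B.txt

comm -3  A.txt B.txt > out3.txt    # column 1 is kept, so B-only lines get a leading tab
comm -13 A.txt B.txt > out13.txt   # column 2 alone, printed without a tab

# Byte for byte the files differ, only because of that tab...
if ! cmp -s out3.txt out13.txt; then
  echo "outputs are not byte-identical"
fi

# ...but with the tabs removed they list exactly the same IDs.
if tr -d '\t' < out3.txt | cmp -s - out13.txt; then
  echo "both outputs contain the same IDs"
fi
```

So "wc -l" agrees and the IDs agree, but if the comm -3 output is fed to another tool, the leading tab may need stripping first.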
Hi Farbod,
I know it's not what you're asking. But I do this exact thing in Galaxy all the time. Galaxy wraps common command line tools, but you have a nice graphical interface to work in and the results are really easy to visualise. There are a couple of text manipulation tools in there that you could do this with.
Thought I'd put a comment here just in case you're interested :-)
Dear Ando.kelli,
Hi and thank you for your clever advice,
could you name some of the tools you have used for this purpose in Galaxy, please?
Hi Farbod,
Sorry for the slow reply, I didn't get a notification saying that you responded to my comment.
If you install a local instance of Galaxy there are many text manipulation tools that are automatically included, and many that can be downloaded.
A list of available tools can be found at the main Galaxy Toolshed: https://toolshed.g2.bx.psu.edu/
You can go to this site and browse the tools. Alternatively, you can go to https://usegalaxy.org/ and browse the list of options down the left-hand side. I think the headings most relevant to you are: Text Manipulation, Convert Formats, Join Subtract Group, and Filter and Sort.
Hope that helps.
Kelli