Hi everyone, I have several ASCII files containing genes expressed in different experimental conditions (apple.conditionA, apple.conditionB, apple.conditionC). The first column of each file contains the gene name, and the other columns hold information such as the chromosome it is on, its direction, etc. Using Linux commands, I need to extract into a .txt file (apple.genesA) the names of the genes that are expressed only in condition A.
Thanks in advance
Please be more specific. ASCII doesn’t narrow down what type of file you are trying to analyse, only the text encoding style.
Ok, sorry. All the files have the same structure, with several columns. I need to compare the first column of all of them (where the gene names are) and then extract the genes that are unique to one specific file.
You still haven’t told us what the files are.
Edit your question to include some example input data and the kind of structure you would like output.
Ok, I hope it's clear now. Thank you for your time.
It's not. Tell us how you obtained the files, which software was used and show an example.
We are not mind readers. How are we supposed to know what ‘condition A’ is if you don’t show us the data? At the moment, “is expressed in condition A” could be a Boolean, an integer value, or a floating-point value above some threshold.
If the data is confidential or something, you can make a mockup of the file that follows the same patterns with different content.
Please put in more effort, or else we will just close this post.
MDP0000303933 MDP0000303933 chr1 - 4276 5447
This is, for instance, the first line of the apple.conditionA file. The first column holds the gene name, the second column holds the RNA read that was sequenced and assigned to the gene in column one, and the remaining columns give the chromosome, the direction of the gene, and its coordinates.
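To make that layout concrete, here is a small sketch that builds a mock apple.conditionA and prints only the gene-name column. The first mock line is the sample line above; the second gene ID is hypothetical, and so is the helper name `gene_names`. It uses `awk '{print $1}'`, which splits on any run of spaces or tabs; `cut -f1` also works, but only if the columns are tab-separated, which the sample line does not make clear.

```bash
#!/usr/bin/env bash
# Scratch directory, removed when the script exits.
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT

# Mock apple.conditionA: line 1 is the sample line from the post,
# line 2 is a hypothetical gene invented for illustration.
printf '%s\n' \
  'MDP0000303933 MDP0000303933 chr1 - 4276 5447' \
  'MDP0000999999 MDP0000999999 chr2 + 1000 2000' > "$tmpdir/apple.conditionA"

# Hypothetical helper: print column 1 (the gene name) of a file.
# awk splits fields on runs of spaces/tabs; "NF" skips blank lines.
gene_names() {
  awk 'NF {print $1}' "$1"
}

gene_names "$tmpdir/apple.conditionA"
```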
All three files have the same structure. Using Linux command-line tools, is there a way to extract the genes that are unique to apple.conditionA compared with apple.conditionB and apple.conditionC?
Sorry for the vagueness of my questions, but this is all very new to me. Once again, thank you for your help.
So the question is you just want all the lines in the condition A files which are unique (i.e. not in file B and C), based on column 1?
Yes, exactly! Thanks
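A minimal sketch of the operation just confirmed (keep the lines of A whose column-1 gene name does not appear in B or C), assuming whitespace-separated columns. The mock data, apart from MDP0000303933, and the helper name `genes_only_in_a` are hypothetical. It writes the gene names of B and C to a temporary pattern file and then runs an inverted, fixed-string, whole-word `grep` over A.

```bash
#!/usr/bin/env bash
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT

# Hypothetical mock data in the same 6-column layout.
printf '%s\n' \
  'MDP0000303933 MDP0000303933 chr1 - 4276 5447' \
  'MDP0000111111 MDP0000111111 chr2 + 100 900' \
  'MDP0000222222 MDP0000222222 chr3 + 50 700' > "$tmpdir/apple.conditionA"
printf '%s\n' 'MDP0000111111 MDP0000111111 chr2 + 100 900' > "$tmpdir/apple.conditionB"
printf '%s\n' 'MDP0000222222 MDP0000222222 chr3 + 50 700' > "$tmpdir/apple.conditionC"

# Hypothetical helper: print the lines of $1 whose gene name does not
# occur in column 1 of $2 or $3.
genes_only_in_a() {
  local patterns
  patterns=$(mktemp)
  # Gene names from B and C; "NF" skips blank lines, which would
  # otherwise become an empty pattern that matches every line.
  awk 'NF {print $1}' "$2" "$3" > "$patterns"
  # -v invert, -F fixed strings, -w whole words (so a shorter ID does
  # not match inside a longer one), -f read patterns from a file.
  grep -v -w -F -f "$patterns" "$1"
  rm -f "$patterns"
}

genes_only_in_a "$tmpdir/apple.conditionA" "$tmpdir/apple.conditionB" \
  "$tmpdir/apple.conditionC" > "$tmpdir/apple.genesA"
cat "$tmpdir/apple.genesA"
```

Note that `grep -w` matches a name anywhere in the line, not only in column 1; that is harmless here because the other columns hold the read ID, chromosome, direction, and coordinates. If you only need the names rather than whole lines, pipe the result through `awk '{print $1}'`.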
Please refer to my solution below. You can achieve this by `grep`-ing A with an inverted match against the gene names from the first columns of B and C, which you get by `cut`-ing files B and C and `cat`-ing the results together.

I think I have it:
This way I need to create two files and use two lines, but I can't think of anything else.
Temporary files work fine, but if you'd rather not use files, check out process substitution:
is the same as
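Applied to this task, a sketch of the process-substitution form might look like the following, assuming bash (process substitution also exists in zsh and ksh, but not in POSIX `sh`). The mock data is hypothetical apart from MDP0000303933. `<(...)` runs the command and hands `grep` a filename connected to its output, so no pattern file is written to disk.

```bash
#!/usr/bin/env bash
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT

# Hypothetical mock data in the same layout as the post.
printf '%s\n' \
  'MDP0000303933 MDP0000303933 chr1 - 4276 5447' \
  'MDP0000111111 MDP0000111111 chr2 + 100 900' > "$tmpdir/apple.conditionA"
printf '%s\n' 'MDP0000111111 MDP0000111111 chr2 + 100 900' > "$tmpdir/apple.conditionB"
printf '%s\n' 'MDP0000444444 MDP0000444444 chr4 - 10 90' > "$tmpdir/apple.conditionC"

# The gene names of B and C are fed to -f through <(...) instead of a
# temporary file; -v keeps only the lines of A that match none of them.
grep -v -w -F -f <(awk 'NF {print $1}' "$tmpdir/apple.conditionB" "$tmpdir/apple.conditionC") \
  "$tmpdir/apple.conditionA" > "$tmpdir/apple.genesA"

cat "$tmpdir/apple.genesA"
```

Writing the `awk` output to a file first and passing that file to `-f` gives the same result; the substitution only removes the intermediate file.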
Also, please don't use the `command >file | command2` syntax. It may work now because your shell doesn't have MULTIOS (a zsh option) enabled, but if MULTIOS is enabled, the output of `cut -f1 apple.conditionA` will be sent both to the file `compare` and downstream to `cut -f1 apple.conditionB`, mangling the output and introducing unpredictable bugs, bordering on file corruption, into your pipeline.

Ok, so finally I got:
I didn't know process substitution existed. Thank you very, very much!!
Yep, it's as simple as that!