Your attempt is close. You may want to try the following command. The parts of the pipeline are placed on separate lines to make the whole easier to read; the \ characters tell the shell that the command continues on the next line.
1. cut -d " " -f 1, where -d " " specifies that the delimiter is the space. This removes anything after the first space of each line.
2. perl -pe is used mostly like sed -e. I sometimes find perl the better tool, so rather than learning both sed and perl, I suggest learning only perl.
3. 's/>/_newline_>/' adds a unique string before each > so that the lines can be recreated later.
4. 's/\n/\t/' replaces the newlines with tabs. At this point, the whole file is only one line.
5. perl -pe 's/_newline_//' removes the first occurrence of _newline_ in the file to avoid starting the file with an empty line later.
6. perl -pe 's/_newline_/\n/g' replaces every remaining _newline_ string with a newline character.
7. perl -pe 's/\t$//' removes the tab at the end of each line.
In this example, I use pipes (|) a few times at places that may not be evident. Perl treats the file, or the input it gets through a pipe, one line at a time, as delimited by a newline character (\n or some such). Thus, for example, when I remove all the newline characters at step 4, I create one long line and must use a pipe so that the next transformation can be applied to the whole file, not only to the line that is currently being treated. This permits the trick in item 5, where I only remove the first occurrence of _newline_ in the whole file, which is now on one line.
As a side note, I edited your comment to use full English. "Could u pls" is just as easy to write as "Could you please". The latter is more polite and is also more pleasant to read for a person who spent a few minutes to help you and future users ;)
Yes, you are right. As a Perl beginner, I think I really learned something about this language, especially the three s/>/_newline_ / commands in your code, which look alike but differ. THANK YOU.
_newline_ is just a string I decided to use to mark the positions where I will later put a \n back. I could just as well have used any string, like INSERT_NEWLINE_HERE :)
No, it will not. If the sequences span multiple lines, one may first linearize the fasta file (each sequence is written on one line). This can be done with Awk too: awk 'NR==1 {print ; next} {printf (/^>/) ? "\n"$0"\n" : $1}' file.fas
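For illustration, here is a variant of that Awk one-liner run on a made-up file. The names multi.fas and linear.fas and the sample records are placeholders. This version is my own rewrite, not the exact command above: it uses explicit "%s" format strings, keeps whole lines with $0, and adds an END block so that the output ends with a newline:

```shell
# Made-up FASTA file whose sequences span several lines.
printf '>seq1 desc\nACGT\nGGCC\n>seq2\nTTTT\nAAAA\n' > multi.fas

# Print the first header as is; for later headers, close the previous
# sequence line first; append sequence lines without a line break.
awk 'NR==1 {print; next}
     /^>/  {printf "\n%s\n", $0; next}
           {printf "%s", $0}
     END   {printf "\n"}' multi.fas > linear.fas

cat linear.fas
```

After this step each sequence occupies exactly one line, so line-based tools see one header line followed by one sequence line per record.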
Hi Eric, Could you please briefly tell me what's the difference between _newline_ and \n? Thanks a lot!
Really helpful and informative, THANKS!!