Hi there,
Please can anyone help with this.
I am trying to make a FASTA file that was created in a Linux NGS pipeline more Windows friendly.
I have a FASTA file with several sequences in it, generated by an NGS pipeline.
I'd really like to reformat the file in several ways. I currently do this manually with a text editor and want to speed things up. I'm pretty new to Linux and scripts, but I guess this could be done much more efficiently. My FASTA file looks a bit like this for each gene:
>gene-name-iter4\n
somesequence\n
somesequence\n
somesequence\n
>gene2name-iter4\n
somesequence\n
somesequence\n
At the moment I remove all \n style line breaks, then replace every > with \r\n> to put in Windows-style carriage returns, replace every iter4 with iter4\r\n, delete the one carriage return that is now before my first sequence, and add a carriage return to the end of the very last line.
Please can someone show me how to make this easier in Linux? I have access to a CentOS virtual machine, but I then move the output consensus FASTA files from whole genome sequencing to Windows computers.
Thank you, James
Before exporting, you can try unix2dos on the Linux machine.
Hi, thanks for the reply. I just tried unix2dos:
unix2dos -l input.fasta
but that seems to change every \n to \r\n, and there are \n at the end of several lines per sequence. Any idea if I should be choosing different options? Thanks
Windows currently has support for \n line endings. Plus, did you try man unix2dos?
What have you tried? This is a fairly common problem; have you tried Googling?
Hi there, I have tried googling, which is where I found that sed or tr should be able to help. For removing all \n I have tried
sed 's/\n//g' input.fasta > output.fasta
but it doesn't seem to actually do anything: the output file is created but still has the \n line breaks. I also tried
tr -d '\n' input.fasta > output.fasta
but that gives me an error:
tr: extra operand
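A likely explanation for both failures, offered as a sketch rather than a definitive diagnosis: tr only reads standard input, so a file name on the command line is reported as an extra operand, and sed reads one line at a time with the trailing newline already stripped, so the pattern \n never has anything to match. The commands below show one way around each. The sample file contents are made up for illustration, and the output names joined_tr.txt and joined_sed.txt are hypothetical.

```shell
# Made-up sample standing in for the real input.fasta
printf '>gene-name-iter4\nACGT\nTTGA\n>gene2name-iter4\nGGCC\n' > input.fasta

# tr takes no file operand, so feed the file on standard input
tr -d '\n' < input.fasta > joined_tr.txt

# sed: pull every line into the pattern space first (N in a loop),
# then the embedded newlines become visible to s/\n//g
sed -e ':a' -e 'N' -e '$!ba' -e 's/\n//g' input.fasta > joined_sed.txt
```

Both commands join the whole file into a single line, headers included, which matches the first manual step described above but still leaves the header and sequence boundaries to be restored afterwards.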
If you want to go the sed way:
output:
input:
Hi, thanks for the reply. This doesn't seem to work for me:
sed 's/\n//g' input.fasta > output.fasta
All the \n are still there. But
sed 's/iter4/iter4\r\n/g'
does work for adding carriage returns everywhere there is iter4.
Did you man sed? sed has an extended regex option, and unless you know exactly which sed you're using (pro-tip: mac/BSD sed is the absolute worst), you will need to try a couple of flags, notably the -r flag.
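For the whole job in one pass, awk may be easier than chaining sed and tr. The following is a minimal sketch, not a tested pipeline step: it assumes every header line starts with > and that everything between two headers is sequence. It does not rely on the iter4 suffix. The sample file contents are made up, and input.fasta and output.fasta are just the names used earlier in the thread.

```shell
# Made-up sample standing in for the pipeline output
printf '>gene-name-iter4\nACGT\nTTGA\n>gene2name-iter4\nGGCC\n' > input.fasta

awk '
    { sub(/\r$/, "") }                         # tolerate input that already has CRLF
    /^>/ {                                     # header line
        if (seq != "") printf "%s\r\n", seq    # flush the previous sequence
        printf "%s\r\n", $0                    # header on its own CRLF-terminated line
        seq = ""
        next
    }
    { seq = seq $0 }                           # join wrapped sequence lines
    END { if (seq != "") printf "%s\r\n", seq }
' input.fasta > output.fasta
```

Each record comes out as one header line and one unwrapped sequence line, both ending in \r\n, with no blank line at the top and a line ending after the last sequence. Running file output.fasta on the CentOS machine should report CRLF line terminators if it worked.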