I have around 110 FASTA files, each containing multiple sequences. I want to remove all the IDs and turn each file into a single, sequence-only file with no IDs and no gaps. I would prefer a solution using awk, grep, or a similar tool.
If you want to combine all the files into a single, gap-free sequence:
$ grep -hv "^>" *.fa | tr -d '-' | paste -sd '\0' > final_seq.txt
However, if you want to turn each multi-sequence file into its own single-sequence file, try this with GNU parallel. The new files will have a .txt extension and contain no gaps:
$ ls *.fa
a.fa b.fa
$ tail -n+1 *.fa
==> a.fa <==
>a
at-gc
>b
atgc
==> b.fa <==
>b
tgc
>c
atgc
$ parallel "sed '/^>/d;s/-//g' {} | paste -sd '\0' > {.}.txt" ::: *.fa
Output:
$ tail -n+1 *.txt
==> a.txt <==
atgcatgc
==> b.txt <==
tgcatgc
for F in *.fa ; do grep -v '^>' "$F" | tr -d '\n \t-' > "${F}.txt" ; done
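Since the question asked for awk, the same idea can also be sketched in awk. This is only a rough sketch, assuming plain-text FASTA files ending in .fa where "-" marks a gap; the function name fasta_to_single_seq is made up for illustration.

```shell
# Hypothetical helper: print the gap-free, concatenated sequence of one multi-FASTA file.
fasta_to_single_seq() {
    awk '
        /^>/ { next }                  # skip ID/header lines
        {
            gsub(/[- \t\r]/, "")       # drop gaps, blanks and stray carriage returns
            printf "%s", $0            # no newline, so consecutive lines are joined
        }
        END { print "" }               # finish the output with a single newline
    ' "$1"
}

# Write one .txt file per input, e.g. a.fa -> a.txt
for f in *.fa; do
    [ -e "$f" ] || continue            # nothing matched: the pattern stays literal, skip it
    fasta_to_single_seq "$f" > "${f%.fa}.txt"
done
```

With the a.fa from the example above, this should give atgcatgc in a.txt. Unlike "${F}.txt", the "${f%.fa}.txt" form strips the .fa suffix first, so the result is a.txt rather than a.fa.txt.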
To remove all FASTA IDs you can use the sed command:
for i in *.fasta; do sed -i '/^>/d' "$i"; done
Keep in mind that this edits the files in place and does not create copies.
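If you would rather keep the originals, one option is to give -i a backup suffix. A minimal sketch, assuming files ending in .fasta; the function name strip_headers_with_backup is made up for illustration, and the suffix .bak is an arbitrary choice.

```shell
# Hypothetical helper: delete header lines in place, keeping the original as FILE.bak.
strip_headers_with_backup() {
    # -i.bak (suffix attached, no space) is accepted by both GNU sed and BSD/macOS sed
    sed -i.bak '/^>/d' "$1"
}

for f in *.fasta; do
    [ -e "$f" ] || continue            # nothing matched: the pattern stays literal, skip it
    strip_headers_with_backup "$f"
done
```

After this, each x.fasta holds only sequence lines and x.fasta.bak is the untouched copy. The pattern is anchored with ^ so that only lines starting with ">" are deleted.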
I assume you don't want single-line sequence files, right? If you do, this makes a single-line sequence:
cat file.txt | tr -d '\n' > single_line.txt
This will remove all but the first header from the multi-FASTA file, which isn't exactly what you requested but can be useful if you want a single sequence that keeps the FASTA format.
cat file.fasta | sed -e '1!{/^>.*/d;}' | sed ':a;N;$!ba;s/\n//2g' | sed '1!s/.\{80\}/&\n/g'
If you want to run it over all files, simply make a loop or a parallel call and replace file.fasta with the relevant variable/string etc.
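The sed pipeline above appears to rely on GNU sed extensions, so here is a rough equivalent using only head, tail, grep, tr and fold. It is a sketch under the same assumptions (the first line of the file is the header to keep, sequence lines are re-wrapped at 80 columns, gaps are left alone); the function name merge_fasta_keep_first_header is made up for illustration.

```shell
# Hypothetical helper: keep the first header, merge all sequence lines, re-wrap at 80 columns.
merge_fasta_keep_first_header() {
    head -n 1 "$1"                     # line 1 is kept as the single header
    tail -n +2 "$1" |
        grep -v '^>' |                 # drop every later header
        tr -d '\n' |                   # join the remaining lines into one sequence
        fold -w 80                     # wrap the sequence at 80 characters per line
    echo                               # fold has no newline to copy at the end, add one
}
```

To run it over every file, something like for f in *.fasta; do merge_fasta_keep_first_header "$f" > "${f%.fasta}.merged.fa"; done should work; the output extension differs from the input one so a second run does not pick up its own results.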
Did you test that code, or am I doing something wrong?
Edit: This is a Mac problem. The sed on Mac expects an extension for a backup file so do: