Hi
I have around 10,0000 gene sequences in individual FASTA files. I'd like to rename each file after the gene name contained in its header.
Original file
head 1.fasta
==> 1.fasta <==
> Gloriosasuperba; 8324-9004; -; atp6
ATGACAGTAAGCCTTTTTGACCAATTTATGAGCCCCACACTACTAGGCATCCCCCTGCTC
Modified file
head atp6.fasta
==> atp6.fasta <==
> atp6
ATGACAGTAAGCCTTTTTGACCAATTTATGAGCCCCACACTACTAGGCATCCCCCTGCTC
I ran the awk command in the directory containing all the files.
I created a directory named "out".
I slightly edited my initial reply, since redirecting directly to a file name built from a string and a field variable doesn't seem to work. The file name is now assigned to a variable a, and the output is written to that.
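The variable workaround described above can be sketched as follows. This is a minimal sketch, not the exact command from the reply: the scratch directory workdir, the toy input, and the choice to copy every line unchanged are assumptions made for illustration. The background is that an unparenthesized concatenation after > is parsed differently by different awk implementations, so building the path in a variable first (or wrapping it in parentheses) is the portable form.

```shell
# Scratch directory so the demo touches no real files (hypothetical layout).
workdir=$(mktemp -d)
mkdir -p "$workdir/in" "$workdir/out"

# Toy input modeled on the example file in the question.
printf '%s\n' '> Gloriosasuperba; 8324-9004; -; atp6' 'ATGACAGTAAGC' > "$workdir/in/1.fasta"

(
  cd "$workdir/in" || exit 1
  # Note: redirecting to the quoted string "$5.fasta" would create a file
  # literally named $5.fasta, because awk does not expand fields in strings.
  awk 'NR == 1 { a = "../out/" $5 ".fasta" }   # whitespace splitting: $5 is atp6
       { print > a }' 1.fasta
)

ls "$workdir/out"
```

Because the path is relative (../out), the out directory has to exist next to the directory awk is run from, which is why mkdir -p is called before anything else.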
But this doesn't explain the "No such file or directory" error, because it would have created a file with the name $5.fasta in the out directory instead. Did you create the out folder in the parent directory or as a subfolder in the current directory? I am almost sure that this was the problem. Try again after running
mkdir -p ../out

Thanks Matthias,
It generates a concatenated file ($5.fasta), that is true, but with a fixed string length.
I'm looking for the whole sequence. How do I get it?
Furthermore, the concatenated file needs to be split into individual FASTA files.
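One way to get the whole sequence into one file per gene can be sketched as follows. This is a sketch under stated assumptions, not the command discussed in this thread: it assumes every input file starts with a header line, that the gene name is the last whitespace-separated field of that header, and that sequences may span several lines. The names workdir, outdir, gene and a are illustrative, and the second toy record is made up.

```shell
# Scratch directory with two toy inputs (the second record is invented).
workdir=$(mktemp -d)
mkdir -p "$workdir/out"
printf '%s\n' '> Gloriosasuperba; 8324-9004; -; atp6' 'ATGACAGTAAGC' 'CTTTTTGACCAA' > "$workdir/1.fasta"
printf '%s\n' '> Examplespecies; 1-12; +; geneB' 'ATGAAACCCGGG' > "$workdir/2.fasta"

for f in "$workdir"/*.fasta; do
  awk -v outdir="$workdir/out" '
    /^>/ {                               # header line: derive the output file
      gene = $NF                         # last field, e.g. atp6
      a = outdir "/" gene ".fasta"
      print "> " gene >> a               # shortened header, as in the question
      next
    }
    a != "" { print >> a }               # every sequence line, not just the first
  ' "$f"
done

cat "$workdir/out/atp6.fasta"
```

The >> redirection appends, so two input files with the same gene name end up concatenated in one output file. Running the loop a second time would append the same records again, so the out directory should be empty before a rerun.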
That is weird, because tr should call the corresponding command to replace the respective letters:
But then let's go step by step and without using tr.
Does
give you
and
return only 5?

Yes, you're right.
If you only have entries with 5 fields and the contents of each FASTA file are now on one line, this should then give you the desired output:
If you have multiple sequences for the same gene (e.g. transcripts), then use >> a so that they are concatenated.

In the command, what do tr, $4 and $5 stand for?
$4 and $5 are the respective columns.
should give you the file bbbb with the content aaaa and the file dddd with the content cccc.
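The column-to-file behavior described above can be reproduced with a toy input. The two-line input and the command below are assumptions chosen to match the described result, not necessarily the command from the original reply; workdir is an illustrative scratch directory.

```shell
# Scratch directory so the demo files do not land in the current directory.
workdir=$(mktemp -d)

(
  cd "$workdir" || exit 1
  # For each line, write column 1 into a file named after column 2.
  printf '%s\n' 'aaaa bbbb' 'cccc dddd' | awk '{ print $1 > $2 }'
)

cat "$workdir/bbbb"
cat "$workdir/dddd"
```

In the same way, print $4 > $5 would write the fourth column of each line into a file named after the fifth column.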

So what I am attempting is: