Hello,
I have never made a script in my life.
The problem is how to change the FASTA names, going from this input file:
>Glyma04g14800|Glyma04g14800.3
MMLETVAAVPGMVAGMLLHCKSLRRFEHSGGWIKALLEEAENERMHLMTFMEVAKPKWYE
>Glyma05g24460|Glyma05g24460.1
SNVSIDLTKHHVPKNFLDKVAYRTVKLLRIPTDLFFKRRYGCRAMMLETVAAVPGMVGGM
to this output file (change the original names to numbers in ascending order, starting with 1):
>1
MMLETVAAVPGMVAGMLLHCKSLRRFEHSGGWIKALLEEAENERMHLMTFMEVAKPKWYE
>2
SNVSIDLTKHHVPKNFLDKVAYRTVKLLRIPTDLFFKRRYGCRAMMLETVAAVPGMVGGM
I would be very grateful for any help. Regards, Naka
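A minimal sketch of one way to do this, built on the awk one-liner that is quoted further down in this thread. The function name renumber_fasta is only a placeholder for illustration, and the sequences in the demo are shortened versions of the ones in the question.

```shell
# Replace every FASTA header with a running number (1, 2, 3, ...).
# Lines that do not start with ">" (the sequences) are printed unchanged.
renumber_fasta() {
    awk '/^>/ { print ">" ++i; next } { print }' "$@"
}

# Demo on two records shaped like the ones in the question.
printf '>Glyma04g14800|Glyma04g14800.3\nMMLETVAAVPGM\n>Glyma05g24460|Glyma05g24460.1\nSNVSIDLTKHHV\n' |
    renumber_fasta
# prints:
# >1
# MMLETVAAVPGM
# >2
# SNVSIDLTKHHV
```

On a real file the call would look like renumber_fasta input.fasta > output.fasta, where both file names are hypothetical.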
Sir, can we modify the above awk syntax so that,
instead of printing like
it prints like
For that purpose, where and how do I put the text "chromosome"?
Please help me out.
@Raghav: If you wanted to add chromosome in the header with the counter, simply add it in the ">" portion of the one-liner.
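As a rough sketch of that suggestion, assuming headers of the form >chromosome1, >chromosome2 are what is wanted. The function name renumber_with_label and the demo record names are made up for illustration.

```shell
# Renumber the headers and put a fixed label between ">" and the counter.
# The label is passed to awk with -v so it can be changed without editing
# the awk program itself.
renumber_with_label() {
    label=$1; shift
    awk -v label="$label" '/^>/ { i++; print ">" label i; next } { print }' "$@"
}

printf '>seqA\nACGT\n>seqB\nTTGA\n' | renumber_with_label chromosome
# prints:
# >chromosome1
# ACGT
# >chromosome2
# TTGA
```

The counter is incremented in its own statement here, so that awk cannot read "label ++i" as "label++" followed by "i".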
How can we add "chr" just after >? I don't want to change anything else. For example:
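One possible way to do only that, sketched with sed rather than awk: the substitution touches just the ">" at the start of header lines and leaves the rest of each line alone. The function name add_chr_prefix and the demo header are assumptions for illustration.

```shell
# Insert "chr" directly after the leading ">" of each header line.
# Sequence lines do not start with ">" and are therefore left untouched.
add_chr_prefix() {
    sed 's/^>/>chr/' "$@"
}

printf '>1 some description\nACGT\n' | add_chr_prefix
# prints:
# >chr1 some description
# ACGT
```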
Hello there, (I already solved this)
I am trying to understand your script line so that I can modify it. So, is the script saying this?
For every line where you find a > at the start (/^>/), print the > and then add a counter (++i); then, with next, move on and print what follows.
In my case the names are like:
etc.
I want to leave only what is different.
And can I do this over ssh? (I don't think I have awk installed.)
Many thanks in advance for your time,
Caro
PS: I am new to HTS/NGS and don't know much about programming
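For reference, here is the one-liner quoted later in this thread, awk '/^>/{print ">" ++i; next}{print}', written out as a small commented program. This is only a sketch: the function name renumber_fasta_verbose is made up, and the first line simply checks whether awk exists on the machine you are logged in to (it is present on most Unix-like systems).

```shell
# Check that awk is available in the current (for example ssh) session.
command -v awk >/dev/null 2>&1 && echo "awk is available"

# The one-liner spelled out, one comment per step.
renumber_fasta_verbose() {
    awk '
        /^>/ {              # for each line that starts with ">" (a header)
            i = i + 1       # increase the counter (this is what ++i does)
            print ">" i     # print ">" followed by the counter
            next            # go to the next line, skipping the rule below
        }
        { print }           # any other line (sequence) is printed unchanged
    ' "$@"
}
```

It can be used as renumber_fasta_verbose input.fasta > output.fasta, with hypothetical file names.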
This doesn't work if the read spans multiple lines?
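At least for the awk one-liner quoted in this thread, wrapped sequences should be fine: only lines starting with > are rewritten, and every other line is printed as it is. A small check with made-up records whose sequences span two lines each:

```shell
# Two records, each with its sequence wrapped over two lines.
wrapped_out=$(printf '>readA\nACGT\nTTGA\n>readB\nGGCC\nAATT\n' |
    awk '/^>/ { print ">" ++i; next } { print }')

printf '%s\n' "$wrapped_out"
# prints:
# >1
# ACGT
# TTGA
# >2
# GGCC
# AATT
```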
@Pierre Lindenbaum
Hi,
how can I modify this command to add the genus_species name after > in every entry and yet keep most of the information in the header?
i.e. my entries are like this
and I want to have the entry names like this
By using
I got,
and so on
Also, how can I add an output file in the command line?
Having the genus_species name at the beginning is required, as I'll be comparing different species, and I also don't want to lose the IDs and protein names, for ease of downstream analysis.
Hello Pierre, thank you for your useful code. May I please ask how I can modify the code to keep everything else as it is and just add the sample name in front, and to do that for a batch of files?
e.g. my file looks like:
ACTGGGTGTAAAGGGCGTGTAGGCGGAGAAGCAAGTCAGAAGTGAAATCCATGGGCTTAACCCATGAACTGCTTTTGAAACTGTTTCCCTTGAGTATCGGAGAGGCAGGCGGAATTCCTAGTGTAGCGGTGAAATGCGTAGATATTAGGAGGAACACCAGTGGCGAAGGCGGCCTGCTGGACGACAACTGACGCTGAGGCGCGAAAGCGTGGGGAGCAAACAGGATTAGATACCCCGGT
Now I want to add the sample name after > and keep everything else as it is.
I want to do this for a batch of files. Any help would be great. Thanks, Mitra
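Both this request and the genus_species one above come down to putting a label right after > while keeping the rest of the header. A sketch of one way to do that follows. Everything named here is an assumption for illustration: the function names prefix_headers and prefix_batch, the "_" separator, the .fasta extension, and the idea of taking the sample name from the file name.

```shell
# Put a label right after ">" in every header and keep the rest of the header.
# substr($0, 2) is the original header line without its leading ">".
prefix_headers() {
    label=$1; shift
    awk -v label="$label" '/^>/ { print ">" label "_" substr($0, 2); next } { print }' "$@"
}

# Batch version: the first argument is an output directory, the remaining
# arguments are FASTA files. Each file is labelled with its own file name
# (minus the .fasta extension) and written to the output directory, so the
# input files are never overwritten.
prefix_batch() {
    outdir=$1; shift
    mkdir -p "$outdir"
    for f in "$@"; do
        sample=$(basename "$f" .fasta)
        prefix_headers "$sample" "$f" > "$outdir/$sample.fasta"
    done
}
```

With these definitions, prefix_batch renamed *.fasta would process every .fasta file in the current directory, and prefix_headers Genus_species proteins.fasta > proteins.renamed.fasta would turn a header such as >id1 kinase into >Genus_species_id1 kinase (the file and header names are hypothetical).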
Hello Pierre, when I use awk '/^>/{print ">" ++i; next}{print}' < file.fasta, the changes are made but not saved. I want to distinguish between two numbered contig.fasta files (the entries in each are numbered 'contig 00001', 'contig00002', etc.). I want to name the entries of the 1st contig.fasta 'shorter_contig 00001', 'shorter_contig 00002' and those of the 2nd contig.fasta 'longer_contig 00001', 'longer_contig 00002'. Is there a way to make the header modifications permanent? Thanks
http://wiki.bash-hackers.org/howto/redirection_tutorial
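In short, awk only writes to standard output, so nothing is saved unless that output is redirected into a file. A minimal sketch, in which the function name rename_contigs and all file names are hypothetical:

```shell
# Add a prefix after ">" in every header of SRC and save the result in DST.
# The output goes to a NEW file: redirecting to the file that is being read
# (cmd < file > file) would empty it before awk gets to read it.
rename_contigs() {
    prefix=$1; src=$2; dst=$3
    awk -v p="$prefix" '/^>/ { print ">" p substr($0, 2); next } { print }' "$src" > "$dst"
}
```

For example, rename_contigs shorter_ first/contig.fasta shorter.fasta followed by rename_contigs longer_ second/contig.fasta longer.fasta would give headers such as >shorter_contig00001 and >longer_contig00001. Once the new file looks right, it can replace the original with mv if that is what is wanted.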
Hello, I want to change the FASTA name in this input file:
into this FASTA name:
What script do I have to use?
With respect, if you are already stuck at this most simple task, better spend some quality time on Unix and NGS basics before diving into any analysis. In the end, you as the analyst have to stand up for your analysis.
A similar question and answer were posted above. What are you missing?
My question was resolved, thank you. I needed it in this manner because I have another script that only works with these FASTA names. Thanks.