Dear all,
I have a large FASTA file with complicated sequence names, such as:
>scaffold:ChrPicBel3.0.1:JH584390.1:1:2133925:1 scaffold JH584390.1
GAAATGCTCTTTTCTTCATTTAACCTTATATTTAATACACCTTTTAAATGTTTCTCAATT
TTTTTATTCTTTAATAATATGACAAACTAGACCTTTAAAATCATCTCTCCTTCCTAAATC
I just want to keep the last field of the header as the name, like this:
>JH584390.1
GAAATGCTCTTTTCTTCATTTAACCTTATATTTAATACACCTTTTAAATGTTTCTCAATT
TTTTTATTCTTTAATAATATGACAAACTAGACCTTTAAAATCATCTCTCCTTCCTAAATC
Any suggestions would be appreciated. Thanks,
ZQ
You might want to look into using Biopython. See these pages:
http://www.bioinformatics.org/bradstuff/bp/tut/Tutorial002.html
http://biopython.org/wiki/SeqIO
http://biopython.org/DIST/docs/tutorial/Tutorial.html#htoc11
http://biopython.org/DIST/docs/api/Bio.SeqIO-module.html
It just so happens that I've posted some code examples using this package for FASTA parsing, which might be helpful to get started. You might want to modify the record.id value with .split() or a regular expression of some sort, and write the output to a new FASTA file.
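To make the .split() idea concrete, here is a minimal sketch in plain Python, so it runs without Biopython installed. The function names shorten_header and rename_fasta are made up for illustration, and it assumes the name you want is always the last whitespace-separated field of the header line. With Biopython, the same split would presumably be applied to each record's description rather than to the raw line.

```python
def shorten_header(header):
    """Return a FASTA header that keeps only the last whitespace-separated field."""
    # Drop the leading ">" and split the rest on whitespace.
    fields = header[1:].split()
    # Leave the header unchanged if there is nothing after ">".
    if not fields:
        return header
    return ">" + fields[-1]


def rename_fasta(lines):
    """Yield FASTA lines with shortened headers; sequence lines pass through."""
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            yield shorten_header(line)
        else:
            yield line


# Example input taken from the question above.
example = [
    ">scaffold:ChrPicBel3.0.1:JH584390.1:1:2133925:1 scaffold JH584390.1\n",
    "GAAATGCTCTTTTCTTCATTTAACCTTATATTTAATACACCTTTTAAATGTTTCTCAATT\n",
    "TTTTTATTCTTTAATAATATGACAAACTAGACCTTTAAAATCATCTCTCCTTCCTAAATC\n",
]
renamed = list(rename_fasta(example))
print("\n".join(renamed))
```

For a real file, you could pass an open file handle to rename_fasta instead of the list and write each yielded line, followed by a newline, to a new file. Because it works line by line, a large file never has to be loaded into memory at once.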