I want to extract all the sequence IDs from a FASTA file (Drosophila melanogaster chromatin remodeling factors).
FASTA file content:
>FBpp0079251 FBgn0003607 symbol:Su(var)205 family:Chromatin Remodeling Factors species:Drosophila melanogaster
MGKKIDNPESSAKVSDAEEEEEEYAVEKIIDRRVRKGKVEYYLKWKGYPETENTWEPENN
LDCQDLIQQYEASRKDEEKSAASKKDRPSSSAKAKETQGRASSSTSTASKRKSEEPTAPS
GNKSKRTTDAEQDTIPVSGSTGFDRGLEAEKILGASDNNGRLTFLIQFKGVDQAEMVPSS
VANEKIPRMVIHFYEERLSWYSDNED
>FBpp0079496 FBgn0032157 symbol:Etl1 family:Chromatin Remodeling Factors species:Drosophila melanogaster
MSDSTVAASASASASSSAKSSLSDLRQFRINKNASSVVASPSRTERVPGKKRIQVMADSD
SDGNDSQTPKKTKLELTVKEKEERYMAAAKISPHFDTMAIQESLSRTNWDVAASVRYLRE
NCKPKGHNGPLAKSKLKPRSNGISGGNFSDNDHSDDDDVKQSKDQVYDSDDSDSEMSTKM
TGQRKKVFQFMNEASLIELQSVKTLSEKKALAIIDVRPFSDWSDLRQKLESIRMSGDLLN
YAQELINKQNTVAAILSKCNNMVSRLEKAISNGAGIVEQPKLLSSGLQLADYQIIGLNWL
TVMHKQEMNGILADEMGL
How can I extract all the IDs from the file with a command in the Linux terminal?
That will extract the entire header line, not the ID.
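If only the first token of each header is wanted, here is a minimal sketch of one way to do it. It assumes the ID is the text between '>' and the first space, which is how the headers in the question look. The temporary file below is just a stand-in built from the question's two headers (sequence lines shortened); point `fasta` at your own file instead.

```shell
# Stand-in input: a temporary file holding the two headers from the question
# (sequence lines shortened). Replace "$fasta" with the path to your real file.
fasta=$(mktemp)
cat > "$fasta" <<'EOF'
>FBpp0079251 FBgn0003607 symbol:Su(var)205 family:Chromatin Remodeling Factors species:Drosophila melanogaster
MGKKIDNPESSAKVSDAEEEEEEYAVEKIIDRRVRKGKVEYYLKWKGYPETENTWEPENN
>FBpp0079496 FBgn0032157 symbol:Etl1 family:Chromatin Remodeling Factors species:Drosophila melanogaster
MSDSTVAASASASASSSAKSSLSDLRQFRINKNASSVVASPSRTERVPGKKRIQVMADSD
EOF

# On header lines only, take the first whitespace-separated field
# and print it without the leading '>'.
ids=$(awk '/^>/ { print substr($1, 2) }' "$fasta")
printf '%s\n' "$ids"
```

If the second field (the FBgn identifier) is the one you need, printing `$2` instead of `substr($1, 2)` should work under the same assumption about the header layout.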
Well, what is your expected output? What are the IDs?
Thank you for the comment.
The IDs in the text above are the lines that contain the '>' sign, e.g.:
You could say it's the ID of that gene or polypeptide.
Well, then I have to agree with poisonAlien's solution. From what you told me, you just want the FASTA headers:
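The exact command from that answer is not reproduced here, so take this as a sketch of one common way to print the header lines, assuming every header starts with '>' in the first column. The temporary file is a stand-in built from the question's two headers (sequence lines shortened); use the path to your own file in its place.

```shell
# Stand-in input: a temporary file holding the two headers from the question
# (sequence lines shortened). Replace "$fasta" with the path to your real file.
fasta=$(mktemp)
cat > "$fasta" <<'EOF'
>FBpp0079251 FBgn0003607 symbol:Su(var)205 family:Chromatin Remodeling Factors species:Drosophila melanogaster
MGKKIDNPESSAKVSDAEEEEEEYAVEKIIDRRVRKGKVEYYLKWKGYPETENTWEPENN
>FBpp0079496 FBgn0032157 symbol:Etl1 family:Chromatin Remodeling Factors species:Drosophila melanogaster
MSDSTVAASASASASSSAKSSLSDLRQFRINKNASSVVASPSRTERVPGKKRIQVMADSD
EOF

# Keep only the lines that begin with '>' (the FASTA header lines).
headers=$(grep '^>' "$fasta")
printf '%s\n' "$headers"
```

Quoting the pattern matters: an unquoted `>` would be treated by the shell as an output redirection rather than passed to grep.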
Please search this site, similar questions have been asked and answered multiple times.
Oops! Sorry, I should have read it more carefully.