Hello all,
I am a beginner in bioinformatics. I have downloaded the complete nr database from NCBI; it contains 106785170 protein sequences altogether. The FASTA header of every sequence starts with the FASTA symbol > followed by the accession number and other information. Here are some examples:
>WP_003131952.1 30S ribosomal protein S18 [Lactococcus lactis]NP_26834..........................
>XP_642131.1 hypothetical protein DDB_G0277827 [Dictyostelium discoideu.............................
>XP_642837.1 hypothetical protein DDB_G0276911 [Dictyostelium d......................................
I want to put the accession number between pipe characters "|" in the header of every sequence, like this:
>|WP_003131952.1| 30S ribosomal protein S18 [Lactococcus lactis]NP_26834..........................
>|XP_642131.1| hypothetical protein DDB_G0277827 [Dictyostelium discoideu.............................
>|XP_642837.1|hypothetical protein DDB_G0276911 [Dictyostelium d......................................
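The header change described above could be sketched in Python roughly as follows. This is a minimal sketch, not a tested pipeline for the full nr file: it assumes the accession is the first whitespace-delimited token after `>`, it only wraps that first accession (any further identifiers later in the header are left untouched), and the function names `pipe_wrap_header` and `rewrite_fasta` are made up for illustration. It streams line by line, so the whole database never has to fit in memory.

```python
import sys


def pipe_wrap_header(line: str) -> str:
    """Wrap the first token of a FASTA header in pipes; pass other lines through."""
    if not line.startswith(">"):
        return line  # sequence line, unchanged
    body = line[1:].rstrip("\n")
    # Assumption: the accession is everything up to the first space.
    accession, sep, rest = body.partition(" ")
    if not accession or accession.startswith("|"):
        return line  # empty header or already wrapped, leave as-is
    newline = "\n" if line.endswith("\n") else ""
    return f">|{accession}|{sep}{rest}{newline}"


def rewrite_fasta(src, dst) -> None:
    """Copy src to dst one line at a time, rewriting header lines."""
    for line in src:
        dst.write(pipe_wrap_header(line))


# Hypothetical usage: python wrap_headers.py input.fasta output.fasta
if __name__ == "__main__" and len(sys.argv) == 3:
    with open(sys.argv[1]) as src, open(sys.argv[2], "w") as dst:
        rewrite_fasta(src, dst)
```

The script name and the input and output file names in the usage comment are placeholders. It is worth trying this on a small extract of the database first and comparing a few headers by eye before running it on all sequences.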
Kindly help me solve this issue.
Regards, Bilal
Thanks for the help ..... :)
Please check the green mark on the left to flag this question as answered.
Bro, I got an error while building the BLAST database with the -parse_seqids flag after rectifying the FASTA headers. Without -parse_seqids everything works well. I am actually using the Blast2GO software for mapping and annotation. On the page "How to create a Fasta file database for local Blast and to import XML results successfully into Blast2GO", they give instructions about the header format.
Here is the error:
volume: /data/storage_green/mbil/compressed_file/nr_database/nrdb2016/nr_modi
file: /data/storage_green/mbil/compressed_file/nr_database/nrdb2016/nr_modi.pin
file: /data/storage_green/mbil/compressed_file/nr_database/nrdb2016/nr_modi.phr
file: /data/storage_green/mbil/compressed_file/nr_database/nrdb2016/nr_modi.psq
file: /data/storage_green/mbil/compressed_file/nr_database/nrdb2016/nr_modi.psi
file: /data/storage_green/mbil/compressed_file/nr_database/nrdb2016/nr_modi.psd
file: /data/storage_green/mbil/compressed_file/nr_database/nrdb2016/nr_modi.pog
BLAST Database creation error: Defline lacks a proper ID around line 380
This is line 380:
>|CAD71090.1| conserved hypothetical protein [Neurospora crassa]
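Since the error message says "around line 380", it may help to look at the neighbouring lines too, not only line 380 itself. The helper below is just a sketch for doing that without loading the whole file; the function name `lines_around` and the file name in the usage comment are hypothetical, since the name of the modified FASTA file is not given above.

```python
from itertools import islice


def lines_around(path, line_no, context=2):
    """Return (line number, text) pairs for line_no and its neighbours."""
    start = max(1, line_no - context)
    stop = line_no + context
    with open(path) as handle:
        # islice skips ahead lazily, so only the requested window is kept.
        window = islice(handle, start - 1, stop)
        return [(n, text.rstrip("\n")) for n, text in enumerate(window, start=start)]


# Hypothetical usage (replace the file name with the real FASTA file):
# for n, text in lines_around("nr_modi.fasta", 380):
#     print(n, text)
```

If the lines next to 380 are headers in the same format that passed without complaint, that would suggest something specific to this entry; if not, the header format itself is the more likely suspect.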