Curating a Viridiplantae database
8.7 years ago
Biogeek ▴ 470

Dear all,

I have downloaded the latest Nr.gz file from NCBI and unzipped it. Now I want to extract only the Viridiplantae sequences from this Nr FASTA file. I have tried downloading all the GI numbers for the plant protein sequences and running grep as follows.

grep -wFf GIsequences-list NR > viridiplantae.fasta

However, I don't get any protein sequences in the output file, just the header lines with GI numbers and annotations.

Is there a script that can do this better, or a command I can use to get the Viridiplantae Nr database I am after? I am using RAPSEARCH for speed rather than BlastX, so I can't supply a blastx command to search for taxon-specific annotations.

Thanks.

blast sequences annotation • 2.3k views
8.7 years ago
untitpoi ▴ 30

Hi, I think grep's -A option could help you. It prints not only the line that matches your pattern but also a given number of lines after it. Though it is not the best solution if your FASTA is not single-line (each sequence on one line), and multi-line FASTA is often what you get.
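If the sequences are wrapped over several lines, a small script that tracks which record it is in avoids having to guess a line count for -A. Below is a minimal Python sketch of that idea, not a tested pipeline. It assumes headers of the form ">gi|12345|ref|XP_000001.1| description", with merged identical entries separated by Ctrl-A on the same header line, and one GI number per line in the list file. The function names and the script name filter_fasta.py are made up for illustration.

```python
import re
import sys

# Assumption: headers look like ">gi|12345|ref|XP_000001.1| description",
# with merged identical entries separated by Ctrl-A (\x01) on the same line.
GI_PATTERN = re.compile(r"gi\|(\d+)\|")


def load_ids(path):
    """Read one GI number per line into a set for fast lookup."""
    with open(path) as handle:
        return {line.strip() for line in handle if line.strip()}


def filter_fasta(lines, wanted):
    """Yield every line of each record whose header carries a wanted GI."""
    keep = False
    for line in lines:
        if line.startswith(">"):
            # Decide once per record; the sequence lines that follow
            # inherit the decision until the next header.
            keep = any(gi in wanted for gi in GI_PATTERN.findall(line))
        if keep:
            yield line


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage:
    #   python filter_fasta.py GIsequences-list NR > viridiplantae.fasta
    ids = load_ids(sys.argv[1])
    with open(sys.argv[2]) as fasta:
        sys.stdout.writelines(filter_fasta(fasta, ids))
```

Because the GI list is held in a set and the FASTA file is streamed line by line, memory use depends on the size of the ID list rather than on the size of Nr. If your headers use a different layout, the regular expression is the part to adapt.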
