How to find gene IDs from sequences
9.5 years ago

Dear all,

I have a file containing thousands of sequences from a particular plant species.

I want to obtain their corresponding gene IDs. I know the best way to do this is with BLAST, but since there are so many sequences, I am looking for a way to automate it.

Do you think an all-against-all BLAST is a good approach? If so, could you give me a hint on how to do it?

Thank you in advance.

Nazanin

Tags: sequence, GeneID

Hi,

It looks like you can quite easily run BLAST on several sequences at once (see http://www.ncbi.nlm.nih.gov/guide/howto/submit-mult-seq-blast/), although I'm not sure how well this scales.

You can also restrict your search to specific organisms so that you only get matches from plants.
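Since the web form accepts several sequences at once, one way to automate the submission is to split the big file into smaller multi-FASTA batches first. Below is a minimal Python sketch (standard library only) of that splitting step. The helper names (`read_fasta`, `batches`, `to_fasta`, `write_batches`) and the default batch size of 50 are hypothetical choices, not anything NCBI prescribes; check NCBI's current limits before picking a batch size.

```python
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, str]  # (header without the leading '>', sequence)


def read_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)  # sequence may be wrapped over several lines
    if header is not None:
        yield header, "".join(chunks)


def batches(records: Iterable[Record], batch_size: int) -> Iterator[List[Record]]:
    """Group records into lists of at most batch_size records."""
    batch: List[Record] = []
    for rec in records:
        batch.append(rec)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch  # last, possibly smaller, batch


def to_fasta(batch: List[Record]) -> str:
    """Render a batch back to FASTA text (one unwrapped sequence line per record)."""
    return "".join(f">{h}\n{s}\n" for h, s in batch)


def write_batches(path: str, batch_size: int = 50, prefix: str = "batch") -> int:
    """Write prefix_001.fasta, prefix_002.fasta, ... and return the number of files.

    batch_size=50 is a hypothetical default; tune it to NCBI's limits.
    """
    count = 0
    with open(path) as handle:
        for count, batch in enumerate(batches(read_fasta(handle), batch_size), start=1):
            with open(f"{prefix}_{count:03d}.fasta", "w") as out:
                out.write(to_fasta(batch))
    return count
```

For example, `write_batches("my_plant_seqs.fasta")` (hypothetical file name) would produce `batch_001.fasta`, `batch_002.fasta`, and so on, each of which can be uploaded to the multi-sequence BLAST form with the organism restriction applied.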

9.5 years ago
tomc ▴ 90

Assuming you are working with nucleotide sequences, have a local BLAST installation, and have your reference gene sequences formatted as a BLAST database, this could be the start of a solution for linking your sequences to known gene IDs:

blastn -db reference.bdb -query file_with_all_sequences.nt -out result.alignment

But then you will still need to tune the expect value (-evalue), pick an output format (-outfmt), and finally interpret the resulting alignments. Unless you adopt a fairly arbitrary policy for that, the interpretation is going to be the real work.
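To make that concrete, here is a minimal Python sketch of one such "fairly arbitrary policy". It assumes (none of this is from the answer itself) that you ask blastn for tabular output with `-outfmt 6`, whose default 12 columns are listed in `FIELDS`, that the subject IDs in your reference database are the gene IDs you want, and that "best hit" means lowest e-value, ties broken by highest bit score, after a hypothetical 90% identity cutoff. The function names and cutoff values are illustrative only.

```python
import csv
import shutil
import subprocess
from typing import Dict, Iterable, List, Tuple

# Default column order of BLAST+ tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def blastn_command(db: str, query: str, out: str, evalue: float = 1e-10) -> List[str]:
    """Build the blastn argument list with an explicit e-value and tabular output.

    The default evalue of 1e-10 is a hypothetical starting point to tune.
    """
    return ["blastn", "-db", db, "-query", query, "-out", out,
            "-evalue", str(evalue), "-outfmt", "6"]


def run_blastn(db: str, query: str, out: str, evalue: float = 1e-10) -> None:
    """Run blastn if it is on PATH; raise a clear error otherwise."""
    if shutil.which("blastn") is None:
        raise RuntimeError("blastn not found on PATH")
    subprocess.run(blastn_command(db, query, out, evalue), check=True)


def best_hits(lines: Iterable[str],
              min_identity: float = 90.0) -> Dict[str, Tuple[str, float, float]]:
    """Return {query_id: (subject_id, evalue, bitscore)}, one hit per query.

    Hits below min_identity percent identity are dropped (hypothetical cutoff).
    """
    best: Dict[str, Tuple[str, float, float]] = {}
    for row in csv.reader(lines, delimiter="\t"):
        if not row:
            continue
        hit = dict(zip(FIELDS, row))
        if float(hit["pident"]) < min_identity:
            continue
        evalue, bits = float(hit["evalue"]), float(hit["bitscore"])
        current = best.get(hit["qseqid"])
        # Prefer the lower e-value; break ties with the higher bit score.
        if current is None or (evalue, -bits) < (current[1], -current[2]):
            best[hit["qseqid"]] = (hit["sseqid"], evalue, bits)
    return best
```

Usage would look like `run_blastn("reference.bdb", "file_with_all_sequences.nt", "hits.tsv")` followed by `best_hits(open("hits.tsv"))`. Queries with no hit passing the filters simply do not appear in the result, which is worth reviewing by hand rather than assigning a gene ID automatically.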

