Making masked blast db
19 months ago

Hi everyone, I'm interested in creating a BLAST database from the plant genomes in the RefSeq release.

I've downloaded all the genomic.fna files from https://ftp.ncbi.nlm.nih.gov/refseq/release/plant/.

I know I should use makeblastdb to convert the FASTA files into a proper plant database, but how can I mask those sequences before building the database?

I've found the tool windowmasker, but I'm not sure I can use it on multi-FASTA files that contain sequences from different species. A single file in the FTP repository can contain sequences from many different species rather than from one specific species. Furthermore, sequences from the same organism can be spread across different files.

Thank you for your help

db blast masking • 853 views

What kind of "masking" are you interested in doing? Repeats, specific sequences? Why do you want to do that?


I would like to mask over-represented sequences as well as low-complexity sequences. The goal is to make sure that, when I classify sequences, I don't get matches to redundant sequences that are shared between the genomes of different organisms.

19 months ago

I would give RepeatMasker a try. It can mask low-complexity regions out of the box, and you can also provide it with a custom 'repeat' library (a FASTA file of sequences) that it will use to mask your FASTA files. You can put whatever sequences you want masked into that library.
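To make that concrete, here is a rough sketch of how the steps could be chained: soft-mask with RepeatMasker using a custom library, convert the lowercase masking to BLAST mask data with convert2blastmask, then pass that to makeblastdb. All file names (plant.1.1.genomic.fna, custom_repeats.fa, the masked/ directory, plant_db) and the helper functions are placeholders for illustration. It assumes RepeatMasker and BLAST+ are installed and on your PATH, and that RepeatMasker writes its output as <input>.masked in the output directory. The script only prints the commands and does not run them.

```python
import shlex
from pathlib import Path


def repeatmasker_cmd(fasta, repeat_lib, out_dir):
    """RepeatMasker call: -lib = custom repeat library (FASTA),
    -xsmall = soft-mask (lowercase) instead of replacing with N,
    -dir = directory for the output files."""
    return ["RepeatMasker", "-lib", str(repeat_lib), "-xsmall",
            "-dir", str(out_dir), str(fasta)]


def convert2blastmask_cmd(masked_fasta, mask_file):
    """Turn lowercase masking in a FASTA file into BLAST mask data."""
    return ["convert2blastmask", "-in", str(masked_fasta), "-parse_seqids",
            "-masking_algorithm", "repeat",
            "-masking_options", "repeatmasker, custom library",
            "-outfmt", "maskinfo_asn1_bin", "-out", str(mask_file)]


def makeblastdb_cmd(masked_fasta, mask_file, db_name):
    """Build a nucleotide database that carries the mask data along."""
    return ["makeblastdb", "-in", str(masked_fasta), "-dbtype", "nucl",
            "-parse_seqids", "-mask_data", str(mask_file),
            "-out", str(db_name)]


def plan(fasta, repeat_lib, out_dir, db_name):
    """Return the three commands, in order, for one input FASTA file."""
    out_dir = Path(out_dir)
    name = Path(fasta).name
    # Assumption: RepeatMasker names its masked output <input>.masked
    masked = out_dir / (name + ".masked")
    mask_file = out_dir / (name + ".asnb")
    return [
        repeatmasker_cmd(fasta, repeat_lib, out_dir),
        convert2blastmask_cmd(masked, mask_file),
        makeblastdb_cmd(masked, mask_file, db_name),
    ]


if __name__ == "__main__":
    # Hypothetical file names; replace with your own.
    for cmd in plan("plant.1.1.genomic.fna", "custom_repeats.fa",
                    "masked", "plant_db"):
        print(shlex.join(cmd))
```

As far as I know, the masking stored this way is only applied at search time if you ask for it, for example with blastn's -db_soft_mask or -db_hard_mask option and the ID of the filtering algorithm; blastdbcmd -db plant_db -info should list the IDs available in the database. Please check the BLAST+ and RepeatMasker documentation for your installed versions before relying on this.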

Do keep GenoMax's comment in mind, though, and make sure that this is really the way to go.
