Hi everyone, I'm interested in creating a BLAST database from the plant genomes in the RefSeq release.
I've downloaded all the genomic.fna files from https://ftp.ncbi.nlm.nih.gov/refseq/release/plant/.
I know I should use makeblastdb to convert the FASTA files into a proper plant database, but how can I mask those sequences before building it?
I've found the tool windowmasker, but I'm not sure I can use it on multi-FASTA files that contain sequences from different species. A single file in the FTP repository can hold sequences from many different species rather than from one specific species. Furthermore, sequences from the same organism can be spread across different files.
Thank you for your help
What kind of "masking" are you interested in doing? Repeats, specific sequences? Why do you want to do that?
I would like to mask over-represented sequences and also low-complexity sequences. The goal is to make sure that, when I classify sequences against the database, I don't get matches on redundant sequences that are shared between the genomes of different organisms.
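Since the files mix species, one option might be to regroup the records per organism first, so that a per-genome tool such as windowmasker sees one organism at a time. Below is a minimal Python sketch of that regrouping step only. It assumes the organism can be guessed from the two words that follow the accession in each FASTA header (genus and species), which will not hold for every RefSeq record. The function names `read_fasta`, `organism_key` and `group_by_organism` are made up for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def organism_key(header: str) -> str:
    """Guess the organism from a header.

    Assumption: the header looks like '<accession> <Genus> <species> ...',
    so words 2 and 3 are taken as the organism name.
    """
    words = header.split()
    return "_".join(words[1:3]) if len(words) >= 3 else "unknown"


def group_by_organism(lines: Iterable[str]) -> Dict[str, List[Tuple[str, str]]]:
    """Collect FASTA records under their guessed organism key."""
    groups: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for header, seq in read_fasta(lines):
        groups[organism_key(header)].append((header, seq))
    return dict(groups)


# Illustrative headers only; real files would be read line by line instead.
example = [
    ">ACC_0001.1 Arabidopsis thaliana chloroplast, complete genome",
    "ATGC",
    "GGCC",
    ">ACC_0002.1 Zea mays chloroplast, complete genome",
    "TTAA",
    ">ACC_0003.1 Arabidopsis thaliana mitochondrion, complete genome",
    "CCGG",
]
groups = group_by_organism(example)
```

Each group could then be written to its own FASTA file and masked separately before everything is combined into one database. Because the lines are consumed lazily, the same functions should also work on an open file handle, and on several files in turn if the groups are merged afterwards.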