When I run RepeatMasker locally, it goes through stages that identify different types of repetitive elements: long interspersed repeats, tough LINE1s, Simple Repeats, ALUs, ancient repeats, retrovirus-like sequences and so on.
I understand that there is an -alu option (masks only Alus, plus 7SLRNA, SVA and LTR5; for primate DNA only), but what about the other types?
Is there a way to identify only one particular type of repetitive element?
There are options to allow only masking interspersed repeats or simple repeats (listed below):
-nolow /-low
Does not mask low_complexity DNA or simple repeats
-noint /-int
Only masks low-complexity/simple repeats (no interspersed repeats)
-norna
Does not mask small RNA (pseudo) genes
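As a rough illustration of how these switches combine, here is a minimal Python sketch that only assembles a RepeatMasker command line from the flags listed above. The function name `build_mask_command`, its parameter names and the file name `genome.fa` are made up for this example; the sketch assumes `RepeatMasker` is on your PATH and does not run anything itself.

```python
from typing import List


def build_mask_command(
    fasta: str,
    skip_low: bool = False,
    skip_interspersed: bool = False,
    skip_small_rna: bool = False,
    alu_only: bool = False,
) -> List[str]:
    """Assemble (but do not run) a RepeatMasker command line.

    Only flags mentioned in this thread are used:
    -nolow, -noint, -norna and -alu.
    """
    cmd = ["RepeatMasker"]
    if skip_low:
        cmd.append("-nolow")   # leave low-complexity/simple repeats unmasked
    if skip_interspersed:
        cmd.append("-noint")   # mask only low-complexity/simple repeats
    if skip_small_rna:
        cmd.append("-norna")   # leave small RNA (pseudo) genes unmasked
    if alu_only:
        cmd.append("-alu")     # primate DNA only, per the option description
    cmd.append(fasta)          # hypothetical input file goes last
    return cmd


# Example: simple/low-complexity repeats only.
print(" ".join(build_mask_command("genome.fa", skip_interspersed=True)))
```

The returned list can be handed to `subprocess.run(cmd, check=True)` once you are happy with the flags.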
In addition, you can always create your own library of repeats and pass it to RepeatMasker with the -lib option, which may also be faster if you are only interested in finding specific repeats.
I managed this by creating my own library. To do this I used queryRepeatDatabase.pl, which is in the util directory of the RepeatMasker installation.
>perl queryRepeatDatabase.pl -help
queryRepeatDatabase.pl - 0.1
NAME
queryRepeatDatabase.pl - Query the RepeatMasker repeat database.
SYNOPSIS
queryRepeatDatabase.pl [-version] [-species <species> |
-stage <stage num> |
-class <class> |
-id <id>]
[-stat]
[-tree]
[-clade]
DESCRIPTION
A utility script to query the RepeatMasker repeat database.
The options are:
-version
Displays the version of the program
-species "species name"
The full name ( case insensitive ) of the species you would like to
search for in the database. This will return all the repeats which
would be used in a RepeatMasker search against this species. This
includes repeats contained in the clade given by "species name" and
ancestral repeats of "species name". Lastly ubiquitous sequences
such as RNAs and simple repeats are also included.
-clade
This will modify the default behaviour of the species option and
return only the repeats which are specific to your species or any of
it descendents. This is useful for identifying how rich the database
of repeats is for a given species/clade.
-stage <stage num>
The number of the RepeatMasker stage for which you would like
repeats. In the past these stages were individual libraries with the
following general names:
Stage  Library
-----  -------
0      species.lib
10     is.lib
15     rodspec.lib
20     humspec.lib
25     simple.lib
30     at.lib
35     sinecutlib
40     shortcutlib
45     cutlib
50     shortlib
55     longlib
60     mirs.lib
65     mir.lib
70     retrovirus.lib
75     l1.lib
-class <class>
Retrieve all elements of a particular class. For example:
DNA
SINE
LINE
LTR
Other
RC
Satellite
tRNA
Simple_repeat
Unknown
snRNA
-id <id>
Retrieve only a single id from the database.
-stat
Returns statistics on the sequences
-tree
Prints the taxonomy tree for all species in the database.
SEE ALSO
ReapeatMasker
COPYRIGHT
Copyright 2005-2011 Robert Hubley, Institute for Systems Biology
AUTHOR
Robert Hubley <rhubley@systemsbiology.org>
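For anyone wanting to script the library extraction, the following is a minimal Python sketch built on the options shown in the help text above. It assumes that queryRepeatDatabase.pl prints the selected sequences in FASTA format to standard output, and that -species and -class may be given together (the synopsis separates them with "|", so please check this against your version). The names `build_query_command` and `write_library` and the output file name are hypothetical.

```python
import subprocess
from typing import List, Optional


def build_query_command(
    script: str = "queryRepeatDatabase.pl",
    species: Optional[str] = None,
    repeat_class: Optional[str] = None,
) -> List[str]:
    """Assemble a queryRepeatDatabase.pl call using -species and/or -class."""
    cmd = ["perl", script]
    if species:
        cmd += ["-species", species]
    if repeat_class:
        # Assumption: -class can be combined with -species; verify locally.
        cmd += ["-class", repeat_class]
    return cmd


def write_library(cmd: List[str], out_path: str) -> None:
    """Run the query and save its standard output as a custom library file.

    Assumption: the script writes the FASTA records to standard output.
    """
    with open(out_path, "w") as handle:
        subprocess.run(cmd, stdout=handle, check=True)
```

A call such as `write_library(build_query_command(species="human", repeat_class="LINE"), "line_only.lib")` would then give a file that can be passed to RepeatMasker with -lib.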
Tuning the -q and -pa options made it even faster :)
Thanks! I think building my own library will work well for me.
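To make the combination concrete, here is a small Python sketch that assembles a RepeatMasker call with a custom library plus -q (quick search) and -pa (number of parallel processes). The function name `build_custom_lib_command`, the file names and the default of 4 processes are illustrative assumptions, not recommended settings.

```python
from typing import List


def build_custom_lib_command(
    fasta: str,
    lib: str,
    quick: bool = True,
    processes: int = 4,
) -> List[str]:
    """Assemble (but do not run) a RepeatMasker call using a custom library."""
    if processes < 1:
        raise ValueError("processes must be at least 1")
    cmd = ["RepeatMasker", "-lib", lib]
    if quick:
        cmd.append("-q")                # quick search: faster, less sensitive
    cmd += ["-pa", str(processes)]      # parallel processes
    cmd.append(fasta)
    return cmd


# Hypothetical file names, for illustration only.
print(" ".join(build_custom_lib_command("genome.fa", "line_only.lib")))
```

How much -q and -pa help depends on the machine and the size of the library, so it is worth timing a small test sequence first.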