Hi all
I'm having some trouble using RepeatMasker and was wondering if there's anyone out there that's used it before. Some of my sequencing files seem to have run just fine but the majority of files (the bigger ones especially) were clustered into their own separate folders with a lot of added results files that I can't find information about: .cat, .custom, .dat, .tmp.simple and of course a lot of .err files.
I'm getting the feeling that the masking of these files was not successful, but why? Anyone seen this before? I'm using Illumina sequencing files converted to FASTA format and they are roughly 1 GB in size.
Help appreciated!
Are you trying to repeat mask short reads (e.g., stuff from a MiSeq or HiSeq) or have I misunderstood this? If I understood correctly, then you don't want to do that (post your goal and we'll give you an idea of what you actually do want to do).
Yes - I'm trying to repeatmask my HiSeq contigs...I take it that's a bad idea?
The problem is that I tried aligning my contigs to reference genome sequences using BWA, but found that all that aligned were the endless repeats, while the unique sequences were simply cut off or excluded. Hence the attempt at repeat masking first.
Any recommendations would be appreciated!
How well repeatmasking contigs works will depend on how long they are. If they're closer in length to the original reads then you're largely wasting your time. You might try blasting a few of the non-repeat contigs and just see what they align to. My guess is that something just went wrong in your assembly.
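To put a number on "how long they are", a quick look at the contig length distribution can help. Below is a minimal Python sketch, assuming the contigs are in an ordinary FASTA file; the function names are made up for illustration and the file name in the usage comment is hypothetical. It reports the count, min, median, max and N50 of the contig lengths.

```python
import statistics


def contig_lengths(handle):
    """Yield the sequence length of each record in a FASTA file-like object."""
    length = None  # None until the first header line is seen
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if length is not None:
                yield length
            length = 0
        elif line and length is not None:
            length += len(line)
    if length is not None:
        yield length


def n50(lengths):
    """Length of the contig at which the running total (longest contigs first)
    first reaches half of the summed length of all contigs."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for n in ordered:
        running += n
        if running >= half:
            return n
    return 0


def summarize(handle):
    """Return basic length statistics for all records in a FASTA handle."""
    lengths = list(contig_lengths(handle))
    if not lengths:
        return {"count": 0, "min": 0, "median": 0, "max": 0, "n50": 0}
    return {
        "count": len(lengths),
        "min": min(lengths),
        "median": statistics.median(lengths),
        "max": max(lengths),
        "n50": n50(lengths),
    }


# Usage (hypothetical file name):
# with open("contigs.fasta") as fh:
#     print(summarize(fh))
```

If the median and N50 come out close to the read length, the "contigs" are probably not much more than reads, which would fit the suspicion that something went wrong in the assembly.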
BTW, the original problem sounds like it was just caused by RepeatMasker crashing. Perhaps looking through the .err files (or the system logs) will elucidate why.
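With a lot of output folders, it may be quicker to let a script find the .err files that actually contain something. This is only a rough sketch: it assumes the .err files are plain text, and the helper names and the directory in the usage comment are hypothetical. It uses nothing specific to RepeatMasker, just the file extension.

```python
from itertools import islice
from pathlib import Path


def nonempty_err_files(root):
    """Return sorted paths of *.err files under root that are not empty."""
    return sorted(
        p for p in Path(root).rglob("*.err")
        if p.is_file() and p.stat().st_size > 0
    )


def head(path, n=5):
    """Return the first n lines of a text file, without trailing newlines."""
    with open(path, errors="replace") as fh:
        return [line.rstrip("\n") for line in islice(fh, n)]


# Usage (hypothetical directory name):
# for err in nonempty_err_files("repeatmasker_output"):
#     print(err)
#     for line in head(err):
#         print("   ", line)
```

The first few lines of each non-empty file are often enough to tell whether the runs died for the same reason (for example, running out of memory or disk space on the bigger inputs).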
Thanks for the help Devon!