Hi all,
Does anyone have experience using this package for local realignment of reads? I am wondering how best to tune it for speed. Can the standard parameters be used to get good gains in run speed, or is parallelization the best route?
Thanks in advance.
So a best case scenario is approx 87 hours for a human genome? Is there a faster way of doing this when many samples are being used? For example, only considering the areas where variants are called rather than analysing the entire files?
Absolutely, SRMA allows you to restrict the realignment to specific regions of the genome. That will dramatically reduce the running time.
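If you want to limit the work to the areas around called variants, one way to get the regions is to pad each variant position and merge the windows that overlap. This is only a rough sketch and not part of SRMA: the helper names and the 100 bp padding are my own assumptions, and the chr:start-end string simply mirrors the RANGE example quoted further down this thread.

```python
def merge_windows(positions, pad=100):
    """Pad each (chrom, pos) by `pad` bases and merge overlapping windows.

    `pad` is a hypothetical default; pick whatever flank suits your reads.
    """
    windows = []
    for chrom, pos in sorted(positions):
        start, end = max(1, pos - pad), pos + pad
        last = windows[-1] if windows else None
        if last and last[0] == chrom and start <= last[2]:
            # Overlaps the previous window on the same chromosome: extend it.
            windows[-1] = (chrom, last[1], max(last[2], end))
        else:
            windows.append((chrom, start, end))
    return windows


def format_range(window):
    """Render a (chrom, start, end) window as 'chrom:start-end'."""
    chrom, start, end = window
    return f"{chrom}:{start}-{end}"
```

Note that the chromosomes come out in plain string order (chr10 before chr2), which is fine if each window is run as its own job.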
What format are ranges specified in? I can't seem to figure it out.
Actually, I have managed to run a range using RANGE=chr10:9271174-9271274, for example. But what if I just want to run an entire chromosome? Can that be easily specified? Also, is it acceptable to join multiple commands/ranges on the command line using '&' so that each range uses a single thread?
It is acceptable, assuming you have multiple cores on that machine. Storage is also important. If possible, run your processes against local disk instead of network storage.
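One caveat with chaining everything with '&' is that all the jobs start at once, so with more ranges than cores they compete for CPU and disk. A small wrapper that caps the number of concurrent jobs at the core count might look like the sketch below. Treat it as an illustration under assumptions: only RANGE= comes from this thread, while the jar name, the other arguments and the file names in the example template are placeholders you would replace with your real command. The {tag} placeholder is there so that each range can write to its own output file.

```python
import os
import subprocess
from concurrent.futures import ThreadPoolExecutor

# Placeholder command line: everything except RANGE= is an assumption.
EXAMPLE_TEMPLATE = ["java", "-jar", "srma.jar", "I=in.bam",
                    "O=out.{tag}.bam", "R=ref.fa", "RANGE={range}"]


def build_commands(template, ranges):
    """Fill {range} and a filename-safe {tag} into each template argument."""
    commands = []
    for r in ranges:
        tag = r.replace(":", "_").replace("-", "_")
        commands.append([arg.format(range=r, tag=tag) for arg in template])
    return commands


def run_all(commands, workers=None):
    """Run the commands with at most `workers` at a time; return exit codes."""
    workers = workers or os.cpu_count() or 1
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each worker thread just waits on one external process.
        return list(pool.map(lambda c: subprocess.run(c).returncode, commands))
```

You would call run_all(build_commands(EXAMPLE_TEMPLATE, ranges)) and check that every returned exit code is 0. If the input and output live on network storage, copying them to local disk first is probably still worth it, as noted above.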