I am trying to find mutations in whole cancer exome sequencing data. Right now I am following this guide: http://seqanswers.com/wiki/How-to/exome_analysis
I found that some of the commands run quite slowly, especially FixMateInformation, which took more than 3 hours. Here is the command I used (called from Python):
os.system("java -Xmx4g -jar FixMateInformation INPUT=%s OUTPUT=%s SO=coordinate CREATE_INDEX=true VALIDATION_STRINGENCY=LENIENT" % (realigned, fixedRealign ))
Could you give me some suggestions for speeding this up? How do you run your Picard or other Java tools?
My other question: I want to find somatic mutations in a cancer exome, but the guide I am using is for SNP calling.
In my view, finding SNPs and finding somatic mutations are quite similar in principle, but I am more interested in tools designed specifically for mutation calling.
I found several high-profile papers that used MuTect, which is not publicly available yet.
What do you use for cancer mutation calling?
Thanks!
I may be wrong, but I don't think that's possible for this command. Where it is available, Picard tools offer a "USE_THREADING=TRUE/FALSE" parameter that improves runtime by roughly 20% to 30%. This one doesn't seem to have it.
Sadly, only MergeSamFiles has "USE_THREADING=TRUE/FALSE".
I believe that you can speed up this process if you do not specify the sort order. In my experience, telling Picard tools to sort the output file significantly increases processing time. If you still need sorted output, sort separately, either before or after, using samtools or Picard SortSam.
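As a rough sketch of that suggestion, the Python below builds the FixMateInformation call without the SO and CREATE_INDEX arguments, then sorts and indexes the result as separate samtools steps. The jar path and file names are hypothetical placeholders, and the `samtools sort -o` form assumes a reasonably recent samtools (older releases use a different sort syntax), so adjust to your installation. Whether this is actually faster will depend on your data and disk.

```python
import subprocess


def build_commands(realigned, fixed_unsorted, fixed_sorted, picard_jar):
    """Return the three command lines as argument lists, without running them.

    picard_jar is a hypothetical path to your FixMateInformation jar.
    """
    # Same options as the original command, minus SO=coordinate and
    # CREATE_INDEX=true, so Picard is not asked to sort or index.
    fix_cmd = [
        "java", "-Xmx4g", "-jar", picard_jar,
        "INPUT=" + realigned,
        "OUTPUT=" + fixed_unsorted,
        "VALIDATION_STRINGENCY=LENIENT",
    ]
    # Coordinate sort as an independent step (assumes the newer
    # "samtools sort -o OUT IN" syntax).
    sort_cmd = ["samtools", "sort", "-o", fixed_sorted, fixed_unsorted]
    # Build the BAM index on the sorted file.
    index_cmd = ["samtools", "index", fixed_sorted]
    return [fix_cmd, sort_cmd, index_cmd]


def run_pipeline(commands):
    """Run each command in order, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)


# Example usage (file names are placeholders):
# run_pipeline(build_commands("realigned.bam", "fixed.bam",
#                             "fixed.sorted.bam", "FixMateInformation.jar"))
```

Passing argument lists to subprocess.run instead of a formatted string to os.system avoids shell quoting problems with unusual file names, and check=True stops the run if any step exits with an error.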
Thanks for your tips, I will try to sort it independently.