I have been quite busy for some time now trying to use this package. No offence, but I have found it bloody useless, at least in terms of being straightforward for anyone who wants to use it.
1- How can one ask for help? (Just google how to ask a question: you land on a page with nothing that lets you ask anything, even though I registered a week ago.) There are many, many issues: one post says to use Queue, another says not to use it.
One says to align, another says you don't need to, and I have read a few papers in high-end journals that all used different platforms!!!
2- All their ridiculous posts are old, useless and unclear.
3- Version 3 is different from version 4, yet for both you must have Java 1.8 (don't use anything higher). If you use BWA or whatever, you cannot find a way to use GATK for alignment. I have been killing myself trying to get it to match, with no success!!! If you know how, please show me.
I even tried an older version, but I cannot get that to work either. Could someone point me to a good source for it, or show me how to report my errors on the GATK website?
I am now even trying to use the cloud; if anyone knows how to do that, I would appreciate any guidance.
OK, it seems I need to take some serious action :-) I have summarised the steps I could find. Could you please help me find the command for each step, so that we can all go home in peace :-)
Step 1 ----> Get your data (forward and reverse reads)
Step 2 ----> Map with BWA-MEM (don't use the other BWA algorithms)
Step 3 ----> SAM to BAM (you can use whatever you like; I use samtools)
Step 4 ----> Mark duplicate reads (I use Picard)
Step 5 ----> Use the samtools flagstat command to print summary statistics for the BAM file you generated in step 3
Step 6 ----> samtools mpileup: multi-way pileup of the aligned reads
Step 7 ----> VarScan for variant detection
Step 8 ----> Annotate (you can use so many algorithms; which one is best? God KNOWS, maybe VCFannotateGenotypes, I don't know)
Step 9 ----> Filter your VCF on a variety of attributes (is it necessary? God knows again)
Step 10 ---> ANNOVAR: annotate the VCF
Step 11 ---> Go get a damn beer, because you have been through a lot
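For what it's worth, here is a minimal sketch of steps 2 to 7 and step 10 as actual commands, wrapped in a small Python driver that only prints them unless you pass dry_run=False. The file names, the picard.jar and VarScan.jar paths, and the hg19/refGene ANNOVAR choice are assumptions (placeholders, not anything from this thread). The invocations follow the usual BWA, samtools, Picard, VarScan 2 and ANNOVAR usage, but please check each tool's own docs for the versions you have installed. Steps 8 and 9 are left out because nobody here settled on a tool for them. It also assumes ANNOVAR's humandb/ folder already contains the hg19 refGene database.

```python
import shlex
import subprocess


def pipeline_commands(ref, r1, r2, prefix,
                      picard_jar="picard.jar", varscan_jar="VarScan.jar"):
    """Return (label, shell command) pairs for the steps listed above.

    Jar paths and output names are hypothetical placeholders; verify every
    option against the documentation of your installed tool versions.
    """
    q = shlex.quote  # protect file names containing spaces etc.
    ref, r1, r2 = q(ref), q(r1), q(r2)
    sam, bam = q(f"{prefix}.sam"), q(f"{prefix}.sorted.bam")
    dedup, metrics = q(f"{prefix}.dedup.bam"), q(f"{prefix}.dup_metrics.txt")
    pileup, vcf = q(f"{prefix}.pileup"), q(f"{prefix}.snps.vcf")
    return [
        ("index the reference for BWA (once)", f"bwa index {ref}"),
        ("step 2: map with BWA-MEM", f"bwa mem {ref} {r1} {r2} > {sam}"),
        # samtools sort reads SAM and writes BAM, so this also covers SAM -> BAM
        ("step 3: SAM to coordinate-sorted BAM", f"samtools sort -o {bam} {sam}"),
        ("step 4: mark duplicates with Picard",
         f"java -jar {q(picard_jar)} MarkDuplicates I={bam} O={dedup} M={metrics}"),
        ("index the deduplicated BAM", f"samtools index {dedup}"),
        ("step 5: summary statistics for the step 3 BAM", f"samtools flagstat {bam}"),
        ("step 6: pileup against the reference",
         f"samtools mpileup -f {ref} {dedup} > {pileup}"),
        ("step 7: call SNPs with VarScan 2",
         f"java -jar {q(varscan_jar)} mpileup2snp {pileup} --output-vcf 1 > {vcf}"),
        ("step 10: annotate with ANNOVAR (hg19, refGene only)",
         f"table_annovar.pl {vcf} humandb/ -buildver hg19 -out {q(prefix + '.anno')} "
         "-remove -protocol refGene -operation g -nastring . -vcfinput"),
    ]


def run(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for label, cmd in commands:
        print(f"# {label}\n{cmd}")
        if not dry_run:
            # shell=True is needed for the '>' redirections above
            subprocess.run(cmd, shell=True, check=True)


if __name__ == "__main__":
    run(pipeline_commands("ref.fa", "sample_R1.fastq.gz",
                          "sample_R2.fastq.gz", "sample"))
```

Printing first and running second is deliberate: you can paste the printed lines into a terminal one at a time and see exactly which step fails, which is far easier to ask about here than a wrapper error.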
Thanks
Why is it so complicated, indeed?
It is a good question, one that cuts to the very core of how science is practiced today. The short answer, in my opinion, is that there is no incentive or reward for making code simple; in addition, a large number of the people who use GATK directly compete with the organization that makes GATK. Since they already know how it works, all the effort that went into making it "simple" would benefit the competition, a disincentive if there ever was one.
Think about the absurdity: the supposedly sophisticated GATK code will not run if you don't have a .fai file and a .dict file (each of which takes seconds to generate, and the extra time is a rounding error compared to how long GATK runs), but nah, they would rather produce an insane error message (20 seconds into the run) that sends you to an outdated link that does not actually explain anything. They already know everyone has this problem, but instead of fixing it, you have to hit the manual, you have to hit the books, to find out how to create those files.

@Istvan Albert I appreciate your time, thanks. I personally think these tools were meant to make life easier, not more complicated. What is the point of spending hours and hours and still thinking that you are doing something wrong :-)? I just cannot believe the amount of NIH grant money wasted on such stuff while people still need to crack a mystery :-( Not cool at all! I wish NIH could give me some :-D
Not sure how much NIH money went toward GATK; definitely >$1M (http://grantome.com/grant/NIH/U01-HG006569-01), but maybe not for the initial development.
Just today I ran GATK, and it explicitly told me that a .fai file was necessary (and a couple of minutes later reminded me that a .dict file is necessary). It would indeed be useful if it could create those by itself.
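The usual way to make both files is two commands: `samtools faidx ref.fa` (writes ref.fa.fai) and `java -jar picard.jar CreateSequenceDictionary R=ref.fa O=ref.dict`. To show how little work is actually involved, here is a stdlib-only Python sketch that writes both files itself. It assumes an uncompressed FASTA where every sequence line of a record except the last has the same length (it does not check this). The .dict it writes carries only the SN, LN and M5 tags (Picard also adds UR), and the @HD version is my assumption; whether a given GATK version accepts this minimal header is not guaranteed. Treat it as an illustration, not a replacement for the official tools.

```python
import hashlib
import sys
from pathlib import Path


def index_fasta(fasta_path):
    """Scan an uncompressed FASTA and return one dict per record.

    Assumes every sequence line in a record except the last has the same
    length (the layout samtools faidx requires); this sketch does not check it.
    """
    records, current, offset = [], None, 0
    with open(fasta_path, "rb") as fh:
        for raw in fh:
            line = raw.rstrip(b"\r\n")
            if line.startswith(b">"):
                if current is not None:
                    records.append(current)
                current = {
                    "name": line[1:].split()[0].decode(),  # first word of header
                    "length": 0,
                    "offset": offset + len(raw),  # byte offset of the first base
                    "line_bases": 0,
                    "line_width": 0,
                    "md5": hashlib.md5(),
                }
            elif current is not None and line:
                if current["line_bases"] == 0:  # first sequence line sets layout
                    current["line_bases"] = len(line)
                    current["line_width"] = len(raw)  # includes newline byte(s)
                current["length"] += len(line)
                current["md5"].update(line.upper())  # M5: uppercase, no newlines
            offset += len(raw)
    if current is not None:
        records.append(current)
    return records


def write_fai_and_dict(fasta_path):
    """Write ref.fa.fai and ref.dict next to ref.fa; return both paths."""
    fasta = Path(fasta_path)
    records = index_fasta(fasta)
    fai = fasta.with_name(fasta.name + ".fai")  # ref.fa -> ref.fa.fai
    dict_path = fasta.with_suffix(".dict")      # ref.fa -> ref.dict
    fai.write_text("".join(
        f"{r['name']}\t{r['length']}\t{r['offset']}\t{r['line_bases']}\t{r['line_width']}\n"
        for r in records
    ))
    header = ["@HD\tVN:1.6"]  # header version is an assumption
    header += [
        f"@SQ\tSN:{r['name']}\tLN:{r['length']}\tM5:{r['md5'].hexdigest()}"
        for r in records
    ]
    dict_path.write_text("\n".join(header) + "\n")
    return fai, dict_path


if __name__ == "__main__" and len(sys.argv) > 1:
    for path in write_fai_and_dict(sys.argv[1]):
        print("wrote", path)
```

Usage: `python make_ref_index.py ref.fa` (the script name is hypothetical). The five .fai columns are name, length, byte offset of the first base, bases per line and bytes per line, which is all a reader needs to seek straight to any position in the FASTA.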