Hi Everyone,
I work in oncology and would like to test this pipeline:
GATK - Germline short variant discovery (SNPs + Indels)
Unfortunately, I haven't been able to find raw FASTQ data to test the pipeline.
Does GATK provide datasets to test their pipelines?
I found this article, but unfortunately I'm not able to download the data from the linked source. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4243306/pdf/nihms531590.pdf
Thank you very much. I'm open to any advice on how to handle this. AY
You can also check their workshops which have a lot of info and examples: https://drive.google.com/drive/folders/1y7q0gJ-ohNDhKG85UTRTwW1Jkq4HJ5M3?usp=sharing
igor: Those workshops appear to be more than a year old, going by the date stamps. Do you know if they cover all the changes that have happened recently?
Some of them are from 2020 (use the directory name, not the modified date to gauge the age). I assume various events stopped in March due to the pandemic. GATK4 has been out for over 2 years and most of the tools have not changed much.
If it's just a simple test to see how easy the setup is, you can look for publicly available .fastq files on SRA.
I'm trying this approach, but I'm finding it extremely difficult to locate raw data for matched tumor-normal samples. Do you have any advice on how to search NCBI's SRA?
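One way to narrow an SRA search is to combine fielded filters with keywords and send the query through NCBI's E-utilities `esearch` endpoint. Below is a minimal sketch of that idea. The helper names `build_sra_query` and `build_esearch_url` are hypothetical, and the `[Organism]` and `[Strategy]` field tags, the `WXS` strategy, and the keywords are assumptions about how the records you want are annotated. The script only builds the query URL; it does not contact NCBI.

```python
from urllib.parse import urlencode

# NCBI E-utilities search endpoint.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_sra_query(keywords, organism="Homo sapiens", strategy="WXS"):
    """Combine fielded SRA filters with quoted keywords into one Entrez term.

    The field tags and default values are assumptions; adjust them to
    match how the studies you are looking for are actually annotated.
    """
    parts = [f'"{organism}"[Organism]', f'"{strategy}"[Strategy]']
    parts += [f'"{kw}"' for kw in keywords]
    return " AND ".join(parts)


def build_esearch_url(term, retmax=20):
    """Return an esearch URL against the SRA database (no request is made)."""
    params = {"db": "sra", "term": term, "retmax": retmax, "retmode": "json"}
    return f"{ESEARCH_URL}?{urlencode(params)}"


# Hypothetical keywords for matched tumor-normal data.
term = build_sra_query(["tumor", "matched normal"])
url = build_esearch_url(term)
print(term)
print(url)
```

Opening the printed URL in a browser should return a JSON list of matching record IDs, which you can then look up in the SRA Run Selector. Tumor-normal pairing is not annotated consistently across submissions, so you will probably still need to check the sample metadata of each study by hand.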