GATK ver.: 4.1.4.1, Picard ver.: 2.21.4, samtools ver.: 1.10
Hello,
I'm learning to create a pipeline for variant calling. I started with an arbitrarily chosen exome from 1000genomes in the form of two FASTQ files.
I pre-processed the data using the GATK Best Practices workflow https://gatk.broadinstitute.org/hc/en-us/articles/360035535912-Data-pre-processing-for-variant-discovery
and ended up with a supposedly "analysis-ready" BAM file.
1000genomes also provides a .cram file (as well as a .cram.crai and a .bam.bas). How can I compare my file with the one provided? I converted the .cram into a .bam file and am looking for a way to compare the two.
Next, for the variant calling, 1000genomes provides a .vcf file for each chromosome. How can I tell which types of variants were called (SNPs, SNVs, indels, CNVs, ...)? Can I use these files to check the validity of my own .vcf result?
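To make the last question more concrete, here is a minimal sketch of the kind of check I have in mind. It is only an assumption on my part, not something taken from the 1000genomes or GATK documentation: it tallies the records of a VCF by comparing the lengths of the REF and ALT alleles, and collects the sites so that two files can be intersected. The function names (open_vcf, classify, summarize) and the category labels are made up for illustration.

```python
import gzip
from collections import Counter


def open_vcf(path):
    """Open a plain-text or gzip-compressed VCF in text mode."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path, "rt")


def classify(ref, alt):
    """Label one REF/ALT pair by allele length (a deliberate simplification)."""
    if alt == ".":
        return "no ALT"
    if alt == "*":
        return "spanning deletion"
    if alt.startswith("<") or "[" in alt or "]" in alt:
        return "symbolic/SV"  # e.g. <CN0>, <DEL>, breakend notation
    if len(ref) == 1 and len(alt) == 1:
        return "SNV"
    if len(ref) == len(alt):
        return "MNV"
    return "indel"


def summarize(path):
    """Return (counts per variant class, set of (chrom, pos, ref, alt) sites)."""
    counts = Counter()
    sites = set()
    with open_vcf(path) as handle:
        for line in handle:
            if line.startswith("#"):
                continue  # skip meta-information and header lines
            fields = line.rstrip("\n").split("\t")
            chrom, pos, ref, alts = fields[0], fields[1], fields[3], fields[4]
            for alt in alts.split(","):  # multi-allelic records list several ALTs
                counts[classify(ref, alt)] += 1
                sites.add((chrom, pos, ref, alt))
    return counts, sites
```

With two files, something like `mine & theirs` on the two site sets would give the shared calls. I assume this only makes sense if both files use the same reference build, the same chromosome naming ("1" vs "chr1") and the same indel representation, so please correct me if this is not a valid approach.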
Any help would be appreciated. Don't hesitate to ask for more information.
Thank you in advance,
Maxime
Hi Maxime,
I strongly suggest you edit your post to make it notably shorter. Most users (including myself) will not read through such a long post to even understand what your problem is. Please try to work out the core part of your question while providing the minimal necessary information to understand it. This is of course a suggestion and you are free to decide if you follow it or not, but I think shortening your post will increase your chance of a good response.
Hi, thank you for your consideration. I actually thought that it might be too long. Isn't there a way to make some sort of spoiler section to add information without overloading the post?
Unfortunately not :(