I have been given a couple of FASTQ files and I am evaluating which tools are best for ultimately producing a VCF (this is a learning exercise).
At this point, I am going to do the alignment, comparing BWA-MEM vs Bowtie2 and GRCh38 vs GRCh37. Once I have the resulting four BAM files, how can I evaluate their quality? Of course, I will run the four pipelines in parallel and compare the variants called, but at this point I would like to be able to compare the BAM files as well.
For example, can I see the number of unmapped reads, or is there any other quality control process that I don't know about?
I am using Galaxy.
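To make the "number of unmapped reads" idea concrete, here is a minimal sketch of the kind of per-file summary one could compare across the four alignments. It assumes the BAM has been converted to SAM text first (for example with `samtools view -h`), and it relies only on the FLAG and MAPQ columns defined in the SAM specification. The function name `summarize_sam` and the MAPQ cutoff of 20 are illustrative choices, not something taken from this thread; `samtools flagstat` and `samtools stats` report similar numbers and are the more usual route when they are available.

```python
from collections import Counter

# FLAG bits as defined in the SAM specification.
FLAG_UNMAPPED = 0x4
FLAG_SECONDARY = 0x100
FLAG_DUPLICATE = 0x400
FLAG_SUPPLEMENTARY = 0x800


def summarize_sam(lines, mapq_threshold=20):
    """Count basic alignment categories from SAM text lines.

    `mapq_threshold` is an arbitrary illustrative cutoff (assumption).
    """
    stats = Counter()
    for line in lines:
        # Skip blank lines and header lines (headers start with '@').
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])   # column 2: bitwise FLAG
        mapq = int(fields[4])   # column 5: mapping quality

        # Secondary/supplementary records are extra alignments of a read
        # that is already counted once, so tally them separately.
        if flag & (FLAG_SECONDARY | FLAG_SUPPLEMENTARY):
            stats["secondary_or_supplementary"] += 1
            continue

        stats["primary"] += 1
        if flag & FLAG_UNMAPPED:
            stats["unmapped"] += 1
            continue

        stats["mapped"] += 1
        if flag & FLAG_DUPLICATE:
            stats["duplicate"] += 1
        if mapq >= mapq_threshold:
            stats["mapq_ok"] += 1
    return stats


if __name__ == "__main__":
    import sys

    # Hypothetical usage: samtools view -h sample.bam | python this_script.py
    result = summarize_sam(sys.stdin)
    for key in sorted(result):
        print(key, result[key], sep="\t")
```

Running this on each of the four files gives comparable counts (unmapped, duplicates, reads at or above the MAPQ cutoff). Keep in mind that the two aligners assign MAPQ on different scales, so the MAPQ-based count is only directly comparable between runs of the same aligner.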
"Best" is subjective. Unless you are planning to run four pipelines in parallel for every sample you process, you won't know for sure. Something that seems "best" for the test samples may not be for some future samples.
As for genome build, there is little reason to use GRCh37 unless you have some legacy data to compare to. GRCh38 was released in Dec 2013 and is mature/stable.
Are you running your own Galaxy mirror? If not, you will not be in control of software versions etc., which matters if this is a long-term project.
Thanks for your comments.
I am using a mirror set up by my university with a limited set of tools, and I can do the runs with these different versions. I think I am going to discard GRCh37 because, as you mention, there is no real reason to use it, and maintaining so many pipelines is a pain. What I wanted to do is at least run these four runs and then compare the results to back up my decision not to use GRCh37.