I have FASTQ files that I want to convert to BAM files.
In the GATK pre-processing workflow, a uBAM (unmapped BAM) file is required because it carries metadata.
So I did the following:
Fastq -> BWA - mapped BAM
Fastq -> Picard - uBAM
uBAM + mapped BAM -> Picard - Merge
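
To make the three steps concrete, here is a rough sketch that only builds the command lines as strings and does not run anything. The file names, the `picard.jar` location, the read-group name, and the helper `preprocessing_commands` are placeholders invented for illustration. The tools and arguments shown (`bwa mem`, Picard `FastqToSam`, Picard `MergeBamAlignment`) are the ones I believe correspond to the steps above, but the exact options should be checked against your installed versions.

```python
import shlex


def preprocessing_commands(fq1, fq2, ref, sample, prefix, picard_jar="picard.jar"):
    """Build (but do not execute) the three commands of the workflow above.

    All paths and the picard_jar location are hypothetical placeholders.
    """
    q = shlex.quote
    mapped = f"{prefix}.mapped.sam"    # bwa mem writes SAM to stdout
    ubam = f"{prefix}.unmapped.bam"
    merged = f"{prefix}.merged.bam"

    # Step 1: FASTQ -> BWA -> mapped reads
    align = f"bwa mem {q(ref)} {q(fq1)} {q(fq2)} > {q(mapped)}"

    # Step 2: FASTQ -> Picard FastqToSam -> uBAM (read-group metadata attached here)
    to_ubam = " ".join([
        "java", "-jar", q(picard_jar), "FastqToSam",
        f"FASTQ={q(fq1)}", f"FASTQ2={q(fq2)}",
        f"OUTPUT={q(ubam)}",
        f"SAMPLE_NAME={q(sample)}",
        "READ_GROUP_NAME=rg1",         # placeholder read-group ID
        "PLATFORM=illumina",
    ])

    # Step 3: uBAM + mapped reads -> Picard MergeBamAlignment -> merged BAM
    merge = " ".join([
        "java", "-jar", q(picard_jar), "MergeBamAlignment",
        f"ALIGNED_BAM={q(mapped)}",
        f"UNMAPPED_BAM={q(ubam)}",
        f"REFERENCE_SEQUENCE={q(ref)}",
        f"OUTPUT={q(merged)}",
    ])
    return {"align": align, "to_ubam": to_ubam, "merge": merge}


for name, cmd in preprocessing_commands("r1.fq", "r2.fq", "ref.fa", "sample1", "out").items():
    print(f"{name}: {cmd}")
```

The merge step is the only one that sees both inputs, which is why the uBAM has to exist before it can run.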
However, I really don't understand why this process is needed, because we can add metadata to a BAM with Picard (AddOrReplaceReadGroups) instead of using a uBAM.
I have already read this article: https://gatkforums.broadinstitute.org/gatk/discussion/11694/why-is-converting-from-fastq-to-ubam-nesessary-before-preprocessing#latest
Thank you for the reply.
Could you give an example of the metadata? I just thought it was things like platform (Illumina), library, sample name...
But all of these are covered by the AddOrReplaceReadGroups options.
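
For reference, this is roughly how those fields map onto Picard AddOrReplaceReadGroups arguments. It is only a sketch that builds the command string; the file names, read-group values, `picard.jar` path, and the helper `read_group_command` are made-up placeholders.

```python
import shlex


def read_group_command(in_bam, out_bam, rgid, rglb, rgpl, rgpu, rgsm,
                       picard_jar="picard.jar"):
    """Build (but do not execute) a Picard AddOrReplaceReadGroups command.

    All values passed in are hypothetical examples.
    """
    q = shlex.quote
    return " ".join([
        "java", "-jar", q(picard_jar), "AddOrReplaceReadGroups",
        f"I={q(in_bam)}",
        f"O={q(out_bam)}",
        f"RGID={q(rgid)}",   # read-group ID
        f"RGLB={q(rglb)}",   # library
        f"RGPL={q(rgpl)}",   # platform, e.g. illumina
        f"RGPU={q(rgpu)}",   # platform unit
        f"RGSM={q(rgsm)}",   # sample name
    ])


print(read_group_command("mapped.bam", "mapped.rg.bam",
                         "rg1", "lib1", "illumina", "unit1", "sample1"))
```

This only sets read-group tags on the reads already present in the input BAM; it does not bring in anything from the original FASTQ.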
Yes, those are examples of metadata, but the issue here is that you are excluding the core of your data (i.e. the nucleotide sequence) because of an underlying aspect of the bwa software. This is completely independent of any metadata.