Hi. I feel like this might be a stupid question, but I can't seem to find an answer anywhere. I'm currently trying to align my paired-end reads to a reference genome. To do this, I believe I need to create a ".sam" file using "bwa".
However, what I don't understand is whether I should create one .sam file that contains all of my paired-end samples. That is, I have around 100 paired-end samples (so 200 files in total). Should I align all of these samples to the reference genome "in one go" so that I have one large .sam file containing all 100 samples? Or should I create 100 .sam files, aligning each sample separately?
My goal later is to call SNPs in these samples.
Thanks!
You can use read groups to include multiple samples in one alignment file (see: https://gatk.broadinstitute.org/hc/en-us/articles/360035890671-Read-groups ). If you are planning to use GATK for SNP calling, then read groups may indeed be a requirement.
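For illustration, here is a rough sketch of how a per-sample read group could be attached at alignment time. It assumes you run bwa mem once per sample and pass the read group header through its -R option. The sample name, file names, and the helper names bwa_mem_command and align_sample are all made up for the example, and using the sample name as both ID and SM assumes one sequencing run per sample.

```python
import subprocess


def bwa_mem_command(sample, ref, fq1, fq2, platform="ILLUMINA"):
    """Build a `bwa mem` argument list for one paired-end sample (hypothetical helper)."""
    # The @RG fields are separated by a literal backslash followed by "t";
    # the raw f-string keeps that backslash instead of inserting a real tab.
    read_group = rf"@RG\tID:{sample}\tSM:{sample}\tPL:{platform}"
    return ["bwa", "mem", "-R", read_group, str(ref), str(fq1), str(fq2)]


def align_sample(sample, ref, fq1, fq2):
    """Run the alignment and write <sample>.sam (assumes bwa is on PATH and ref is indexed)."""
    with open(f"{sample}.sam", "w") as out:
        subprocess.run(bwa_mem_command(sample, ref, fq1, fq2), stdout=out, check=True)


if __name__ == "__main__":
    # Placeholder file names; this only prints the command, it does not run bwa.
    cmd = bwa_mem_command("sample01", "ref.fa", "sample01_R1.fastq.gz", "sample01_R2.fastq.gz")
    print(" ".join(cmd))
```

Looping this over your 100 samples would give one alignment file per sample, each tagged with its own SM value. Because the sample identity is stored in the read group, the files can later be kept separate or merged (for example with samtools merge) without losing track of which reads belong to which sample.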