If I wanted to expand a reference genome fasta file with alternate loci and decoy fasta files in order to get better read alignments, would it be enough with merging the files (with, say, cat *.fa > expanded_genonme.fa
) before starting the typical mapping workflow (samtools, picard, bwa ...), or there are additional steps to take care of?
Thanks! I have replied to that recent comment since it has more participants. Please feel free to comment there as well!
Neither what ATpoint nor I wrote, contradicts this tutorial in any way. Furthermore, I trust, it is a good one when written by the GATK team.
But in your original post, you stated that you are going to start the typical mapping workflow (samtools, picard, bwa ...), afterwards and alt-ware mapping is generally not considered to be the typical mapping workflow. Most aligners are by default not alternate contig aware (alt-aware) and will thus assign poor mapping quality scores to many reads due to erroneously considering them being multi-mappers.
Nothing wrong with alt contigs, as long as you are using the right tools and it's a deliberate choice.
Thanks! So regarding my original question, could I just merge the fasta files and follow that tutorial?
Yes, technically you can append any sequence to the reference followed by building a new index. If that makes sense or is appropriate is a different question.
My criteria for whether it makes sense would be comparing MAPQ values for alignments when using reference genome vs expanded genome. If there are fewer alignments with MAPQ = 0 in the latter, I would think it makes sense. I am way off here?