1) For aligning human whole exome sequencing FASTQ data to the human GRCh38.p14 reference genome with the Isaac aligner v4, do you recommend:
an unmasked genome reference?
a masked genome reference?
a reference with the hs38d1 decoy sequence included?
2) How should the human reference genome sequences be masked for best use with Isaac aligner v4 and human exome FASTQ data?
Is it a soft mask?
Is it a hard mask?
3) How should the human reference genome sequences be masked for best use with Isaac aligner v4 and human whole genome FASTQ data?
Is it a soft mask?
Is it a hard mask?
4) For aligning human whole genome sequencing FASTQ data to the human GRCh38.p14 reference genome with the Isaac aligner v4, do you recommend:
an unmasked genome reference?
a masked genome reference?
a reference with the hs38d1 decoy sequence included?
5) To mark and remove PCR duplicates in the BAM file generated from either human exome or human whole genome FASTQ data, which parameters should be set with Isaac aligner v4?
Option A: --keep-duplicates 1 or --keep-duplicates 0 (which one?)
Option B: --mark-duplicates 1 or --mark-duplicates 0 (which one?)
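Whichever option combination is chosen, a duplicate that is marked rather than removed is identified in the BAM by SAM FLAG bit 0x400. A minimal sketch for checking that on text SAM records (for example the output of `samtools view -h`) follows; the function names and the example records are hypothetical, and it makes no claim about what either Isaac option does.

```python
DUPLICATE_FLAG = 0x400  # SAM FLAG bit: read is a PCR or optical duplicate


def is_duplicate(flag: int) -> bool:
    """Return True if the SAM FLAG value has the duplicate bit set."""
    return bool(flag & DUPLICATE_FLAG)


def count_duplicates(sam_lines):
    """Count (total records, records flagged as duplicates) in SAM text lines."""
    total = 0
    duplicates = 0
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])  # FLAG is the second SAM column
        total += 1
        if is_duplicate(flag):
            duplicates += 1
    return total, duplicates


# Hypothetical, truncated records for illustration only.
example = [
    "@HD\tVN:1.6",
    "read1\t99\tchr1\t100",
    "read2\t1123\tchr1\t100",  # 1123 = 99 + 1024, duplicate bit set
]
print(count_duplicates(example))
```

If duplicates were removed instead of marked, the second number would be zero and the total would be smaller than the number of input reads.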
Just curious: why use Isaac?
The Isaac aligner is very fast compared to other aligners, and its accuracy is about the same as BWA-MEM's.
That is true. However, Isaac's memory usage is over a dozen times higher (which isn't a problem, given that a cheap modern refurbished server will give you >64 GB of RAM).
BWA is more widely used, so you might get more community support if you use that.
There is no such thing as p14 in the context of alignment to a reference genome (the patches play no role in that), but the decoy definitely is a thing. I think the decoy is a good idea, and masking is not. Mark duplicates.
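For readers unsure about the soft versus hard mask distinction asked about in the question: in a soft-masked FASTA the masked bases are written in lowercase, while in a hard-masked FASTA they are replaced with N. A minimal sketch of the difference, with hypothetical function names and a made-up sequence:

```python
def hard_mask(seq: str) -> str:
    """Turn a soft-masked sequence into a hard-masked one (lowercase -> N)."""
    return "".join("N" if base.islower() else base for base in seq)


def unmask(seq: str) -> str:
    """Drop soft masking by uppercasing every base."""
    return seq.upper()


soft_masked = "ACGTacgtacGGCC"  # made-up sequence; lowercase = soft-masked
print(hard_mask(soft_masked))   # ACGTNNNNNNGGCC
print(unmask(soft_masked))      # ACGTACGTACGGCC
```

Hard masking discards the underlying sequence, so reads from those regions can no longer be placed there. Soft masking keeps the bases, and whether an aligner treats lowercase differently is tool-specific.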