While I was experimenting with Genomic Data Commons (GDC) pipelines, I noticed that the reference genome they provide for download is based on a GRCh38 patch from 2013.
Would it be okay to append the decoy sequences (hs38d1) and virus sequences to the latest GRCh38 in an attempt to create an "updated" version of the reference genome used in GDC pipelines?
The decoy and virus sequences I mean are the ones shown on this page.
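For the "append" step itself, a minimal sketch in Python could look like the code below. It assumes you already have three local FASTA files (primary assembly, hs38d1 decoys, virus sequences). All file names are hypothetical placeholders. The one check it performs is that no sequence name appears twice across the inputs, because duplicate names are a common way a hand-merged reference silently breaks downstream tools. It does not check that the files share a naming convention (for example `chr1` versus `NC_000001.11`). You would have to confirm that separately.

```python
"""Sketch: append decoy and virus FASTA records to a primary assembly FASTA.

Assumption: all inputs are plain-text FASTA files; file names are hypothetical.
"""
from pathlib import Path
from typing import Iterable


def read_names(path: Path) -> list[str]:
    """Return sequence names (first token after '>') from a FASTA file."""
    names = []
    with path.open() as fh:
        for line in fh:
            if line.startswith(">"):
                names.append(line[1:].split()[0])
    return names


def merge_fasta(inputs: Iterable[Path], output: Path) -> list[str]:
    """Concatenate FASTA files in order, refusing duplicate sequence names.

    Names are checked before anything is written, so a failed merge
    leaves no partial output file behind.
    """
    inputs = list(inputs)
    seen: set[str] = set()
    order: list[str] = []
    for path in inputs:
        for name in read_names(path):
            if name in seen:
                raise ValueError(f"duplicate sequence name {name!r} in {path}")
            seen.add(name)
            order.append(name)
    with output.open("w") as out:
        for path in inputs:
            with path.open() as fh:
                for line in fh:
                    # Guarantee a newline so the next file's header starts a new line.
                    out.write(line if line.endswith("\n") else line + "\n")
    return order
```

A hypothetical call would be `merge_fasta([Path("GRCh38.primary.fa"), Path("hs38d1.fa"), Path("viruses.fa")], Path("GRCh38.custom.fa"))`. The merged file would still need to be re-indexed (for example with `samtools faidx` and your aligner's own index command) before it can be used.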
Also, NCBI says that hs38d1 from 2014 is the latest version of the decoys. Should I be looking for decoy sequences from somewhere else?
Thank you.
This is a good question. I am trying to figure out whether to use the latest GRCh38.p14 (released 2022) or the hs38d1 (decoy) version for my analysis. To my understanding, the decoy version masks the telomeric sequences, removes the alternative loci, and adds the EBV virus sequence. But I am still trying to understand the benefits of using each version.
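To compare two candidate references concretely, one option is to count the kinds of contigs each FASTA contains. The Python sketch below does this under an explicit naming assumption: UCSC-style names, where alternate loci end in `_alt`, hs38d1 decoys end in `_decoy`, fix patches end in `_fix`, and the Epstein-Barr virus contig is `chrEBV`. Names that do not match any of these rules, such as other virus contigs, fall into `"other"`. A FASTA that uses RefSeq/GenBank accessions instead of `chr` names would land almost entirely in `"other"`. In that case the names would need converting first.

```python
"""Sketch: summarise which kinds of contigs a reference FASTA contains.

Assumption: UCSC-style contig names ("_alt", "_decoy", "_fix", "chrEBV").
Unlocalized/unplaced scaffolds (chrUn_*, *_random) count as "primary".
"""
from collections import Counter
from pathlib import Path


def classify(name: str) -> str:
    """Map a contig name to a coarse category using its suffix/prefix."""
    if name.endswith("_alt"):
        return "alt"
    if name.endswith("_decoy"):
        return "decoy"
    if name.endswith("_fix"):
        return "fix_patch"
    if name == "chrEBV":
        return "EBV"
    if name.startswith("chr"):
        return "primary"
    return "other"


def contig_summary(fasta: Path) -> Counter:
    """Count contigs per category in a FASTA file (headers only)."""
    counts: Counter = Counter()
    with fasta.open() as fh:
        for line in fh:
            if line.startswith(">"):
                counts[classify(line[1:].split()[0])] += 1
    return counts
```

Running `contig_summary` on the GDC reference and on a GRCh38.p14 FASTA gives a quick side-by-side view of which alt, decoy, and virus contigs each one carries. This does not compare the sequences themselves.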