FIX patch: A patch that corrects sequence or reduces an assembly gap in a given major release. FIX patch sequences are meant to be incorporated into the primary or existing alt-loci assembly units at the next major release, and their accessions will then be deprecated.
How can these patches be applied automatically to assembled genomic sequences (i.e. to the assembled chromosomes from GRCh37) to get the latest known sequences?
To me, patches are just for a historical record, but not intended for practical use. If you integrate them into the primary assembly, all the coordinates will be shifted. Then your results cannot be easily compared to others. If you treat them as separate contigs, the massive redundancy with the primary assembly will lead to loss of information around patches, which is also problematic.
If you are worried about misassemblies in the reference genome messing up your analysis, you should really use decoy sequences.
Written 12.3 years ago by lh3; updated 2.1 years ago by Ram.
The decoy sequences will not help if there is a mis-assembly in the Primary. Rather, the decoy sequences are there to help decrease off-target alignments due to sequences missing from the Primary assembly.
It depends on how we define "help" and "misassembly". I know multiple examples where the reference genome collapses two or more copies of a sequence into one (I call this a misassembly), which leads to spurious variants. Decoy sequences help to remove many of these false positives. Also, about 90% of decoy sequences are fixed in the entire human population. Missing these sequences is also a type of misassembly. That said, I really appreciate the GRC's phenomenal work on the human reference genome. I know getting a good genome is really, really hard.
Written 12.3 years ago by lh3; updated 2.1 years ago by Ram.
I was not saying the decoy is not useful. I'm merely commenting on the fact that it doesn't fix the misassembly; it only helps by soaking up reads so that you don't get off-target alignments. The FIX patches are actually meant to 'fix' these problems (although, granted, not all of them are fixed).
Currently, it is challenging to use the patches. Tools are being developed to make better use of the data, but they are not quite ready for prime time. Jeremy and JC are correct: you can integrate the sequences into the assembly using the information the GRC distributes, but you will create chromosome coordinates that don't exist anywhere beyond your computer. However, if this is just an analysis intermediate and you map features back to the native (that is, scaffold) coordinates, then you would be fine.
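As a rough illustration of the "analysis intermediate" idea, here is a minimal Python sketch. It assumes a single patch that replaces one 0-based, half-open interval [start, end) of a chromosome and that the whole patch scaffold is spliced in; the function names and toy sequences are hypothetical, and real placements would have to come from the files the GRC distributes, which this sketch does not parse.

```python
def apply_patch(chrom_seq, start, end, patch_seq):
    """Replace chrom_seq[start:end] (0-based, half-open) with patch_seq."""
    return chrom_seq[:start] + patch_seq + chrom_seq[end:]


def to_native(pos, start, end, patch_len):
    """Map a 0-based position on the patched chromosome to native coordinates.

    Returns ("primary", p) for positions outside the patch and
    ("patch", p) for positions inside it, where p is the offset on the
    patch scaffold.
    """
    if pos < start:
        return ("primary", pos)
    if pos < start + patch_len:
        return ("patch", pos - start)
    return ("primary", pos - patch_len + (end - start))


# Toy data: the patch replaces 4 bases with 6, so downstream positions shift by 2.
chrom = "AAAACCCCGGGG"
patch = "TTTTTT"
patched = apply_patch(chrom, 4, 8, patch)
print(patched)                          # AAAATTTTTTGGGG
print(to_native(10, 4, 8, len(patch)))  # ('primary', 8)
```

Anything reported on the patched sequence can then be translated back before it is shared, so that no one else has to deal with the private coordinate system.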
Note: there are two types of patches. The FIX patches are regions where there is a misassembly, and the Primary assembly (i.e. the chromosome assembly) will change there when GRCh38 is released next year. The NOVEL patches represent places where the underlying chromosome assembly seems to be correct, but an additional allele that adds more sequence has been found.
There is an aligner that can use the assembly structure (that is, the placement file that provides the correspondence of the patches/alts to the assembly) and provide alignments that don't get a lower mapping score: ftp://ftp.ncbi.nlm.nih.gov/pub/agarwala/srprism
This aligner is not published yet, but a manuscript is in preparation.
Patches are not normally intended to be applied to the primary sequence, otherwise the GRC would have provided tools to easily do so. You can certainly include the patch sequences in your genomic index for the purposes of alignment, but you will be working off the grid, so to speak.
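For the "include the patch sequences in your index" route, a minimal sketch of combining FASTA inputs before indexing might look like the following. The function name and the toy records are hypothetical; the only thing it guards against is duplicate sequence names, which aligner indexes generally cannot handle. It does nothing about the redundancy between patches and the primary assembly raised earlier in the thread.

```python
import io


def merge_fasta(sources, out):
    """Copy FASTA lines from each iterable in `sources` to `out`.

    Raises ValueError if a sequence name (first word after '>') repeats.
    Returns the set of names written.
    """
    seen = set()
    for lines in sources:
        for line in lines:
            if line.startswith(">"):
                fields = line[1:].split()
                name = fields[0] if fields else ""
                if name in seen:
                    raise ValueError(f"duplicate sequence name: {name}")
                seen.add(name)
            out.write(line if line.endswith("\n") else line + "\n")
    return seen


# Toy records standing in for a primary assembly and a patch FASTA.
primary = io.StringIO(">chr9\nACGT\n")
patches = io.StringIO(">patch1\nACGGT\n")
combined = io.StringIO()
names = merge_fasta([primary, patches], combined)
```

With real data the same function works on open file handles, and the combined file is what you would pass to your aligner's indexing step.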
We are struggling to work out whether we can use GRCh38.p4 for mapping and variant calling, since the ABO gene (and a few others) was misassembled and then corrected in GRCh38.p1.
When I looked into this problem, I found that:
The ABO gene sequences from the assembled chromosomes of GRCh38 and GRCh38.p4 are identical, so the patch has not been applied.
The patch is 149 base pairs longer than the chromosome region that receives it, so I cannot use the patch to replace the misassembled ABO gene.
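Both observations can be checked with a couple of lines of Python. This is only a sketch with hypothetical function names; sequences are assumed to be plain strings already loaded from FASTA, and coordinates are 0-based, half-open. Note that a length difference means a replacement would move every downstream coordinate by that amount, which is the coordinate problem described earlier in the thread.

```python
def region_identical(seq_a, seq_b, start, end):
    """True if both sequences carry the same bases over [start, end)."""
    return seq_a[start:end] == seq_b[start:end]


def downstream_shift(patch_len, start, end):
    """Bases by which downstream coordinates would move if a patch
    of length patch_len replaced the interval [start, end)."""
    return patch_len - (end - start)


# Hypothetical interval of 1000 bp and a patch 149 bp longer than it.
print(downstream_shift(1149, 5000, 6000))  # 149
```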