Remap GRCh38.p13 to GRCh38
3.6 years ago
schmau ▴ 10

Hi,

I'm interested in a few loci located in the hard-masked regions of hg38 (such as 13p, 14p, 15p). I mapped my data to GRCh38.p13 and I get hits in the included haplotype scaffolds (such as KI270715.1, KN196477.1).

Does anyone know a way to map those hits to hg38 for creating heatmaps, ideograms, annotations, and so on? NCBI Remap just copies them into a new file with exactly the same chromosome and region.

haplotype bed genome freeze map • 2.6k views

You have features mapped to KI270715.1 and would like to map them to a position on the chromosome? That won't be possible, because KI270715.1 is an unlocalized scaffold that is present in GRCh38 as well as in GRCh38.p13 (and in every patch release in between).


That was quite a bad example, but some of the contigs are "just" alts with varying copy numbers of rDNA, alternative haplotypes, or repeats. They are therefore mappable to unique chromosomal regions, but as long as the hits sit in those contigs, no tool will use them in its workflow (because the tools use hg38 as reference).

So I'm looking for a tool that converts those alternative loci and repeats, such as those from chr13p, into chromosomal regions of hg38 (because chr13p is hard masked in hg38).


So you have features mapped to alt loci and you are trying to remap them to main chromosomes? Alt loci are present in all GRCh38 assemblies from GRCh38 to GRCh38.p13 so if you are remapping from one point version of GRCh38 to another, the features will not be "remapped".

However, you can use NCBI Remap service to remap features on alt loci; just use the "Alt loci remap" tab. There's an API and command line tool as well if you need to do this as part of a larger script or automate it.

2.6 years ago
BGSpiegl ▴ 10

I might be late to the party (1 year), but I may still be able to contribute something useful here. Hopefully others who end up here during their search will find a satisfying answer to their question.

I wonder if OP "scheurenm" was interested in where the scaffolds (alternate loci, or alt-loci) have highly homologous regions on the standard genome, rather than in actually "mapping" them (which is futile, of course, as long as an alt-scaffold with identical sequence remains in the reference genome used for mapping). The description of where alt-loci are placed in one or more homologous regions on the standard genome (their "chromosomal context") is called "alt-scaffold placement". Note that at least one of the scaffolds the OP described, KN196477.1, is actually a patch to the primary assembly rather than an alt-locus.

Beware: there is a difference between unlocalized-, unplaced- (="random"), and alt-contigs! (read more about these here after scrolling down to the "Sequence Types" subsection)

You can get homologous regions (i.e. the chromosome context of alt-loci and patches) from the NCBI.NLM.NIH FTP site (e.g. for newest patch GRCh38.p14): GRCh38.p14 all_alt_scaffold_placement.txt

Scaffold KI270715.1 is not included but KN196477.1 is:

#alt_asm_name	prim_asm_name	alt_scaf_name	alt_scaf_acc	parent_type	parent_name	parent_acc	region_name	ori	alt_scaf_start	alt_scaf_stop	parent_start	parent_stop	alt_start_tail	alt_stop_tail
PATCHES	Primary Assembly	HSCHR5_7_CTG1	KN196477.1	CHROMOSOME	5	CM000667.2	REGION193	+	1	139087	21098347	21230960	0	0

=> KN196477.1 (139,087 bp long), which is actually a patch to the primary assembly, has the chromosome context (= parent region or locus) chr5:21098347-21230960, a span of 132,614 bp on the primary assembly.
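The lookup above can be scripted. The following is a minimal sketch, assuming the placement file is tab-delimited with the header line shown above; the function names `load_placements` and `chromosome_context` are made up for illustration, and the only data used is the single row quoted in this post. For a real run you would open the downloaded all_alt_scaffold_placement.txt instead of the in-memory string.

```python
import csv
import io

# Header and the single data row quoted in the post, joined with tabs
# (assumption: the NCBI placement file is tab-delimited).
HEADER = ["#alt_asm_name", "prim_asm_name", "alt_scaf_name", "alt_scaf_acc",
          "parent_type", "parent_name", "parent_acc", "region_name", "ori",
          "alt_scaf_start", "alt_scaf_stop", "parent_start", "parent_stop",
          "alt_start_tail", "alt_stop_tail"]
ROW = ["PATCHES", "Primary Assembly", "HSCHR5_7_CTG1", "KN196477.1",
       "CHROMOSOME", "5", "CM000667.2", "REGION193", "+",
       "1", "139087", "21098347", "21230960", "0", "0"]
PLACEMENT_TEXT = "\t".join(HEADER) + "\n" + "\t".join(ROW) + "\n"


def load_placements(handle):
    """Read a placement table into a list of dicts keyed by column name."""
    # The header line starts with '#', which is stripped from the first column name.
    header = handle.readline().rstrip("\n").lstrip("#").split("\t")
    return list(csv.DictReader(handle, fieldnames=header, delimiter="\t"))


def chromosome_context(rows, accession):
    """Return every placement of one scaffold accession (empty list if none)."""
    return [
        (row["parent_name"], int(row["parent_start"]), int(row["parent_stop"]),
         row["ori"], row["alt_asm_name"])
        for row in rows
        if row["alt_scaf_acc"] == accession
    ]


rows = load_placements(io.StringIO(PLACEMENT_TEXT))
for chrom, start, stop, ori, unit in chromosome_context(rows, "KN196477.1"):
    # Coordinates are treated as 1-based and inclusive, hence the +1.
    print(f"chr{chrom}:{start}-{stop} strand {ori}, {unit}, {stop - start + 1} bp")
```

An accession without a row (KI270715.1 here) simply returns an empty list, and an accession with several rows returns one tuple per placement.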

Btw: looking up your scaffold in this file also tells you whether the sequence of interest has more than one highly homologous region on the standard genome. For some scaffolds, the region_name column also shows whether a gene is affected by a patch or whether there are alternate haplotypes of that gene (e.g. the ABR gene). Region names may carry other meanings as well.

Now, to solve the stated problem OP could use one of the following approaches:

  • Approach 1: go back to the primary assembly and look up the alt-scaffolds of interest in the GRCh38 alt-scaffold placement file. Beware that KN196477.1 is not present in the primary release, since it is a patch sequence, not an alt-locus! (FYI: the type of scaffold can be deduced from the first column, "alt_asm_name", of the all_alt_scaffold_placement.txt file.)
  • Approach 2 (not recommended): remap your scaffolds' chromosome context from the GRCh38.p14 all_alt_scaffold_placement.txt file back to the primary assembly. You might lose some regions, some regions might be split, and you could also end up with multiple target regions.
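For plots such as ideograms or heatmaps, a coarse projection may already be enough: each feature on a scaffold is assigned to the whole parent region of that scaffold. Here is a rough sketch under that assumption. It is not a base-accurate liftover, the feature coordinates and names are invented for illustration, `project_to_parent` is a made-up helper, and the only placement used is the KN196477.1 row quoted above.

```python
from collections import defaultdict

# (alt_scaf_acc, parent_name, parent_start, parent_stop, alt_asm_name);
# the single entry is the KN196477.1 row quoted in the post.
PLACEMENTS = [("KN196477.1", "5", 21098347, 21230960, "PATCHES")]

# Hypothetical features on scaffolds: (accession, start, end, name).
# Coordinates and names are invented for illustration only.
FEATURES = [
    ("KN196477.1", 1000, 2000, "peak_a"),
    ("KI270715.1", 500, 900, "peak_b"),
]


def project_to_parent(features, placements):
    """Assign each scaffold feature to the full parent region(s) of its scaffold.

    Returns (projected, unplaced): BED-like tuples on the parent chromosome,
    and the names of features whose scaffold has no placement row.
    """
    by_acc = defaultdict(list)
    for acc, chrom, start, stop, unit in placements:
        by_acc[acc].append((chrom, start, stop, unit))

    projected, unplaced = [], []
    for acc, _f_start, _f_end, name in features:
        if acc not in by_acc:
            unplaced.append(name)
            continue
        # A scaffold with several placements yields one output line per placement.
        for chrom, start, stop, unit in by_acc[acc]:
            # Placement coordinates are treated as 1-based inclusive;
            # BED is 0-based half-open, so only the start is shifted by one.
            projected.append((f"chr{chrom}", start - 1, stop, f"{name}|{acc}|{unit}"))
    return projected, unplaced


projected, unplaced = project_to_parent(FEATURES, PLACEMENTS)
for line in projected:
    print("\t".join(str(field) for field in line))
print("no placement found for:", ", ".join(unplaced))
```

The scaffold type from the alt_asm_name column is carried along in the name field, so patch-derived regions can still be told apart from alt-loci downstream. Features on unlocalized scaffolds such as KI270715.1 end up in the unplaced list, since the placement file has no row for them.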

Hope this helps some lost souls!

3.6 years ago

A freeze is a freeze. No one maps to anything but the primary assembly, so "mapping to GRCh38.p13" doesn't mean anything other than you may have used a different Ensembl annotation downstream. If you build some kind of an alternative reference using patches then you're essentially on your own island, though this guide may help.
