Question

NCBI Remap Issue

0

Entering edit mode

6.1 years ago

f.abbondanza94 • 0

Hi everyone!

I'm new to the field and I've just started to work on GWAS data. I have some old data (both ped and VCF) that I need to move from hg18 to hg19 to then do imputation on the Michigan server. I've done the remapping using remap.api.pl and the vcf file I have for each chromosome.

perl remap_api.pl --mode asm-asm --from GCF_000001405.12 --dest GCF_000001405.13 --in_format vcf --annotation ../Cleaned_2.recode.vcf --annot_out remapped_2

However, for some SNPs in I have something like this:

> HSCHRUN_RANDOM_CTG42    27080   rs11090516      C       T   .      .    REMAP_ALIGN=SP       GT
> HSCHRUN_RANDOM_CTG42    31729   rs1062731       C       T   .      .    PR;REMAP_ALIGN=SP    GT
 >HSCHRUN_RANDOM_CTG42    33699   rs730647        C       T   .      .    REMAP_ALIGN=SP       GT

(this is for chr21)

I've looked up online and it seems those are unplaced scaffold (ftp://ftp.kobic.kr/Data/refseq/refseq/H_sapiens/annotation/GRCh37_latest/refseq_identifiers/GRCh37_latest_assembly_report.txt)

How should I treat them? Do I have to eliminate them, if so, how?

Best, Filippo

remapping vcftools GWAS genome SNP • 2.1k views

ADD COMMENT • link 6.1 years ago by f.abbondanza94 • 0

0

Entering edit mode

Does your input vcf file have chromosome names (as in chr1, chr2, etc) or accessions (NC_000001.11, NC_000002.12, etc)? If it is the former, you may want to try converting your vcf first to use the accessions before remap.

ADD REPLY • link 6.1 years ago by vkkodali_ncbi ★ 3.8k

0

Entering edit mode

Hi! Thanks for the answer! It is with chromosome name and the vcf was cretead from PLINK/1.9. Is there an easy way to change chromosome to accessions?

Just to understand, are the 'HSCHRUN' artifacts or errors of the program or are correct remapping which just need to be correctly mapped to the chromosome?

ADD REPLY • link 6.1 years ago by f.abbondanza94 • 0

0

Entering edit mode

I have a python script that can do this. Clone this to your local disk (expects python3) https://github.com/vkkodali/cthreepo and run cthreepo.py as follows:

cthreepo -i <input.vcf> -m GCF_000001405.12_NCBI36_assembly_report.txt -if ucsc -it rs -f vcf -o <output.vcf>

You will need the GCF_000001405.12_NCBI36_assembly_report.txt file which can be downloaded from here: ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/001/405/GCF_000001405.12_NCBI36/GCF_000001405.12_NCBI36_assembly_report.txt

Here's some help text for the script:

$ cthreepo.py -h
usage: cthreepo.py [-h] [-i INFILE] [-o OUTFILE] [-m MAPFILE] [-if ID_FROM]
                   [-it ID_TO] [-ku] [-p] [-f FORMAT]

This script parses input file and converts the seq-id name from one kind to
the other

optional arguments:
  -h, --help            show this help message and exit
  -i INFILE, --infile INFILE
                        input file
  -o OUTFILE, --outfile OUTFILE
                        output file
  -m MAPFILE, --mapfile MAPFILE
                        NCBI style assembly_report file for mapping
  -if ID_FROM, --id_from ID_FROM
                        seq-id format in the input file; can be `ens`, `uc`,
                        `gb`, or `rs`; default is `uc`
  -it ID_TO, --id_to ID_TO
                        seq-id format in the output file; can be `ens`, `uc`,
                        `gb`, or `rs`; default is `rs`
  -ku, --keep_unmapped  keep lines that don't have seq-id matches
  -p, --primary         restrict to primary assembly only
  -f FORMAT, --format FORMAT
                        input file format; can be `gff3`, `gtf`, `bedgraph`
                        `bed`, `sam`, `vcf` or `wig`; default is `gff3`