Hi everyone!
I'm new to the field and have just started working on GWAS data. I have some old data (both PED and VCF) that I need to move from hg18 to hg19 so that I can then run imputation on the Michigan server. I did the remapping with remap_api.pl, using the VCF file I have for each chromosome:
perl remap_api.pl --mode asm-asm --from GCF_000001405.12 --dest GCF_000001405.13 --in_format vcf --annotation ../Cleaned_2.recode.vcf --annot_out remapped_2
However, for some SNPs I get something like this:
> HSCHRUN_RANDOM_CTG42 27080 rs11090516 C T . . REMAP_ALIGN=SP GT
> HSCHRUN_RANDOM_CTG42 31729 rs1062731 C T . . PR;REMAP_ALIGN=SP GT
> HSCHRUN_RANDOM_CTG42 33699 rs730647 C T . . REMAP_ALIGN=SP GT
(this is for chr21)
I've looked online and it seems those are unplaced scaffolds (ftp://ftp.kobic.kr/Data/refseq/refseq/H_sapiens/annotation/GRCh37_latest/refseq_identifiers/GRCh37_latest_assembly_report.txt)
How should I treat them? Do I have to eliminate them and, if so, how?
Best, Filippo
Does your input VCF file have chromosome names (as in chr1, chr2, etc) or accessions (NC_000001.11, NC_000002.12, etc)? If it is the former, you may want to try converting your VCF to use the accessions before running remap.
Hi! Thanks for the answer! It uses chromosome names, and the VCF was created with PLINK 1.9. Is there an easy way to change chromosome names to accessions?
Just to understand: are the 'HSCHRUN' records artifacts or errors of the program, or are they correct remappings that just need to be assigned to the right chromosome?
I have a Python script that can do this. Clone https://github.com/vkkodali/cthreepo to your local disk (it expects python3) and run cthreepo.py.
You will need the GCF_000001405.12_NCBI36_assembly_report.txt file, which can be downloaded from here: ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/001/405/GCF_000001405.12_NCBI36/GCF_000001405.12_NCBI36_assembly_report.txt
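Independently of cthreepo (this is not its code or its command line), the renaming step itself can be sketched in a few lines of Python, in case it helps to see what the conversion does. The sketch assumes the assembly report is tab-separated with the sequence name in the first column, the sequence role in the second and the RefSeq accession in the seventh, and that the VCF uses names such as 21 or chr21. The function names are made up for illustration.

```python
def load_name_to_accession(report_path):
    """Build a {chromosome name: RefSeq accession} map from an assembly report.

    Assumed column layout: 0 = Sequence-Name, 1 = Sequence-Role, 6 = RefSeq-Accn.
    """
    mapping = {}
    with open(report_path) as handle:
        for line in handle:
            if line.startswith("#"):
                continue  # comment and header lines
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 7:
                continue
            name, role, refseq = fields[0], fields[1], fields[6]
            # Only whole chromosomes; scaffolds are left alone.
            if role != "assembled-molecule" or refseq == "na":
                continue
            mapping[name] = refseq
            mapping["chr" + name] = refseq  # accept both naming styles
    return mapping


def rename_vcf_chromosomes(vcf_in, vcf_out, mapping):
    """Rewrite the CHROM column of every record; return how many were renamed."""
    renamed = 0
    with open(vcf_in) as src, open(vcf_out, "w") as dst:
        for line in src:
            if line.startswith("#"):
                dst.write(line)  # header lines are copied unchanged
                continue
            chrom, sep, rest = line.partition("\t")
            if chrom in mapping:
                chrom = mapping[chrom]
                renamed += 1
            dst.write(chrom + sep + rest)
    return renamed
```

Note that this leaves any ##contig header lines untouched, and records whose name is not in the map are written out unchanged, so it is worth checking the returned count against the number of records in the input.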
Hi @vkkodali! Thanks a lot! I will give it a try today and tell you the outcome :)
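On the original "if so, how?" part: the thread does not settle whether the records on unplaced scaffolds should be dropped, but if you decide to set them aside before imputation, a minimal sketch could look like the following. It assumes a tab-separated VCF and that you supply the set of contig names you want to keep. The function name and the file paths in the usage note are hypothetical.

```python
def split_vcf_by_contig(vcf_in, kept_out, dropped_out, keep):
    """Write records whose CHROM is in `keep` to kept_out, all others to dropped_out.

    Header lines go to kept_out only. Returns (kept_count, dropped_count).
    """
    kept = dropped = 0
    with open(vcf_in) as src, \
         open(kept_out, "w") as ok, \
         open(dropped_out, "w") as bad:
        for line in src:
            if line.startswith("#"):
                ok.write(line)
                continue
            chrom = line.split("\t", 1)[0]  # first column is CHROM
            if chrom in keep:
                ok.write(line)
                kept += 1
            else:
                bad.write(line)  # e.g. records on HSCHRUN_RANDOM_CTG42
                dropped += 1
    return kept, dropped
```

For the chr21 file this might be called as split_vcf_by_contig("remapped_21.vcf", "kept_21.vcf", "dropped_21.vcf", {"21", "chr21"}), adjusting the keep set to whatever names or accessions the remapped output actually uses. Keeping the dropped records in their own file makes it easy to see how many SNPs were lost and which ones.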