Converting dbSNP VCF to work with RefSeq chromosome IDs
20 months ago
avelarbio46 ▴ 30

Hello everyone! I've been trying to use GATK with an updated version of the human genome, as the GATK resource files are ten years out of date.

I've downloaded NCBI reference GCF_000001405.40.fna, which is GRCh38.p14

For dbSNP, I've downloaded GCF_000001405.40.gz, which is also GRCh38.p14

When extracting the contig names from my reference file, I found:

NC_000001.11 Homo sapiens chromosome 1, GRCh38.p14 Primary Assembly
0 252068378 NT_187361.1 Homo sapiens chromosome 1 unlocalized genomic scaffold, GRCh38.p14 Primary As etc...

Extracting the contig names:

reference contigs = [NC_000001.11, NT_187361.1, NT_187362.1, NT_187363.1, NT_187364.1, NT_187365.1, NT_187366.1, NT_187367.1, NT_187368.1, NT_187369.1, NC_000002.12, NT_187370.1, NT_187371.1, NC_000003.12, NT_167215.1, NC_000004.12, NT_113793.3...

For dbSNP file, I found:

features contigs = [chr1, chr2, chr3, chr4, chr5, chr6, chr7, chr8, chr9, chr10, chr11, chr12, chr13, chr14, chr15, chr16, chr17, chr18, chr19, chr20, chr21, chr22, chrX, chrY, chrM, chr1_KI270706v1_random, chr1_KI270707v1_random...

This mismatch between the contig names causes a bunch of errors with GATK and other annotation tools.
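To see exactly which names disagree, the two contig lists can be pulled out and compared. Below is a minimal Python sketch, not a definitive tool: the function names are made up for illustration, and it assumes a standard FASTA (sequence ID is the first word after ">") and a standard tab-separated VCF (contig in the first column). Scanning every record of a file as large as dbSNP will be slow, so this is only meant as a sanity check.

```python
import gzip
from typing import Iterable, List


def open_text(path: str):
    """Open a plain or gzip/bgzip-compressed text file for reading."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path, "rt")


def fasta_contigs(lines: Iterable[str]) -> List[str]:
    """Return the sequence IDs (first word of each '>' header), in file order."""
    names = []
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                names.append(fields[0])
    return names


def vcf_contigs(lines: Iterable[str]) -> List[str]:
    """Return the CHROM values of the records, in order of first appearance."""
    seen = set()
    names = []
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header lines and blank lines
        chrom = line.split("\t", 1)[0]
        if chrom not in seen:
            seen.add(chrom)
            names.append(chrom)
    return names


def missing_from_reference(ref_names: Iterable[str], vcf_names: Iterable[str]) -> List[str]:
    """Return the VCF contigs that do not exist in the reference."""
    ref = set(ref_names)
    return [name for name in vcf_names if name not in ref]
```

With real files this would be called as, for example, `fasta_contigs(open_text("GCF_000001405.40.fna"))`; the VCF path is whatever the dbSNP file is called locally.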

I'm lost as to which option would be best: converting the contig names in all BAMs and the reference file, or converting the contig names in the dbSNP VCF. I have no idea how to do either!

NCBI dbSNP RefSeq • 1.7k views

See bcftools annotate --rename-chrs, which renames contigs according to a two-column mapping file (old name, new name).
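The mapping file for --rename-chrs has to come from somewhere. One possible source is the NCBI assembly report for the same assembly (I believe it is named GCF_000001405.40_GRCh38.p14_assembly_report.txt, but check the download directory). The sketch below assumes the usual tab-separated layout of that report, with "RefSeq-Accn" as the 7th column and "UCSC-style-name" as the 10th; the function names and output file names are hypothetical.

```python
from typing import Iterable, List, Tuple

# Assumed 0-based column positions in the NCBI assembly report.
REFSEQ_COL = 6  # "RefSeq-Accn"
UCSC_COL = 9    # "UCSC-style-name"


def ucsc_to_refseq(report_lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (UCSC name, RefSeq accession) pairs from assembly report lines."""
    pairs = []
    for line in report_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comment/header lines and blank lines
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) <= UCSC_COL:
            continue  # not a full data row
        ucsc, refseq = fields[UCSC_COL], fields[REFSEQ_COL]
        # Rows without a name in one of the two schemes are marked "na".
        if ucsc != "na" and refseq != "na":
            pairs.append((ucsc, refseq))
    return pairs


def write_rename_map(pairs: Iterable[Tuple[str, str]], path: str) -> None:
    """Write 'old_name<TAB>new_name' lines, the format --rename-chrs expects."""
    with open(path, "w") as out:
        for old, new in pairs:
            out.write(f"{old}\t{new}\n")


# Afterwards, on the command line (file names are placeholders):
#   bcftools annotate --rename-chrs chr_to_refseq.txt -O z -o dbsnp.refseq.vcf.gz dbsnp.vcf.gz
#   bcftools index -t dbsnp.refseq.vcf.gz
```

Renaming only the VCF leaves the BAMs and the reference untouched, which is usually less work than rewriting every alignment. To go the other way (RefSeq to chr names), swap the two values in each pair.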


When I downloaded the file from https://ftp.ncbi.nlm.nih.gov/snp/latest_release/VCF/, I got the contigs present in the NCBI reference. However, this created another problem for me: the dbSNP rsIDs don't seem to be mapped to the main chromosomes. For example, rs28371738 was coming up on NW_015148968.1 instead of the contigs chr22/NC_0000022....
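One way to check this is to list every contig on which a given rsID appears in the VCF, and then keep only the chromosome-level records. This is a rough Python sketch with made-up function names. It assumes a standard VCF with the rsID in the third column, and it relies on RefSeq chromosome accessions starting with "NC_" (scaffolds and patches use NT_ and NW_). It only reports what is in the file; whether rs28371738 also has a record on the chromosome 22 accession is something the output would show, not something assumed here.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Location = Tuple[str, int]  # (contig, position)


def contigs_by_rsid(vcf_lines: Iterable[str], wanted: Set[str]) -> Dict[str, List[Location]]:
    """Collect every (contig, position) at which each wanted rsID is listed."""
    hits = defaultdict(list)
    for line in vcf_lines:
        if line.startswith("#"):
            continue  # skip header lines
        fields = line.rstrip("\r\n").split("\t", 3)
        if len(fields) < 3:
            continue  # not a complete record
        chrom, pos, ids = fields[0], fields[1], fields[2]
        for rsid in ids.split(";"):  # the ID column may hold several IDs
            if rsid in wanted:
                hits[rsid].append((chrom, int(pos)))
    return dict(hits)


def primary_only(hits: Dict[str, List[Location]]) -> Dict[str, List[Location]]:
    """Keep only locations on RefSeq chromosome accessions (NC_ prefix)."""
    return {
        rsid: [(chrom, pos) for chrom, pos in locations if chrom.startswith("NC_")]
        for rsid, locations in hits.items()
    }
```

If an rsID shows up on both an NC_ contig and an NW_ contig, the annotation tool may simply be reporting the wrong one of the two records, and restricting the VCF to NC_ contigs before annotating could be enough.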
