So I have a VCF file with contigs named 1, 2, 3, 4, etc. I would like to convert them to chr1, chr2, chr3, chr4, etc. for some downstream analysis. Is there a recommended way of doing this? Thanks!
Please take a look at this post: Changing Chromosome Notation On Vcf
A simple regex should suffice for the variant lines, but you may wish to treat the header with more caution: you'll need to change the location of the reference and the details on the contig lines.
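For the contig lines in the header, something along these lines might work as a starting point. It is only a sketch: `rename_header_contigs` is a made-up helper name, the file names are placeholders, and it assumes the header uses the usual `##contig=<ID=...>` layout and that none of the IDs already start with "chr" (otherwise they would be prefixed twice). It does not touch the `##reference` line, which still has to be checked by hand.

```shell
# Hypothetical helper: prefix the ID of every ##contig header line with "chr".
# Reads VCF text on stdin and writes to stdout; all other lines pass through unchanged.
# Assumption: contig lines look like ##contig=<ID=1,length=...> or ##contig=<ID=1>.
rename_header_contigs() {
  sed -E 's/^(##contig=<ID=)([^,>]+)/\1chr\2/'
}

# Example usage (placeholder file names):
#   rename_header_contigs < input.vcf > header_fixed.vcf
```

Note that a mitochondrial contig named `MT` would come out as `chrMT`, whereas many chr-style references call it `chrM`, so it is worth comparing the result against the target reference.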
EDIT: Or, use this to overcome the problem where ref with contigs 1,2,3... might differ from the ref with contigs chr1,chr2,chr3... This might prove safer in the longer run.
That change should only affect the contig values in the header, turning "n" into "chrn", no? We have about 30 contigs in the header, so it can easily be done manually (it's just the dbSNP file we need to edit).
The sed command can be tailored to change all non-header lines by matching only lines that do not begin with a `#`. And yes, manual editing of the contig and ref lines is a good idea.
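As a rough illustration of restricting sed to the non-header lines, here is a minimal sketch. `add_chr_prefix` is a hypothetical helper name, the file names are placeholders, and it assumes an uncompressed VCF whose variant lines start with the bare contig name (no existing "chr" prefix).

```shell
# Hypothetical helper: add "chr" to the start of every line that does not begin with "#".
# Header lines (## metadata and the #CHROM line) and blank lines are left untouched.
# Assumption: the CHROM column is the first field and has no "chr" prefix yet.
add_chr_prefix() {
  sed 's/^[^#]/chr&/'
}

# Example usage (placeholder file names):
#   add_chr_prefix < input.vcf > output.chr.vcf
```

The pattern `^[^#]` matches only lines whose first character is something other than `#`, and `&` puts that character back after the new prefix. The header contig and reference lines still need to be updated separately.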