SNP annotation using gff3
3
0
Entering edit mode
8.0 years ago
reza ▴ 300

hi everyone

i am trying to annotate of identified SNPs and INDELs (vcf format) using software such as snpEff. i downloaded gff3 file correspond to my reference genome. i have to build database in snpEff because my desired database is not present in snpEff.

problem: scaffold names in gff3 file and VCF file are different and i must to convert scaffold names in gff3 file to scaffolds name in VCF file and then try to building new database in snpEff using name-converted gff3 file. anybody here know how i can do it?

example of differences in scoffold names

name in gff3 * name in VCF

NW_011590949.1 * KN271049.1

NW_011590950.1 * KN271050.1

...

are there alternative ways to annotate new SNPs? it is possible to merge gff and VCF to annotate VCF file?

thanks in advance

SNP sequencing next-gen • 4.9k views
ADD COMMENT
0
Entering edit mode

It's weird that you have different scaffolds names taking in to account that it is the same genome. Are you sure that the genome you have used for variant calling is the same genome that it is annotated in the gff3 file? same genome and same version? If the answer is yes, if you have list with gff3 - vcf scaffold name correlation, let's say NW_011590949.1 gff3 scaffold correspond to KN271049.1 vcf scaffold, it shouldn't be so difficult to change the names using a little script or awk command.

ADD REPLY
0
Entering edit mode
8.0 years ago
reza ▴ 300

it is same genome. anybody can help me to do this with awk command?

ADD COMMENT
0
Entering edit mode

Please use ADD REPLY to answer to earlier comments, as such this thread remains logically structured and easy to follow.

ADD REPLY
0
Entering edit mode
8.0 years ago

Hi

This is an example of what I used. It worked for me.

awk '{gsub(/Ld01_v01s1/, "1"); gsub(/Ld02_v01s1/, "2"); gsub(/Ld03_v01s1/, "3"); gsub(/Ld04_v01s1/, "4"); gsub(/Ld05_v01s1/, "5"); gsub(/Ld06_v01s1/, "6"); gsub(/Ld07_v01s1/, "7"); gsub(/Ld08_v01s1/, "8"); gsub(/Ld09_v01s1/, "9"); gsub(/Ld10_v01s1/, "10"); gsub(/Ld11_v01s1/, "11"); gsub(/Ld12_v01s1/, "12"); gsub(/Ld13_v01s1/, "13"); gsub(/Ld14_v01s1/, "14"); gsub(/Ld15_v01s1/, "15"); gsub(/Ld16_v01s1/, "16"); gsub(/Ld17_v01s1/, "17"); gsub(/Ld18_v01s1/, "18"); gsub(/Ld19_v01s1/, "19"); gsub(/Ld20_v01s1/, "20"); gsub(/Ld21_v01s1/, "21"); gsub(/Ld22_v01s1/, "22"); gsub(/Ld23_v01s1/, "23"); gsub(/Ld24_v01s1/, "24"); gsub(/Ld25_v01s1/, "25"); gsub(/Ld26_v01s1/, "26"); gsub(/Ld27_v01s1/, "27"); gsub(/Ld28_v01s1/, "28"); gsub(/Ld29_v01s1/, "29"); gsub(/Ld30_v01s1/, "30"); gsub(/Ld31_v01s1/, "31"); gsub(/Ld32_v01s1/, "32"); gsub(/Ld33_v01s1/, "33"); gsub(/Ld34_v01s1/, "34"); gsub(/Ld35_v01s1/, "35"); gsub(/Ld36_v01s1/, "36"); print;}' ~/Documents/LeishGenomeInfo/Ldonovani_BPK282A1.noseq.gff3 > ~/Documents/LeishGenomeInfo/Ldonovani_BPK282A1.noseq-new.gff3

Rangi

ADD COMMENT
0
Entering edit mode

Instead of using this method, it may be more efficient to 1. sort your files according to the chromosome names and then 2. use the unix join command, which performs an SQL style join on two files and 3. cut the columns from your resulting joined file using the unix cut command to get the final file that you want to have.

ADD REPLY
0
Entering edit mode

thank you for your suggestion but my vcf file have ~32000 scaffold id. no shortcut to do this?

ADD REPLY
0
Entering edit mode
8.0 years ago

Instead of modifying the GFF, it's probably easier to add the new scaffold names to your VCF using VCFtools or BCFtools 'annotate' and then edit the VCF to replace the scaffold names.

EDIT: Even easier, assuming you have a file with OLD_NAME and NEW_NAME fields, is the Linux 'join' command:

join -o 1.2,2.2,2.3,2.4,2.5,2.6,2.7,2.8 OLD_NEW_NAMES.FILE OLD.VCF > NEW.VCF

Note that, for any old scaffold name without a corresponding new name, those variants will be lost. Also, you will need to copy the OLD.VCF header to the new file.

ADD COMMENT
0
Entering edit mode

should OLD_NEW_NAMES.FILE have two column (oldname newname)? is it possible for you to clarify it for me?

thanks a lot in advance

ADD REPLY

Login before adding your answer.

Traffic: 3528 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6