I have 48 rice whole-genome sequences and I would like to analyze the variants in heat-tolerance-related genes and the functional effects of non-synonymous SNPs. I have already aligned the reads against MSUv6.1 (all.con) from http://rice.plantbiology.msu.edu/ and I have a filtered VCF file. The problem occurs when I try to annotate the VCF with snpEff. I tried the rice7 gene model database for Oryza sativa (zip file), but it does not match my VCF file.
I used the following command:
java -jar snpEff.jar -v rice7 /usr/bin/filtered_snps_final.vcf > /usr/bin/filtered_snps_final.ann.vcf
and got the following result:
WARNING: Chromosome 'chr12|13112' not found. File '/home/songbk/Mainul_bin/Bioinfo/022MA2_filtered_snpsfinal.vcf', line 3645356
WARNING: Chromosome 'chr12|13112' not found. File '/home/songbk/Mainul_bin/Bioinfo/022MA2_filtered_snpsfinal.vcf', line 3645357
WARNING: Chromosome 'chr12|13112' not found. File '/home/songbk/Mainul_bin/Bioinfo/022MA2_filtered_snpsfinal.vcf', line 3645358
WARNING: Chromosome 'chr12|13112' not found. File '/home/songbk/Mainul_bin/Bioinfo/022MA2_filtered_snpsfinal.vcf', line 3645359
WARNING: Chromosome 'chr12|13112' not found. File '/home/songbk/Mainul_bin/Bioinfo/022MA2_filtered_snpsfinal.vcf', line 3645360
WARNING: Chromosome 'chr12|13112' not found. File '/home/songbk/Mainul_bin/Bioinfo/022MA2_filtered_snpsfinal.vcf', line 3645361
WARNING: Chromosome 'chr12|13112' not found. File '/home/songbk/Mainul_bin/Bioinfo/022MA2_filtered_snpsfinal.vcf', line 3645362
WARNING: Chromosome 'chr12|13112' not found. File '/home/songbk/Mainul_bin/Bioinfo/022MA2_filtered_snpsfinal.vcf', line 3645363
WARNING: Chromosome 'chr12|13112' not found. File '/home/songbk/Mainul_bin/Bioinfo/022MA2_filtered_snpsfinal.vcf', line 3645364
WARNING: Chromosome 'chr12|13112' not found. File '/home/songbk/Mainul_bin/Bioinfo/022MA2_filtered_snpsfinal.vcf', line 3645365
WARNING: Chromosome 'chr12|13112' not found. File '/home/songbk/Mainul_bin/Bioinfo/022MA2_filtered_snpsfinal.vcf', line 3645366
ERRORS: Some errors were detected
Error type Number of errors
ERROR_CHROMOSOME_NOT_FOUND 3651011
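Before changing anything, it can help to see exactly which chromosome names the VCF contains, since the warnings above only show `chr12|13112`. The following is a minimal Python sketch, not part of snpEff; the function name `count_vcf_chromosomes` and the toy records are made up for illustration, and on a real file you would pass an open file handle instead of the list.

```python
from collections import Counter


def count_vcf_chromosomes(lines):
    """Count data records per CHROM value in an iterable of VCF lines."""
    counts = Counter()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header lines and blank lines
        chrom = line.split("\t", 1)[0]  # CHROM is the first tab-separated column
        counts[chrom] += 1
    return counts


# Toy records (hypothetical), using the naming style seen in the warnings.
toy_vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1|13101\t100\t.\tA\tG\t50\tPASS\t.",
    "chr12|13112\t200\t.\tC\tT\t50\tPASS\t.",
    "chr12|13112\t300\t.\tG\tA\t50\tPASS\t.",
]

for chrom, n in sorted(count_vcf_chromosomes(toy_vcf).items()):
    print(chrom, n, sep="\t")
```

Comparing this list with the chromosome names in the snpEff database shows whether only the naming differs or whether sequences are missing entirely.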
After that, I also tried to build my own snpEff database from the GFF3 and FASTA files at http://rice.plantbiology.msu.edu/, using `java -jar snpEff.jar build -gff3 -v MSU6v1`. The GFF file was all.gff3 (renamed to genes.gff) and the FASTA file was all.con (renamed to sequences.fa); both were kept in the MSU6v1 directory. The build also failed, with the following error:
FATAL ERROR: Most Exons do not have sequences!
There might be differences in the chromosome names used in the genes file '/home/songbk/Mainul_bin/Bioinfo/snpEff/./data/MSU6v1/genes.gff'
and the chromosme names used in the 'reference sequence' file.
Please check that chromosome names in both files match.
Chromosome names missing in 'reference sequence' file: '1', '10', '11', '12', '2', '3', '4', '5', '6', '7', '8', '9', 'Sy', 'Un', , , , , , , , , , ,
Chromosome names missing in 'genes' file : '10|13110''11|13111''12|13112''1|13101''2|13102''3|13103''4|13104''5|13105''6|13106''7|13107''8|13108''9|13109')
Fatal Error # see screenshot 2
WARNING: Cannot find first exonic position after 27061823 for transcript '13105.m05011'
WARNING: Cannot find first exonic position after 20836530 for transcript '13102.m03765'
WARNING: Cannot find first exonic position after 28133402 for transcript '13103.m13008'
WARNING: Cannot find first exonic position after 21074629 for transcript '13102.m03809'
WARNING: Cannot find last exonic position before 25987811 for transcript '13105.m04749'
WARNING: Cannot find first exonic position after 2434416 for transcript '13106.m00521'
no sequence found #see screenshot 3
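The build error above suggests the FASTA sequence names carry a `|131xx` suffix (for example `1|13101`) that the GFF names (`1`, `10`, ...) lack. One possible fix is to strip that suffix from the FASTA headers before building. This is only a sketch under that assumption: the helper names and the toy headers such as `Chr1|13101` are hypothetical, so check the real headers in all.con and the first column of all.gff3 before relying on it.

```python
def strip_pipe_suffix(name):
    """Return the part of a sequence name before the first '|'."""
    return name.split("|", 1)[0]


def normalize_fasta_headers(lines):
    """Yield FASTA lines (without trailing newline) with the '|...' suffix
    removed from the ID of every header line; sequence lines are unchanged."""
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            # The ID is the text up to the first space; keep any description.
            seq_id, sep, description = line[1:].partition(" ")
            line = ">" + strip_pipe_suffix(seq_id) + sep + description
        yield line


# Toy input (hypothetical headers); on real data, iterate over the open
# sequences.fa file and write each yielded line plus "\n" to a new file.
toy_fasta = [">Chr1|13101 toy description", "ACGT", ">Chr12|13112", "GGCC"]
for out in normalize_fasta_headers(toy_fasta):
    print(out)
```

The names `Sy` and `Un` are reported as present in the GFF but missing from the FASTA, so renaming alone will not supply sequences for those.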
Could someone with experience please help?
If MSUv6.1 =/= rice7, then you are not comparing the right genome builds.

Please use the formatting bar (especially the code option) to present your post better. You can use backticks for inline code (`text` becomes text), or select a chunk of text and use the highlighted button to format it as a code block. I've done it for you this time.

It was my first time uploading photos and code together; in future I will make sure to post properly. Thanks for your help. Can you suggest which reference genome build I should use for MSUv6.1?
java -jar snpEff.jar build -gff3 -v MSU6v1
I tried this to build my own genome database, but I did not get one.

Which genome build/source did you use to do the original alignments? As you have discovered, you can't mix and match genome builds.
To identify the heat shock genes, I aligned weedy rice WGS sequences to the MSUv6.1 reference genome (that is the original alignment source) and then created a filtered SNP VCF file. There were no errors up to the SNP filtering step. After variant calling I proceeded to SNP annotation with snpEff, but I found no suitable snpEff database for MSUv6.1. So far I have tried to annotate with the snpEff version 4.2 databases rice_rap201304, rice_rap201503, rice5, rice6.1 and rice7. Each one gives the error shown in the first part of the output above.
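If a database built from the same MSUv6.1 files is used, the VCF chromosome names (e.g. `chr12|13112`) would still need to match it. A minimal Python sketch for rewriting them follows, assuming the only difference is the `|131xx` suffix; the function names are hypothetical, and whether the remaining name (e.g. `chr12`) matches depends on the database, so verify against it. Renaming does not make different genome builds compatible: annotating MSUv6.1 coordinates with a rice7 database would still be wrong.

```python
import re

# Matches the ID value at the start of a "##contig=<ID=...>" header line.
CONTIG_ID = re.compile(r"^(##contig=<ID=)([^,>]+)")


def strip_pipe_suffix(name):
    """Return the part of a chromosome name before the first '|'."""
    return name.split("|", 1)[0]


def rename_vcf_line(line, rename=strip_pipe_suffix):
    """Apply `rename` to the contig ID of a ##contig header line or to the
    CHROM column of a data line; other header lines are returned unchanged."""
    if line.startswith("##contig="):
        return CONTIG_ID.sub(lambda m: m.group(1) + rename(m.group(2)), line)
    if line.startswith("#"):
        return line
    chrom, sep, rest = line.partition("\t")
    return rename(chrom) + sep + rest


# Toy lines (hypothetical); on real data, map this over the VCF line by line.
toy_lines = [
    "##contig=<ID=chr12|13112,length=1000>",
    "#CHROM\tPOS\tID\tREF\tALT",
    "chr12|13112\t101\t.\tA\tG",
]
for out in map(rename_vcf_line, toy_lines):
    print(out)
```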