error while building database by snpEff
4.5 years ago
Mainul ▴ 10

I have 48 rice whole-genome sequences and would like to analyse variants in heat-tolerance-related genes and the functional effects of non-synonymous SNPs. I have already aligned the reads to the MSUv6.1 assembly (all.con) from http://rice.plantbiology.msu.edu/ and have a filtered VCF file. The problem occurs when I annotate the VCF with snpEff. I tried the rice7 gene model database for Oryza sativa (zip file), but it does not match my VCF file.

I used the following command:

java -jar snpEff.jar -v rice7 /usr/bin/filtered_snps_final.vcf > /usr/bin/filtered_snps_final.ann.vcf

and got this result:

WARNING: Chromosome 'chr12|13112' not found. File '/home/songbk/Mainul_bin/Bioinfo/022MA2_filtered_snpsfinal.vcf', line 3645356
WARNING: Chromosome 'chr12|13112' not found. File '/home/songbk/Mainul_bin/Bioinfo/022MA2_filtered_snpsfinal.vcf', line 3645357
WARNING: Chromosome 'chr12|13112' not found. File '/home/songbk/Mainul_bin/Bioinfo/022MA2_filtered_snpsfinal.vcf', line 3645358
WARNING: Chromosome 'chr12|13112' not found. File '/home/songbk/Mainul_bin/Bioinfo/022MA2_filtered_snpsfinal.vcf', line 3645359
WARNING: Chromosome 'chr12|13112' not found. File '/home/songbk/Mainul_bin/Bioinfo/022MA2_filtered_snpsfinal.vcf', line 3645360
WARNING: Chromosome 'chr12|13112' not found. File '/home/songbk/Mainul_bin/Bioinfo/022MA2_filtered_snpsfinal.vcf', line 3645361
WARNING: Chromosome 'chr12|13112' not found. File '/home/songbk/Mainul_bin/Bioinfo/022MA2_filtered_snpsfinal.vcf', line 3645362
WARNING: Chromosome 'chr12|13112' not found. File '/home/songbk/Mainul_bin/Bioinfo/022MA2_filtered_snpsfinal.vcf', line 3645363
WARNING: Chromosome 'chr12|13112' not found. File '/home/songbk/Mainul_bin/Bioinfo/022MA2_filtered_snpsfinal.vcf', line 3645364
WARNING: Chromosome 'chr12|13112' not found. File '/home/songbk/Mainul_bin/Bioinfo/022MA2_filtered_snpsfinal.vcf', line 3645365
WARNING: Chromosome 'chr12|13112' not found. File '/home/songbk/Mainul_bin/Bioinfo/022MA2_filtered_snpsfinal.vcf', line 3645366

ERRORS: Some errors were detected
Error type  Number of errors
ERROR_CHROMOSOME_NOT_FOUND  3651011

snpEff annotation.

After that, I also tried to build my own snpEff database from the GFF3 and FASTA files at http://rice.plantbiology.msu.edu/, using java -jar snpEff.jar build -gff3 -v MSU6v1. The GFF file, all.gff3, was renamed to genes.gff, and the FASTA file, all.con, was renamed to sequences.fa; both were placed in the MSU6v1 directory. The build failed with FATAL ERROR: Most Exons do not have sequences!

FATAL ERROR: Most Exons do not have sequences!
There might be differences in the chromosome names used in the genes file '/home/songbk/Mainul_bin/Bioinfo/snpEff/./data/MSU6v1/genes.gff'
and the chromosme names used in the 'reference sequence' file.
Please check that chromosome names in both files match.
    Chromosome names missing in 'reference sequence' file:  '1', '10', '11', '12', '2', '3', '4', '5', '6', '7', '8', '9', 'Sy', 'Un', , , , , , , , , , , 
    Chromosome names missing in 'genes' file             :  '10|13110''11|13111''12|13112''1|13101''2|13102''3|13103''4|13104''5|13105''6|13106''7|13107''8|13108''9|13109')
Fatal Error # see screenshot 2
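
The two name lists in this message suggest that the FASTA headers carry a `|131NN` suffix that the GFF sequence names lack. A minimal sketch of one way to make them agree, assuming the part after `|` can simply be dropped; `strip_suffix` and `rewrite_fasta_headers` are hypothetical helper names, not part of snpEff, and the real header layout in all.con should be checked before relying on this:

```python
def strip_suffix(name):
    """Return the part of a sequence name before the first '|'."""
    return name.split("|", 1)[0]


def rewrite_fasta_headers(lines):
    """Yield FASTA lines with the sequence ID in each '>' header shortened.

    Only the ID (the text up to the first space) is changed. Any description
    after it, and all sequence lines, pass through untouched.
    """
    for line in lines:
        if not line.startswith(">"):
            yield line
            continue
        header = line[1:].rstrip("\n")
        seq_id, _, description = header.partition(" ")
        new_header = ">" + strip_suffix(seq_id)
        if description:
            new_header += " " + description
        yield new_header + "\n"
```

To apply it, open sequences.fa for reading, pass the file handle to `rewrite_fasta_headers`, and write the yielded lines to a new file. Any remaining difference between the shortened IDs and the GFF names (for example letter case or a `Chr` prefix) would still need to be handled separately.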

WARNING: Cannot find first exonic position after 27061823 for transcript '13105.m05011'
WARNING: Cannot find first exonic position after 20836530 for transcript '13102.m03765'
WARNING: Cannot find first exonic position after 28133402 for transcript '13103.m13008'
WARNING: Cannot find first exonic position after 21074629 for transcript '13102.m03809'
WARNING: Cannot find last exonic position before 25987811 for transcript '13105.m04749'
WARNING: Cannot find first exonic position after 2434416 for transcript '13106.m00521'
no sequence found #see screenshot 3

Could someone with experience in this please help?

snp gene assembly software error • 1.8k views

If MSUv6.1 and rice7 are not the same genome build, then you are annotating against the wrong build.

  1. Please see How to add images to a Biostars post to add your images properly. You'll need to use a password-free image hosting service such as imgbb, not a file sharing/cloud storage service such as google photos, google drive or dropbox.
  2. Please use the formatting bar (especially the code option) to present your post better. You can use backticks for inline code (`text` becomes text), or select a chunk of text and use the highlighted button to format it as a code block. I've done it for you this time.

This was my first time uploading images and code together; I will make sure to post properly in future. Thanks for your help. Can you suggest which reference genome build I should use for MSUv6.1? I tried java -jar snpEff.jar build -gff3 -v MSU6v1 to build my own database, but no database was produced.


Can you suggest which reference genome build I should use for MSUv6.1?

Which genome build/source did you use to do the original alignments? As you have discovered you can't mix and match genome builds.
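
One way to check for a mismatch before running snpEff is to list the sequence names each file actually uses. A minimal sketch, assuming uncompressed, tab-delimited VCF and GFF3 files and a plain FASTA file; all function names here are illustrative and not part of snpEff:

```python
def vcf_chromosomes(lines):
    """Collect distinct CHROM values (column 1) from VCF data lines."""
    return {
        line.split("\t", 1)[0]
        for line in lines
        if line.strip() and not line.startswith("#")
    }


def gff_seqids(lines):
    """Collect distinct seqid values (column 1) from GFF3 feature lines."""
    names = set()
    for line in lines:
        if line.startswith("##FASTA"):
            break  # an embedded sequence section follows; no more features
        if line.strip() and not line.startswith("#"):
            names.add(line.split("\t", 1)[0])
    return names


def fasta_ids(lines):
    """Collect sequence IDs (first token after '>') from FASTA headers."""
    return {
        line[1:].split()[0]
        for line in lines
        if line.startswith(">") and len(line.strip()) > 1
    }


def compare_names(vcf, gff, fasta):
    """Report names present in one set but absent from another."""
    return {
        "vcf_not_in_fasta": sorted(vcf - fasta),
        "gff_not_in_fasta": sorted(gff - fasta),
        "fasta_not_in_gff": sorted(fasta - gff),
    }
```

Each collector takes any iterable of lines, so an open file handle can be passed directly. Note that matching names alone do not prove the builds are the same; coordinates can still differ between assemblies that share chromosome names.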


To identify heat shock genes, I aligned weedy rice WGS reads to the MSUv6.1 reference genome, which was the source for the original alignments, and then created a filtered SNP VCF file. There were no errors up to SNP filtering. After variant calling I moved on to SNP annotation with snpEff and found no suitable database for MSUv6.1. So far I have tried annotating with snpEff version 4.2 databases such as rice_rap201304, rice_rap201503, rice5, rice6.1 and rice7. Each one gives the error shown in the first block of output above.
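
If a database is built from the MSUv6.1 files themselves, the chromosome names in the VCF still have to match the names in that database. A minimal sketch of renaming them, assuming an uncompressed VCF and a caller-supplied `rename` function; `rename_vcf_chromosomes` is a hypothetical helper, and dropping the text after `|` (as in the example below) is only a guess at the right mapping based on names like 'chr12|13112' in the warnings:

```python
import re

# Matches the ID value inside a "##contig=<ID=...>" header line.
CONTIG_ID = re.compile(r"^(##contig=<ID=)([^,>]+)")


def rename_vcf_chromosomes(lines, rename):
    """Yield VCF lines with chromosome names passed through rename().

    Both the ##contig header lines and the CHROM column of data lines are
    rewritten. Other header lines and blank lines are left unchanged.
    """
    for line in lines:
        if line.startswith("##contig="):
            yield CONTIG_ID.sub(
                lambda m: m.group(1) + rename(m.group(2)), line
            )
        elif line.startswith("#") or not line.strip():
            yield line
        else:
            chrom, sep, rest = line.partition("\t")
            yield rename(chrom) + sep + rest


def drop_pipe_suffix(name):
    """Example rename function: keep only the text before the first '|'."""
    return name.split("|", 1)[0]
```

Renaming only makes sense against a database built from the same assembly the reads were aligned to; it does not make a VCF compatible with a different build such as rice7.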
