Hello,
I am hoping to use maker on a small cluster (~6 compute nodes) to annotate a fairly fragmented de novo assembly that has some longer contigs. We have maker installed, but so far even though every program runs, RepeatMasker seems to be the only program finding matches. Namely, blastx and exonerate don't find any alignment matches even though they seem to be set up correctly in the maker control file.
Is this an artifact of the fragmented assembly, or some sort of setup error? I find the former hard to believe, considering I got at least 2-3 blast hits for each longer contig in the entire assembly using Galaxy megablast. The log below reports 0 hits for blastx, but I am not sure why:
Widget::blastx:
/usr/bin/blastx -db /tmp/maker_sHnU1b/chickenproteomeuniprot%2Efasta.mpi.10.9 -query /tmp/maker_sHnU1b/0/scaffold_1035.0 -num_alignments 10000 -num_descriptions 10000 -evalue 1e-06 -dbsize 300 -searchsp 500000000 -num_threads 1 -seg yes -soft_masking true -lcase_masking -show_gis -out /home/zgayk/MakerExample2/Gaviaimmerheader.maker.output/Gaviaimmerheader_datastore/38/7C/scaffold_1035//theVoid.scaffold_1035/0/scaffold_1035.0.chickenproteomeuniprot%2Efasta.blastx.temp_dir/chickenproteomeuniprot%2Efasta.mpi.10.9.blastx
#-------------------------------#
deleted:0 hits
collecting blastx reports
flattening protein clusters
prepare section files
processing the chunk divide
preparing evidence clusters for annotations
Preparing evidence for hint based annotation
clustering transcripts into genes for annotations
Processing transcripts into genes
choosing best annotation set
Choosing best annotations
processing chunk output
processing contig output
examining contents of the fasta file and run log
Essentially, the .gff file produced for each contig is empty. If anyone knows how to fix this, I would be very appreciative.
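To check how empty the per-contig GFF files really are, here is a minimal Python sketch that walks the datastore directory and tallies feature lines by their source column (column 2). It assumes the output files end in `.gff`, that they follow the standard nine-column tab-separated GFF3 layout, and that any embedded sequence follows a `##FASTA` directive. The function names are my own and are not part of MAKER.

```python
import os
from collections import Counter
from typing import Dict


def count_gff_sources(path: str) -> Counter:
    """Count GFF3 feature lines per source (column 2) in one file."""
    counts: Counter = Counter()
    with open(path) as handle:
        for line in handle:
            if line.startswith("##FASTA"):
                break  # everything after this directive is sequence, not features
            if line.startswith("#") or not line.strip():
                continue  # skip comments, directives and blank lines
            fields = line.rstrip("\n").split("\t")
            if len(fields) == 9:  # a well-formed GFF3 feature line
                counts[fields[1]] += 1
    return counts


def summarize_datastore(root: str) -> Dict[str, Counter]:
    """Map every .gff file found under root to its per-source feature counts."""
    summary: Dict[str, Counter] = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(".gff"):
                full_path = os.path.join(dirpath, name)
                summary[full_path] = count_gff_sources(full_path)
    return summary
```

Calling `summarize_datastore` on the `Gaviaimmerheader_datastore` directory and printing the result would show, per contig, whether only repeat features are present or whether any protein or EST alignment features made it into the output.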
Zach Gayk
Could you tell us the value of the min_contig parameter in your maker_opts.ctl? As noted in maker_opts.ctl, trying to annotate a sequence under 10 kb is often useless.
Hello, the assembly is fragmented:
The assembly is from a bird: the common loon (Gavia immer). I used the chicken (Gallus gallus) proteome as protein data, along with chicken cDNA for EST evidence. I put the minimum contig length at 500. The contig N50 is 814 bp.
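To see how many contigs would survive a given length cutoff, here is a rough Python sketch that reads contig lengths from a FASTA file, computes N50, and counts contigs at or above a threshold. The function names are hypothetical, and the parser assumes a plain, uncompressed FASTA file.

```python
from typing import Iterable, List


def read_fasta_lengths(path: str) -> List[int]:
    """Return the length of every sequence in a plain FASTA file."""
    lengths: List[int] = []
    current = None  # length of the record being read, None before the first header
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if current is not None:
                    lengths.append(current)
                current = 0
            elif current is not None:
                current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths: Iterable[int]) -> int:
    """Length L such that contigs of length >= L hold at least half of all bases."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    return 0  # empty input


def count_at_least(lengths: Iterable[int], min_len: int) -> int:
    """Number of contigs that a minimum-length cutoff of min_len would keep."""
    return sum(1 for length in lengths if length >= min_len)
```

Comparing `count_at_least(lengths, 500)` with `count_at_least(lengths, 10000)` shows how much of the assembly falls between the cutoff used here and the 10 kb guideline mentioned in the reply.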
Most of the assembly is in small contigs of less than 1 kb, and I was only going to use maker as a trial. I thought it might be possible to get valid annotations for the longer contigs at least, but if you think this is not feasible, let me know. The assembly was produced using ABySS with paired-end (PE) read data and a k-mer size of 32. Then, because it was still so fragmented, I aligned the contigs to the available red-throated loon genome, and this is what is shown. I am not sure why the assembly remains this fragmented (we basically have no scaffolds), although it could be because the group that did the sequencing used only one PE library (8 kb). If there are any suggestions as to why the assembly remains so fragmented, I would be very interested. Are we too limited by having a single insert library?
Thanks,
Zach