Entering edit mode
4.7 years ago
mrmrwinter
▴
30
Hi
I have just ran Maker to annotate my genome.
where/how can i find how many CDS's it detected?
I have tried running into JBrowse with maker2jbrowse, but i cannot open the output in browser.
Thanks
Thanks Juke, this is exactly what i needed
Hi, ive ran what you siggested but do not understand the output. Which of these statistics is CDS's? Thanks
Number of matchs 259209
Number of protein_matchs 136451
Number of introns 928478
Number of match_parts 1324138
Number of single exon match 259209
Number of single exon protein_match 136451
mean introns per match 3.6
mean introns per protein_match 6.8
mean match_parts per match 5.1
mean match_parts per protein_match 9.7
Total match length 207225956
Total protein_match length 235781736
Total intron length 222493540
Total match_part length 221019217
mean match length 799
mean protein_match length 1727
mean intron length 239
mean match_part length 166
Longest matchs 30330
Longest protein_matchs 38788
Longest introns 19827
Longest match_parts 5271
Shortest matchs 1
Shortest protein_matchs 48
hortest introns -4598
Shortest match_parts 1
Apparently we have isoforms : Number of level1 features: 395660 / Number of level2 features: 1324138 We will proceed to the statistics analysis using only the mRNA with the longest cds Number of matchs 91910
Number of protein_matchs 107719
Number of match_parts 199629
Number of single exon match 91910
Number of single exon protein_match 107719
mean match_parts per match 2.2
mean match_parts per protein_match 1.9
Total match length 185553786
Total protein_match length 227698299
Total match_part length 34729218
mean match length 2018
mean protein_match length 2113
mean match_part length 173
Longest matchs 30330
Longest protein_matchs 38788
Longest match_parts 4467
Shortest matchs 42
Shortest protein_matchs 81
Shortest match_parts 9
You should get a folder called
maker_output_processed_genomeName_evidence/
and inside a file Calledmaker_annotation_stat.txt
. Is is that file you are looking at? Did you run anything else thengaas_maker_merge_outputs_from_datastore.pl
?Yes i got this file and folder
the wall of text above is the contents of
maker-annotation_stats.txt
No, i didnt do anything thing else other than
gaas_maker_merge_outputs_from_datastore.pl
Interesting, I tried the script only on MAKER3 output , maybe you used another version of MAKER. I haven't try on OSX neither, did you run it on OSX?
Anyway... All data generated by maker are gathered in the file
maker_mix.gff
bygaas_maker_merge_outputs_from_datastore.pl
.Similarly you can generate this file using
gff3_merge
from MAKER :gff3_merge -s -d genome.maker.output/genome_master_datastore_index.log > maker_mix.gff
If you want to know the number of CDS you can just do
awk '{if ($3 == "CDS") a++}END{print a}' maker_mix.gff
If you want detailed statistics you must first filter maker_mix.gff file to keep only the gene models ( remove repeats, protein alignment, etc. ):
awk '{if ($2 == "maker") print $0 }' maker_mix.gff > maker_annotation.gff
and then run
agat_sp_statistics.pl
Im running MAKER 2.31. I wasnt aware there was a MAKER3. I am installing it through bioconda though, which may be causing issues.
And no, im running it on a Centos HPC.
awk '{if ($3 == "CDS") a++}END{print a}' maker_mix.gff
didnt work, and a quick grep count for "CDS" came back with 0, but the file looks like a standard gff. Some entries in column 3 are "expressed_sequence_match" or "protein match", and using the awk on those gives back a result.awk '{if ($2 == "maker") print $0 }' maker_mix.gff > maker_annotation.gff
andagat_sp_statistics.pl
returned an empty file.grep -c
gave no instances ofmaker
inmaker_mix.gff
either.Looks like we're getting differently formatted outputs. Ill see about installing manually rather than conda.
It sounds you don’t have any annotation (gene models) in this file. Could you try to generate it with gff3_merge too? If still no CDS in it it means you didn’t predict anything. To perform an evidence-based prediction you must activate est2genome or/and protein2genome parameter. For an abinitio one you must provide an Hmm profile to Augustus/snap/... parameter file and deactivate est2genome and protein2genome.
This is exactly what i've done....
I was so sure i'd set them i didn't think to check. Rerunning now.
Thanks for all the help
I have checked on output of MAKER2 (MAKER 2.31.8) it works perfectly fine for me. I don't understand... In the
maker_mix.gff
file generated bygaas_maker_merge_outputs_from_datastore.pl
do you see any feature withmaker
as source (2nd line of a gff):awk '{if($2 ~ /[a-zA-Z]+/ && $2=="maker") print $0}' maker_mix.gff
.While I never seen that it sounds you have lines like that:
At least from MAKER 2.31.8, lines with
maker
as source (2nd column) are supposed to be gene models and have gene,mRNA,exon,CDS... as feature type (3rd column). Maybe you use an older version (<2.31.8)I didnt set est2genome and protein2genome on (see my other comment)
grep
finds no instances ofmaker
in themaker_mix.gff
file