Hi everyone!
I'm still struggling with MAKER.
What I want to do is follow, step by step, the MAKER tutorial for SNAP training (the one described here: http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/MAKER_Tutorial#Training_ab_initio_Gene_Predictors ).
My input file is a genome assembly I made, with several contigs.
In the MAKER tutorial, they ask you to convert the generated GFF into a ZFF, and to do some other tasks. My problem is that the tutorial works on only one locus, or one file, whereas my results contain thousands of directories, each one containing several GFF files.
For example:
Loutre:~/Documents/maker/cleaned_suzukii_90_size.maker.output/cleaned_suzukii_90_size_datastore$ ls
00 0B 16 22 2D 38 43 4F 5A 66 71 7C 87 92 9D A9 B4 BF CA D5 E0 EB F6
(...) DF EA F5
Each directory can contain several subdirectories, and the number varies from one directory to another:
Loutre:~/Documents/maker/cleaned_suzukii_90_size.maker.output/cleaned_suzukii_90_size_datastore$ cd 0B/
Loutre:~/Documents/maker/cleaned_suzukii_90_size.maker.output/cleaned_suzukii_90_size_datastore/0B$ ls
28 67 7C 92 D0 F8
Loutre:~/Documents/maker/cleaned_suzukii_90_size.maker.output/cleaned_suzukii_90_size_datastore/0B$ cd 28
Loutre:~/Documents/maker/cleaned_suzukii_90_size.maker.output/cleaned_suzukii_90_size_datastore/0B/28$ ls
tig00000634
Loutre:~/Documents/maker/cleaned_suzukii_90_size.maker.output/cleaned_suzukii_90_size_datastore/0B/28$ cd tig00000634/
Loutre:~/Documents/maker/cleaned_suzukii_90_size.maker.output/cleaned_suzukii_90_size_datastore/0B/28/tig00000634$ ls
run.log
theVoid.tig00000634
tig00000634.gff
tig00000634.maker.augustus_masked.proteins.fasta
tig00000634.maker.augustus_masked.transcripts.fasta
tig00000634.maker.non_overlapping_ab_initio.proteins.fasta
tig00000634.maker.non_overlapping_ab_initio.transcripts.fasta
The files I'm interested in are the plain tig00(number).gff ones, but I don't want to train SNAP contig by contig like this; I want to train SNAP on the whole assembly. I hope I don't have to do this for each .gff file, even if a script does it for me, because the following steps of SNAP training require launching MAKER again with the HMM model produced by SNAP on the whole genome.
What I want is an easy way to convert all these GFF files into a single output GFF that corresponds to the MAKER output. I can't find anything like this in the MAKER documentation, but I'm sure MAKER users know a way to do what I want.
Do you have any advice? Thanks for your help!
Cheers,
Roxane
If your actual question is how to convert multiple GTF files into a single GTF file, you can try cuffmerge:
1) Create an 'assembly_GTF_list.txt' file listing all the GTF files under your master directory. The listing should be recursive, so that it picks up the .gtf files in all subdirectories.
2) Run cuffmerge on that list.
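Step 1 could be sketched roughly as follows in Python. This is only a minimal illustration, not a tested recipe for MAKER output: the function name `write_gtf_list` and its `suffix` parameter are made up for this example, and the only thing taken from the steps above is the list file name 'assembly_GTF_list.txt'.

```python
from pathlib import Path


def write_gtf_list(root, list_path="assembly_GTF_list.txt", suffix=".gtf"):
    """Recursively collect files ending in `suffix` under `root`.

    Writes one path per line to `list_path` and returns the sorted paths.
    The function name and the `suffix` parameter are hypothetical and only
    serve this illustration.
    """
    root = Path(root)
    # rglob descends into every subdirectory, whatever the nesting depth.
    paths = sorted(p for p in root.rglob(f"*{suffix}") if p.is_file())
    Path(list_path).write_text("".join(f"{p}\n" for p in paths))
    return paths
```

Calling `write_gtf_list("path/to/master_directory")` should leave a list file in the current directory, which can then be given to cuffmerge as its list argument. Changing `suffix` lets the same sketch list files with another extension, but whether those files are suitable input for cuffmerge is a separate question.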
Sorry if I misunderstood your question.
My question was indeed something like this, but specific to MAKER, because its output directories are rather awkward to list. I think I can find this information in a MAKER output file, and I think Philipp answered my question below! But thanks for helping! It will probably be useful to me.
Glad that you got your answer. Good luck.