I have a fully assembled genome of Arabidopsis thaliana (Ler) for which I am trying to produce a GFF annotation (or similar format). I'm particularly interested in getting accurate transcription start sites, but also coding start sites and, hopefully, intron-exon boundaries.
The major A. thaliana ecotype (Col) is highly annotated and pretty much a gold standard, and there aren't a huge number of differences in my closely related ecotype. However, the coordinates obviously start slipping between the two assemblies, so using the Col annotation directly is not ideal.
I have a few RNA-seq datasets which I can use to predict gene models, but as I already have a strong, closely related reference, I figure that should be able to help. I have tried delving into MAKER, but I'm drowning in options, and so far I haven't been able to maintain gene models, only strings of exon matches. I thought a straight-up BLAST with the related ecotype's cds.fasta might work, but then I would be ignoring my RNA-seq data and any of the changes between the ecotypes.
Any suggestions would be appreciated!
Thanks, I've given BLAT a go and it looks like a good starting point to which I can then apply the RNA-seq data.
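For anyone trying the same route, here is a minimal sketch of turning a BLAT hit into GFF3-style records. It assumes BLAT's standard 21-column, tab-separated PSL output from a nucleotide-to-nucleotide alignment (single-character strand, target block starts on the forward strand). The function name `psl_to_gff3`, the `source` label, and the example hit are all hypothetical. Note that PSL blocks are only approximate exons: a small gap inside an exon also splits a block, so the boundaries still need checking against the RNA-seq data.

```python
def psl_to_gff3(psl_line, source="BLAT"):
    """Convert one PSL alignment line into GFF3 rows (one mRNA, one exon per block).

    Assumes a 21-column nucleotide PSL line; block boundaries are treated
    as exon boundaries, which is only an approximation.
    """
    fields = psl_line.rstrip("\n").split("\t")
    if len(fields) != 21:
        raise ValueError("expected 21 tab-separated PSL columns")

    strand = fields[8]
    query_name = fields[9]
    target_name = fields[13]
    target_start = int(fields[15])   # PSL starts are 0-based
    target_end = int(fields[16])     # PSL ends are exclusive
    # blockSizes and tStarts are comma-separated lists with a trailing comma
    block_sizes = [int(x) for x in fields[18].rstrip(",").split(",")]
    block_starts = [int(x) for x in fields[20].rstrip(",").split(",")]

    def row(feature, start0, end0, attributes):
        # GFF3 is 1-based and inclusive, so only the start needs shifting
        return "\t".join([target_name, source, feature, str(start0 + 1),
                          str(end0), ".", strand, ".", attributes])

    rows = [row("mRNA", target_start, target_end, f"ID={query_name}")]
    for i, (start, size) in enumerate(zip(block_starts, block_sizes), 1):
        rows.append(row("exon", start, start + size,
                        f"ID={query_name}.exon{i};Parent={query_name}"))
    return rows


# Hypothetical two-block hit of a Col transcript on a Ler chromosome
example = "\t".join([
    "300", "0", "0", "0", "0", "0", "1", "200", "+",
    "AT1G01010.1", "300", "0", "300",
    "chr1", "30000000", "1000", "1500",
    "2", "100,200,", "0,100,", "1000,1300,",
])
for line in psl_to_gff3(example):
    print(line)
```

The resulting rows can be written under a `##gff-version 3` header and loaded alongside the RNA-seq alignments in a genome browser to see where the transferred models disagree with the reads.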