Hi everyone!
I am just starting to use the BRAKER2 pipeline for gene prediction in the genomes of two Phytophthora species. For those who don't know the pipeline, BRAKER2 uses RNA-seq alignments as input to train the GeneMark and AUGUSTUS ab initio gene predictors.
Since I only have RNA-seq data for one of the genomes, I was wondering whether it would be a good idea to use the RNA-seq reads from that closely related species for the other genome. Also, do you have any suggestions for running this pipeline? I don't have much experience in annotation, so any ideas would be appreciated.
Thank you very much in advance!
Hey! Thank you very much for your answer.
The organisms belong to the same genus and are believed to be sister species, so I would say your suggestion is just perfect for my case. :D
Do you usually run any of the khmer recipes or AfterQC on the RNA-seq reads?
Yes, I do tend to do minimal QC on the reads, like removing adapters and trimming low-quality bases. Other than that, I tend to use all the reads that pass QC.
I haven't followed the khmer recipes, partly because I was able to set up other alternatives faster, and mostly because I've been working on PacBio data for the past year. But thanks for mentioning khmer, I'll explore it :)
Which alternatives did you try out?
For QC? I used FastQC + Trimmomatic. I found it sufficiently fast and good enough, given that I only keep reads with Q30 or higher.
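In case it helps anyone reading along, here is a rough Python sketch of what "keep only reads with Q30 or higher" could mean if it is taken as a mean-quality cutoff per read. This is only an illustration, not what Trimmomatic does internally; the function names (`mean_phred`, `parse_fastq`, `filter_fastq_records`) are made up for the example, and it assumes Phred+33 encoded, 4-line FASTQ records.

```python
def mean_phred(quality_string, offset=33):
    """Mean Phred score of one read (assumes Phred+33 encoding by default)."""
    if not quality_string:
        return 0.0
    return sum(ord(ch) - offset for ch in quality_string) / len(quality_string)


def parse_fastq(lines):
    """Yield (header, sequence, quality) from 4-line FASTQ records.

    Incomplete trailing records are silently dropped.
    """
    it = iter(lines)
    for header, sequence, _plus, quality in zip(it, it, it, it):
        yield header.rstrip("\n"), sequence.rstrip("\n"), quality.rstrip("\n")


def filter_fastq_records(records, min_mean_q=30):
    """Keep only records whose mean base quality is at least min_mean_q."""
    for header, sequence, quality in records:
        if mean_phred(quality) >= min_mean_q:
            yield header, sequence, quality


if __name__ == "__main__":
    # Toy input: 'I' encodes Q40 and '#' encodes Q2 in Phred+33.
    toy = "@r1\nACGT\n+\nIIII\n@r2\nACGT\n+\n####\n".splitlines(keepends=True)
    for header, sequence, quality in filter_fastq_records(parse_fastq(toy)):
        print(header, sequence, quality)
```

For real data a dedicated trimmer is the better choice; the sketch is only meant to make the cutoff concrete.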
To be honest, I'm just using the RNA-seq data to derive gene-structure hints, as I had a very good assembly (N50 > 7 Mb; contigs/scaffolds: 1300/900) with the core gene set at 92% and above.
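For readers who haven't met N50 before: it is the length L such that sequences of length L or longer make up at least half of the total assembly. A minimal Python sketch of that definition follows; the function name `n50` and the toy lengths are just for illustration, not from my actual assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths.

    Walks the lengths from longest to shortest and returns the first
    length at which the running total reaches half of the overall sum.
    Returns 0 for an empty input.
    """
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0


if __name__ == "__main__":
    toy_lengths = [2, 3, 4, 5, 6, 7, 8, 9, 10]  # toy values only
    print(n50(toy_lengths))
```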