Using RNA-seq reads from closely related species in BRAKER2 genome annotation pipeline
6.2 years ago
ayala.usma • 0

Hi everyone!

I am just starting to use the BRAKER2 pipeline for gene prediction in the genomes of two Phytophthora species. For those unfamiliar with it, BRAKER2 uses RNA-seq alignments as input to train the ab initio gene predictors GeneMark and AUGUSTUS.
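
To illustrate what "RNA-seq alignments as input" means in practice: a spliced read appears in the SAM/BAM CIGAR string as an `N` operation (a skipped reference region), and those gaps are what become intron hints (BRAKER relies on AUGUSTUS's `bam2hints` for this step). The minimal Python sketch below is not BRAKER's code; it only shows how intron coordinates can be read off a single alignment, and `introns_from_cigar` is a hypothetical helper name.

```python
import re

# One CIGAR operation: a length followed by an operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
# Operations that advance along the reference: match/mismatch, deletion, skipped region.
REF_CONSUMING = set("MDN=X")


def introns_from_cigar(pos: int, cigar: str) -> list[tuple[int, int]]:
    """Return introns as 1-based, inclusive (start, end) reference intervals.

    `pos` is the SAM POS field (1-based leftmost mapping position); every `N`
    operation in the CIGAR string is reported as one intron.
    """
    introns = []
    ref = pos
    for length_text, op in CIGAR_OP.findall(cigar):
        length = int(length_text)
        if op == "N":
            introns.append((ref, ref + length - 1))
        if op in REF_CONSUMING:
            ref += length
    return introns


if __name__ == "__main__":
    print(introns_from_cigar(100, "50M200N50M"))  # [(150, 349)]
```

For example, a read starting at position 100 with CIGAR `50M200N50M` spans one intron covering reference positions 150 to 349. Soft clips (`S`) and insertions (`I`) do not move along the reference, so they do not shift the intron coordinates.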

Since I only have RNA-seq data for one of the two genomes, I was wondering whether it would be a good idea, in my case, to use RNA-seq reads from the closely related species for the other genome. Also, do you have any suggestions for running this pipeline? I don't have much experience with annotation, so any ideas would be appreciated.

Thank you very much in advance!

RNA-Seq genome annotation gene prediction • 3.4k views
6.2 years ago
harish ▴ 470

It depends. You have to answer the following questions yourself:

Are the organisms in the same family or genus? What is the alignment rate? How many false positives can you tolerate?

Generally, if the species are in the same genus or family, it is fine to use those datasets for training in BRAKER. BRAKER derives intron hints from the alignments and later uses them to create an AUGUSTUS profile. But make sure you get a high unique mapping rate, since uniquely mapped reads matter most when running BRAKER.
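
One way to check the unique mapping rate yourself is to count, among primary records, how many carry the standard SAM tag `NH:i:1` (the read has exactly one reported alignment). The sketch below assumes single-end reads and an aligner that writes the `NH` tag; `unique_mapping_rate` is a hypothetical helper, and what counts as "high enough" is your own call, not a BRAKER requirement.

```python
def unique_mapping_rate(sam_lines, total_reads=None) -> float:
    """Fraction of reads whose primary alignment carries NH:i:1.

    Assumes single-end reads and an aligner that writes the standard NH tag.
    If the aligner omits unmapped reads from its output, pass the raw read
    count as `total_reads`; otherwise every primary or unmapped record is
    counted as one read.
    """
    records = 0
    unique = 0
    for line in sam_lines:
        if line.startswith("@"):          # header line
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        if flag & 0x100 or flag & 0x800:  # secondary or supplementary record
            continue
        records += 1
        if flag & 0x4:                    # read is unmapped
            continue
        if "NH:i:1" in fields[11:]:       # optional fields start at column 12
            unique += 1
    denominator = total_reads if total_reads is not None else records
    return unique / denominator if denominator else 0.0
```

In practice you would stream the lines from `samtools view` output or a SAM file; the function itself only needs an iterable of text lines.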

My pipeline is roughly this:

  1. Repeat-mask the genome.
  2. Align multiple RNA-seq libraries and merge the resulting BAM files.
  3. Also collect proteins from closely related species.
  4. Generate BRAKER-tuned AUGUSTUS and GeneMark profiles.
  5. Run gene prediction with those GeneMark and AUGUSTUS profiles on the repeat-masked genome.
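
For step 1, repeats can be hard-masked (replaced by `N`) or soft-masked (kept, but written in lowercase). As far as I know, BRAKER's documentation recommends a soft-masked genome, since lowercase bases stay visible to the gene predictors while still being flagged as repeats. The sketch below, with hypothetical helper names, shows the difference on a toy sequence. The repeat intervals are assumed to be 0-based and half-open (BED-style); in practice they would come from a tool such as RepeatMasker, and parsing its output is not shown.

```python
def soft_mask(sequence: str, repeats: list[tuple[int, int]]) -> str:
    """Write bases inside repeat intervals (0-based, half-open) in lowercase."""
    chars = list(sequence)
    for start, end in repeats:
        for i in range(max(start, 0), min(end, len(chars))):
            chars[i] = chars[i].lower()
    return "".join(chars)


def hard_mask(sequence: str, repeats: list[tuple[int, int]]) -> str:
    """Replace bases inside repeat intervals (0-based, half-open) with N."""
    chars = list(sequence)
    for start, end in repeats:
        for i in range(max(start, 0), min(end, len(chars))):
            chars[i] = "N"
    return "".join(chars)


if __name__ == "__main__":
    genome = "ACGTACGTAC"
    repeats = [(2, 5), (8, 20)]  # the second interval runs past the end and is clipped
    print(soft_mask(genome, repeats))  # ACgtaCGTac
    print(hard_mask(genome, repeats))  # ACNNNCGTNN
```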

Hey! Thank you very much for your answer.

The organisms belong to the same genus and are believed to be sister species, so I would say your suggestion is just perfect for my case. :D

Do you usually use any of the khmer recipes or AfterQC on the RNA-seq reads?

Yes, I do minimal QC on the reads, such as removing adapters and trimming low-quality bases. Other than that, I use all the reads that pass QC.
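
A minimal sketch of those two QC steps, adapter removal followed by low-quality trimming. It is a simplified version of what tools like Trimmomatic do; their real adapter-clipping and sliding-window logic is more elaborate (for example, it tolerates mismatches in the adapter). The function names and default parameters are hypothetical. The adapter `AGATCGGAAGAGC` used in the example is the common prefix of Illumina TruSeq adapters.

```python
def phred_scores(quality: str, offset: int = 33) -> list[int]:
    """Decode a FASTQ quality string (Phred+33 by default) into integer scores."""
    return [ord(c) - offset for c in quality]


def adapter_cut(seq: str, adapter: str, min_overlap: int = 3) -> int:
    """Length of seq to keep after removing a 3' adapter (exact matches only).

    A full adapter occurrence is cut first; otherwise a partial adapter
    prefix of at least `min_overlap` bases at the very end of the read is cut.
    """
    idx = seq.find(adapter)
    if idx != -1:
        return idx
    for k in range(min(len(adapter) - 1, len(seq)), min_overlap - 1, -1):
        if seq.endswith(adapter[:k]):
            return len(seq) - k
    return len(seq)


def quality_cut(quals: list[int], window: int = 4, threshold: int = 20) -> int:
    """Length to keep: cut at the start of the first window whose mean quality < threshold."""
    for i in range(len(quals) - window + 1):
        if sum(quals[i:i + window]) / window < threshold:
            return i
    return len(quals)


def trim_read(seq: str, qual: str, adapter: str, window: int = 4, threshold: int = 20):
    """Adapter removal followed by sliding-window quality trimming."""
    keep = adapter_cut(seq, adapter)
    seq, qual = seq[:keep], qual[:keep]
    keep = quality_cut(phred_scores(qual), window, threshold)
    return seq[:keep], qual[:keep]


if __name__ == "__main__":
    # The read ends in the first 7 bases of the adapter, which are removed.
    print(trim_read("ACGTACGTAAGATCGG", "I" * 16, "AGATCGGAAGAGC"))
```

Removing adapters before quality trimming matters because adapter bases usually have good quality scores and would otherwise survive the quality step.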

I haven't followed the khmer recipes, partly because I was able to set up other alternatives faster, and mostly because I've been working with PacBio data for the past year. But thanks for mentioning khmer, I'll explore it :)
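
For context, khmer's best-known recipe is digital normalization (its `normalize-by-median.py` script): a read is kept only if the median abundance of its k-mers, counted over the reads kept so far, is still below a cutoff, which discards redundant high-coverage reads. The sketch below illustrates only the idea; the function names and the default `k`/`cutoff` values are illustrative assumptions, and it uses exact dictionary counting whereas khmer uses probabilistic counting structures to bound memory.

```python
from statistics import median


def kmers(seq: str, k: int) -> list[str]:
    """All overlapping k-mers of seq (forward strand only in this sketch)."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def digital_normalization(reads, k: int = 20, cutoff: int = 20) -> list[str]:
    """Keep a read only if the median count of its k-mers, over reads kept so far,
    is below `cutoff`; kept reads then add their k-mers to the counts.

    Simplifications compared with khmer: exact dict counting instead of a
    probabilistic structure, no reverse-complement (canonical) k-mers, and
    reads shorter than k are dropped.
    """
    counts: dict[str, int] = {}
    kept = []
    for read in reads:
        read_kmers = kmers(read, k)
        if not read_kmers:
            continue  # read shorter than k
        if median([counts.get(km, 0) for km in read_kmers]) < cutoff:
            kept.append(read)
            for km in read_kmers:
                counts[km] = counts.get(km, 0) + 1
    return kept


if __name__ == "__main__":
    reads = ["ACGTACGTAC"] * 5 + ["TTTTGGGGCC"]
    print(digital_normalization(reads, k=4, cutoff=2))
```

With `cutoff=2`, only the first copy of the repeated read is kept, while the distinct read is still retained.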

Which alternatives did you try out?

For QC? I used FastQC + Trimmomatic. I found them fast and good enough, given that I was only keeping reads with Q30 or higher.
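
Reading "Q30 or higher" as a mean Phred quality of at least 30 per read (the same idea as Trimmomatic's `AVGQUAL` step), the filter looks roughly like the minimal sketch below. It assumes Phred+33 encoding and plain 4-line FASTQ records; the helper names are hypothetical. Q30 corresponds to an expected error rate of 1 in 1,000 per base, since the error probability is 10^(-Q/10).

```python
def mean_phred(quality: str, offset: int = 33) -> float:
    """Mean Phred score of a FASTQ quality string (Phred+33 by default)."""
    scores = [ord(c) - offset for c in quality]
    return sum(scores) / len(scores) if scores else 0.0


def error_probability(q: float) -> float:
    """Per-base error probability for a Phred score: P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def parse_fastq(lines):
    """Yield (header, sequence, plus, quality) tuples from 4-line FASTQ records."""
    lines = [line.rstrip("\n") for line in lines]
    for i in range(0, len(lines) - 3, 4):
        yield lines[i], lines[i + 1], lines[i + 2], lines[i + 3]


def filter_fastq(lines, min_mean_q: float = 30):
    """Keep only records whose mean Phred quality is >= min_mean_q."""
    for record in parse_fastq(lines):
        if mean_phred(record[3]) >= min_mean_q:
            yield record


if __name__ == "__main__":
    fastq = "@r1\nACGT\n+\nIIII\n@r2\nACGT\n+\n####\n".splitlines()
    for header, seq, _, qual in filter_fastq(fastq):
        print(header, seq, qual)  # only r1 passes ('I' = Q40, '#' = Q2)
```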

To be honest, I'm only using the RNA-seq data to derive gene-structure hints, since I had a very good assembly (N50 > 7 Mb; 1,300 contigs / 900 scaffolds) with core-gene-set completeness of 92% or above.
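
For anyone unfamiliar with the N50 figure quoted above: it is the contig (or scaffold) length at which contigs of that length or longer contain at least half of all assembled bases. A minimal sketch of the calculation, with a hypothetical `n50` helper and toy lengths:

```python
def n50(lengths: list[int]) -> int:
    """Length at which the cumulative sum of contig lengths, taken longest
    first, first reaches at least half of the total assembly length."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty input


if __name__ == "__main__":
    # Total = 20 bases; 6 + 5 = 11 >= 10, so N50 = 5.
    print(n50([2, 3, 4, 5, 6]))
```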
