Question

Improving existing genome with new RNA-seq data

1

2.1 years ago

Dunois ★ 2.8k

I am working with a non-model species for which another group has published a genome assembly. The assembly is still at the scaffold level (N50 = 3244 bp), and no genes have been annotated. For their study, this group used around 100,000 de novo assembled transcripts (from paired-end RNA-seq) to assist in the scaffolding.

I'm working with a significantly larger data set of RNA-seq reads (also paired-end, 150 bp Illumina reads) that I've assembled into a de novo transcriptome. This transcriptome is also larger and more complete (according to BUSCO) than the one assembled for the genome study.

I am wondering whether I can now use this data to improve the existing genome assembly, or whether that would be a fool's errand. If it is possible, could someone suggest a good pipeline or set of tools for this?

annotatation genome RNA-seq assembly • 1.5k views

ADD COMMENT • link 2.1 years ago by Dunois ★ 2.8k

1

AFAIK, transcripts are not usually used for genome assembly. You can use them for gene annotation, though. In any case, an assembly with an N50 of 3k may be very challenging to work with, so you may want to consider improving it by more conventional means: more sequencing data, long reads, Hi-C, optical or genetic maps, etc.
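
For context on the N50 figure: N50 is the length L such that scaffolds of length L or longer together contain at least half of the assembled bases. A minimal Python sketch of that calculation follows; the function name and the scaffold lengths are made up for illustration and are not taken from the assembly discussed here.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths (in bp)."""
    if not lengths:
        raise ValueError("need at least one sequence length")
    ordered = sorted(lengths, reverse=True)   # longest scaffolds first
    half_total = sum(ordered) / 2             # half of all assembled bases
    running = 0
    for length in ordered:
        running += length
        # The first scaffold that pushes the running sum to half the
        # assembly size defines the N50.
        if running >= half_total:
            return length


# Hypothetical scaffold lengths, for illustration only
toy_lengths = [10_000, 5_000, 3_000, 2_000, 1_000, 500, 500]
print(n50(toy_lengths))  # prints 5000
```

With an N50 of about 3 kb, half of the assembled bases sit in scaffolds of roughly that size or smaller, so gene models that span more than a few kilobases (introns included) may end up split across scaffolds.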

ADD REPLY • link 2.1 years ago by liorglic ★ 1.4k

0

Thank you for the feedback. Do you have any recommendations for tools I could use to annotate the genome as you suggested?

ADD REPLY • link 2.1 years ago by Dunois ★ 2.8k

1

PASA can be a good tool to start with.

ADD REPLY • link 2.1 years ago by liorglic ★ 1.4k

0

Thank you, I'll take a look at PASA as you suggested.

ADD REPLY • link 2.1 years ago by Dunois ★ 2.8k

score 0 · Answer 1 · 2022-11-01

0

2.1 years ago

BioinformaticBird ▴ 110

Did you try MOSGA? It can handle FASTQ RNA-seq data with BRAKER to predict protein-coding genes. Additionally, you can provide an existing genome annotation as a GBFF file, and it will merge both results.

ADD COMMENT • link 2.1 years ago by BioinformaticBird ▴ 110

0

Thanks for the suggestion, I did not know this webserver existed!! I don't see any option to feed BRAKER RNA-seq data through MOSGA though? Or do you mean that I could annotate the RNA-seq data itself?

ADD REPLY • link 2.1 years ago by Dunois ★ 2.8k

0

Select "BRAKER," click on "More," and select "RNA evidence-based". Then, under "Settings," specify whether you are providing a hint file or a FASTQ file.

Alternatively, you can host MOSGA on your own computer via Docker or a Linux (virtual) machine.

ADD REPLY • link 2.1 years ago by BioinformaticBird ▴ 110

0

Ah, I see. I did not even notice that little More button there. Thank you so much. Yes, I might just host this locally, and give it a spin.

Are there no Singularity containers, by the way? The HPC I have access to does not allow Docker containers.

ADD REPLY • link 2.1 years ago by Dunois ★ 2.8k