Improving existing genome with new RNA-seq data
2.1 years ago
Dunois ★ 2.8k

I am working with a non-model species for which another group has published a genome assembly. The assembly is still at the scaffold level (N50 = 3244), and no genes have been annotated. For their study, this group used around 100,000 de novo assembled transcripts (from paired-end RNA-seq) to assist in the scaffolding.

I'm working with a significantly larger data set of RNA-seq reads (also paired-end, 150 bp Illumina reads) that I've assembled into a de novo transcriptome. The transcriptome itself is also larger and more complete (according to BUSCO) than the one assembled for the genome study.

I am wondering if I can now use this data to improve the existing genome assembly. Or would this be a fool's errand? If that is possible, could someone perhaps suggest a good pipeline or set of tools for this?

annotation genome RNA-seq assembly • 1.5k views

AFAIK, transcripts are not usually used for genome assembly. You can use them for gene annotation though. In any case, an assembly with N50 of 3k may be very challenging to work with, so you may consider improving it using more common ways: more sequencing data, long reads, Hi-C/optical/genetic maps etc.
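For anyone unsure what the N50 figure above measures: it is the length L such that scaffolds of length L or longer together contain at least half of the assembled bases. Below is a minimal Python sketch for checking it on your own FASTA. The function names `n50` and `fasta_lengths` are made up for illustration and do not come from any particular tool.

```python
def fasta_lengths(lines):
    """Return the length of each record from an iterable of FASTA lines."""
    lengths = []
    current = None  # length of the record being read; None before the first header
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif line and current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths):
    """Return the N50 of a collection of sequence lengths."""
    if not lengths:
        raise ValueError("no sequence lengths given")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down until half of the bases are covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
```

To run it on a file (the path is a placeholder): `with open("scaffolds.fasta") as fh: print(n50(fasta_lengths(fh)))`.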


Thank you for the feedback. Do you have any recommendations for tools I could use to annotate the genome as you suggested?


PASA can be a good tool to start with.


Thank you, I'll take a look at PASA as you suggested.

2.1 years ago

Have you tried MOSGA? It can take RNA-seq data in FASTQ format and use it with BRAKER to predict protein-coding genes. Additionally, you can provide an existing genome annotation as a GBFF file, and it will merge both results.


Thanks for the suggestion, I did not know this webserver existed!! I don't see any option to feed BRAKER RNA-seq data through MOSGA though? Or do you mean that I could annotate the RNA-seq data itself?


Select "BRAKER", click on "More", and select "RNA evidence-based". Then, under "Settings", specify whether you are providing a hint file or a FASTQ file.

Alternatively, you can host MOSGA on your own computer via Docker or on a Linux (virtual) machine.


Ah, I see. I did not even notice that little More button there. Thank you so much. Yes, I might just host this locally, and give it a spin.

Are there no Singularity containers, by the way? The HPC I have access to does not allow Docker containers.
