Question

Help with genome-guided de novo assembly in Trinity

5.0 years ago

vrea • 0

Hi all,

I'm trying to run a genome-guided de novo assembly with Trinity, using FASTQ files from zebrafish RNA samples, but I have never done an RNA-Seq experiment before.

In the documentation for Trinity it says that you must create a coordinate-sorted BAM file for the reference genome using STAR or TopHat. I'm not sure how to go about doing this. I have downloaded the file 'Danio_rerio.GRCz11.dna_rm.primary_assembly.fa' from Ensembl. Is this the right file to download? Also, what should I be doing with this file?

Any help is greatly appreciated!!!

Sincerely, Victoria

assembly trinity genome guided zebrafish RNA-Seq • 1.6k views

ADD COMMENT • link 5.0 years ago by vrea • 0

Why do you want to perform an assembly for zebrafish? It has a well-annotated genome, so it is unlikely you will get any improvement over the existing annotation.

"In the documentation for Trinity it says that you must create a coordinate-sorted BAM file for the reference genome using STAR or TopHat."

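The BAM file is produced by aligning your reads to the reference genome with a splice-aware aligner. You will have to read the documentation of the tool you choose. I would consider STAR (very fast, but memory hungry) if you have about 30 GB of RAM available, or HISAT2 or GSNAP if you have less memory.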

"I have downloaded the file 'Danio_rerio.GRCz11.dna_rm.primary_assembly.fa' from Ensembl."

I wouldn't use the hard repeat-masked assembly (dna_rm). Danio_rerio.GRCz11.dna.primary_assembly.fa is the better choice.
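As a rough sketch of what those steps could look like with STAR, the Python below only builds the command lines (it does not run anything): one to index the unmasked genome, one to align a sample into a coordinate-sorted BAM, and one to hand that BAM to Trinity. It assumes paired-end reads; the sample file names, output prefix, thread count, memory limit and maximum intron length are all placeholder values to adapt, not recommendations.

```python
import shlex

# Hypothetical inputs: replace with your own files.
GENOME_FASTA = "Danio_rerio.GRCz11.dna.primary_assembly.fa"  # unmasked, uncompressed
STAR_INDEX = "star_index"                                    # output directory for the index


def star_index_cmd(genome_fasta, index_dir, threads=8):
    """Build the STAR genome index (needed once per genome)."""
    return [
        "STAR", "--runMode", "genomeGenerate",
        "--genomeDir", str(index_dir),
        "--genomeFastaFiles", str(genome_fasta),
        "--runThreadN", str(threads),
    ]


def star_align_cmd(index_dir, read1, read2, out_prefix, threads=8):
    """Align one paired-end sample and write a coordinate-sorted BAM."""
    cmd = [
        "STAR",
        "--genomeDir", str(index_dir),
        "--readFilesIn", str(read1), str(read2),
        "--outSAMtype", "BAM", "SortedByCoordinate",
        "--outFileNamePrefix", out_prefix,
        "--runThreadN", str(threads),
    ]
    # Gzipped FASTQ files have to be decompressed on the fly.
    if str(read1).endswith(".gz"):
        cmd += ["--readFilesCommand", "zcat"]
    return cmd


def trinity_guided_cmd(sorted_bam, max_intron=10000, max_memory="10G", cpus=8):
    """Genome-guided Trinity run on a coordinate-sorted BAM."""
    return [
        "Trinity",
        "--genome_guided_bam", str(sorted_bam),
        "--genome_guided_max_intron", str(max_intron),  # placeholder value
        "--max_memory", max_memory,
        "--CPU", str(cpus),
    ]


if __name__ == "__main__":
    # Sample names below are made up for illustration.
    prefix = "sample1_"
    commands = [
        star_index_cmd(GENOME_FASTA, STAR_INDEX),
        star_align_cmd(STAR_INDEX, "sample1_R1.fastq.gz", "sample1_R2.fastq.gz", prefix),
        # STAR names the sorted BAM <prefix>Aligned.sortedByCoord.out.bam
        trinity_guided_cmd(prefix + "Aligned.sortedByCoord.out.bam"),
    ]
    for cmd in commands:
        print(shlex.join(cmd))
```

Printing the commands first makes it easy to check them before running each one in a shell or through `subprocess.run`.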

ADD REPLY • link 5.0 years ago by h.mon 35k

I have RNA samples from different treatment groups and I would like to assemble the transcriptomes in Trinity and ultimately compare them. I was under the impression that, when the genome is well annotated (which I know zebrafish is), it is better to assemble with the reference genome than to do a purely de novo assembly. But the Trinity documentation says that, in order to do so, I have to provide the read alignments to the reference genome as a coordinate-sorted BAM file. Is this not the case? Thanks so much for your help!

ADD REPLY • link 5.0 years ago by vrea • 0

Why do you want to do a de novo assembly of your RNA-seq reads when they come from zebrafish, which already has a well-annotated genome, as h.mon pointed out? De novo transcriptome assembly is done for organisms whose genome information is not available.

ADD REPLY • link 5.0 years ago by ashish ▴ 680

You just need to map the reads to the genome and count the reads for each gene. STAR can perform both steps, or you could use HISAT2 (or GSNAP, or Subread) + featureCounts.

Alternatively, you can use Salmon to quantify expression by "mapping" the reads directly to the transcriptome, which would be very fast.
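To make the two routes more concrete, here is a minimal Python sketch that builds the command lines for each (without running them) and reads the per-gene count table that STAR writes in `GeneCounts` mode. It assumes paired-end, uncompressed FASTQ files; every file name, the thread count and the gene ID in the usage example are hypothetical. Which count column is appropriate depends on the strandedness of the library; the parser below simply takes the unstranded column.

```python
def star_count_cmd(index_dir, gtf, read1, read2, out_prefix, threads=8):
    """Map reads with STAR and count reads per gene in the same run."""
    return [
        "STAR",
        "--genomeDir", str(index_dir),
        "--sjdbGTFfile", str(gtf),          # gene annotation, e.g. an Ensembl GTF
        "--readFilesIn", str(read1), str(read2),
        "--outSAMtype", "BAM", "SortedByCoordinate",
        "--quantMode", "GeneCounts",        # writes <prefix>ReadsPerGene.out.tab
        "--outFileNamePrefix", out_prefix,
        "--runThreadN", str(threads),
    ]


def salmon_cmds(transcripts_fasta, index_dir, read1, read2, out_dir, threads=8):
    """Return (index, quant) commands for Salmon against a transcriptome FASTA."""
    index = ["salmon", "index", "-t", str(transcripts_fasta), "-i", str(index_dir)]
    quant = [
        "salmon", "quant",
        "-i", str(index_dir),
        "-l", "A",                          # let Salmon infer the library type
        "-1", str(read1), "-2", str(read2),
        "-p", str(threads),
        "-o", str(out_dir),
    ]
    return index, quant


def read_star_gene_counts(text):
    """Parse the contents of a ReadsPerGene.out.tab file.

    Columns are: gene ID, unstranded count, and the two stranded counts.
    The summary rows (N_unmapped, N_multimapping, ...) are skipped.
    Returns {gene_id: unstranded_count}.
    """
    counts = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        fields = line.split("\t")
        if fields[0].startswith("N_"):
            continue
        counts[fields[0]] = int(fields[1])
    return counts
```

For example, `read_star_gene_counts(open("sample1_ReadsPerGene.out.tab").read())` would give one dictionary per sample, and these can then be combined into a count matrix for a differential expression tool.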

ADD REPLY • link 5.0 years ago by h.mon 35k