Help with genome-guided de novo assembly in Trinity
5.0 years ago
vrea • 0

Hi all,

I'm trying to run a genome-guided de novo assembly using Trinity with FASTQ files from zebrafish RNA samples, but I have never done an RNA-Seq experiment before.

In the documentation for Trinity it says that you must create a coordinate-sorted BAM file for the reference genome using STAR or TopHat. I'm not sure how to go about doing this. I have downloaded the file 'Danio_rerio.GRCz11.dna_rm.primary_assembly.fa' from Ensembl. Is this the right file to download? Also, what should I be doing with this file?

Any help is greatly appreciated!!!

Sincerely, Victoria

assembly trinity genome guided zebrafish RNA-Seq • 1.6k views

Why do you want to perform assembly for zebrafish? It has a well-annotated genome, so it is unlikely you will get any improvements over the annotated genome.

"In the documentation for Trinity it says that you must create a coordinate-sorted BAM file for the reference genome using STAR or TopHat."

You will have to read the documentation for the tool you choose. I would consider STAR (very fast, but memory-hungry) if you have about 30 GB of RAM available, or HISAT2 or GSNAP if you have less memory.

"I have downloaded the file 'Danio_rerio.GRCz11.dna_rm.primary_assembly.fa' from Ensembl."

I wouldn't use the hard repeat-masked assembly. Better is Danio_rerio.GRCz11.dna.primary_assembly.fa.
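To make the steps concrete, here is a rough sketch (not a tested pipeline) of how the STAR index, the STAR alignment and the genome-guided Trinity run could be chained. It only builds and prints the command lines, so you can check them against the STAR and Trinity manuals before running anything. The read file names, the `star_index` directory, the output prefix, the thread count, the memory limit and the max intron value are all assumptions for illustration; the function names are hypothetical helpers, not part of either tool.

```python
import shlex

# Assumed, illustrative inputs; replace with your own files.
GENOME_FASTA = "Danio_rerio.GRCz11.dna.primary_assembly.fa"
GENOME_DIR = "star_index"            # hypothetical index directory
READ1 = "sample_R1.fastq.gz"         # hypothetical read files
READ2 = "sample_R2.fastq.gz"
PREFIX = "sample_"                   # hypothetical output prefix


def star_index_cmd(genome_fasta, genome_dir, threads=8):
    """Command to build a STAR genome index (done once per genome)."""
    return ["STAR", "--runMode", "genomeGenerate",
            "--genomeDir", genome_dir,
            "--genomeFastaFiles", genome_fasta,
            "--runThreadN", str(threads)]


def star_align_cmd(genome_dir, read1, read2, prefix, threads=8):
    """Command to align reads and write a coordinate-sorted BAM."""
    cmd = ["STAR", "--genomeDir", genome_dir, "--readFilesIn", read1]
    if read2:                        # paired-end: second mate file
        cmd.append(read2)
    if read1.endswith(".gz"):        # let STAR decompress gzipped FASTQ
        cmd += ["--readFilesCommand", "zcat"]
    cmd += ["--outSAMtype", "BAM", "SortedByCoordinate",
            "--outFileNamePrefix", prefix,
            "--runThreadN", str(threads)]
    return cmd


def sorted_bam_path(prefix):
    """Name STAR gives the coordinate-sorted BAM for a given prefix."""
    return prefix + "Aligned.sortedByCoord.out.bam"


def trinity_guided_cmd(bam, max_intron=10000, max_memory="10G", cpu=8):
    """Command for a genome-guided Trinity run on the sorted BAM.

    max_intron and max_memory are example values, not recommendations.
    """
    return ["Trinity", "--genome_guided_bam", bam,
            "--genome_guided_max_intron", str(max_intron),
            "--max_memory", max_memory,
            "--CPU", str(cpu)]


if __name__ == "__main__":
    steps = [
        star_index_cmd(GENOME_FASTA, GENOME_DIR),
        star_align_cmd(GENOME_DIR, READ1, READ2, PREFIX),
        trinity_guided_cmd(sorted_bam_path(PREFIX)),
    ]
    for step in steps:
        print(shlex.join(step))      # print only; run them yourself
```

Once the printed commands look right, each list can be passed to `subprocess.run(cmd, check=True)`. With several samples you would run the alignment step once per sample; how to combine the resulting BAM files for Trinity is something to check in the Trinity documentation.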


I have RNA samples from different treatment groups and I would like to assemble the transcriptomes in Trinity and ultimately compare them. I was under the impression that it is better to do this along with a reference genome if the genome is well annotated (which I know zebrafish is) rather than do a de novo assembly. But in Trinity it says I have to provide the read alignments to the reference genome in a coordinate-sorted BAM file in order to do so. Is this not the case? Thanks so much for your help!


Why do you want to do a de novo assembly of your RNA-seq reads, which come from zebrafish, when it already has a well-annotated genome, as h.mon pointed out? De novo transcriptome assembly is done for organisms whose genome information is not available.


You just need to map the reads to the genome and quantify counts for each gene - STAR can perform both steps, or you could use HISAT2 (or GSNAP, or Subread) + featureCounts.

Alternatively, you can use Salmon to quantify counts "mapping" to the transcriptome - this would be very fast.
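As a rough illustration of the two quantification routes mentioned above, the sketch below builds (but does not run) the command lines for Salmon against a transcriptome, and for featureCounts on BAM files that already exist from an aligner. The file names (`Danio_rerio.GRCz11.cdna.all.fa.gz`, `annotation.gtf`, the read and BAM files), the directory names and the thread count are assumptions for illustration, and the function names are hypothetical helpers. Please check the flags against the version of each tool you have installed.

```python
import shlex


def salmon_index_cmd(transcripts_fasta, index_dir):
    """Command to build a Salmon index from a transcriptome FASTA."""
    return ["salmon", "index", "-t", transcripts_fasta, "-i", index_dir]


def salmon_quant_cmd(index_dir, read1, read2, out_dir, threads=8):
    """Command to quantify one sample with Salmon.

    "-l A" asks Salmon to infer the library type automatically.
    """
    cmd = ["salmon", "quant", "-i", index_dir, "-l", "A"]
    if read2:                        # paired-end reads
        cmd += ["-1", read1, "-2", read2]
    else:                            # single-end reads
        cmd += ["-r", read1]
    cmd += ["-p", str(threads), "-o", out_dir]
    return cmd


def featurecounts_cmd(gtf, bams, out_file, paired=True, threads=8):
    """Command to count reads per gene from aligned BAM files.

    Note: in recent Subread releases "-p" only declares the data as
    paired-end, and an extra flag is needed to count fragments; check
    the featureCounts help for your version.
    """
    cmd = ["featureCounts", "-a", gtf, "-o", out_file, "-T", str(threads)]
    if paired:
        cmd.append("-p")
    return cmd + list(bams)


if __name__ == "__main__":
    # All names below are assumed examples, not files from this thread.
    steps = [
        salmon_index_cmd("Danio_rerio.GRCz11.cdna.all.fa.gz", "salmon_index"),
        salmon_quant_cmd("salmon_index", "sample_R1.fastq.gz",
                         "sample_R2.fastq.gz", "sample_quant"),
        featurecounts_cmd("annotation.gtf",
                          ["sample_Aligned.sortedByCoord.out.bam"],
                          "gene_counts.txt"),
    ]
    for step in steps:
        print(shlex.join(step))      # print only; run them yourself
```

Salmon needs only the reads and a transcriptome FASTA, so no genome alignment is required for that route; featureCounts is the step that follows a genome alignment with HISAT2, GSNAP or Subread.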
