Question

assembly of mouse transcriptome from BAM file

6.8 years ago

regulatinggeneexpression

Hello, we work with a mouse strain that differs from the one used to generate the standard mouse reference genome (mm9 or mm10). The Sanger Institute has performed NGS on the strain we work with and has a BAM file available on its website.

How can I use that BAM file to assemble a transcriptome that I can use as a reference for analyzing RNA-Seq data from this particular strain? I only care about protein-coding ORFs, so I do not need to do a de novo genome assembly.

Thanks.

RNA-Seq

6.7 years ago by regulatinggeneexpression

What kind of BAM file is it? If it is for WGS then you can't use it directly to assemble a transcriptome. If it is from RNAseq data then you could use one of the options mentioned below by @grant after extracting the reads from that BAM file.
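If the Sanger BAM does turn out to hold RNA-seq reads, the usual way to get them back out is `samtools fastq`. To show what that extraction involves, here is a minimal pure-Python sketch: it reads the BAM (BGZF is a series of gzip members, so Python's `gzip` module can decompress it), skips secondary and supplementary alignments so each read is emitted once, and reverse-complements reads aligned to the minus strand to restore their sequenced orientation. The names `bam_reads` and `write_fastq` are made up for illustration; the sketch does not split paired-end mates into separate files, and it substitutes an arbitrary quality string when a record has none.

```python
import gzip
import struct
from typing import BinaryIO, Iterator, Tuple

# 4-bit base codes of BAM's packed SEQ field, as listed in the SAM/BAM specification.
SEQ_CODES = "=ACMGRSVTWYHKDBN"
# Only unambiguous bases are complemented; other IUPAC codes are left as-is.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")
FLAG_REVERSE, FLAG_SECONDARY, FLAG_SUPPLEMENTARY = 0x10, 0x100, 0x800


def _read_exact(fh: BinaryIO, n: int) -> bytes:
    data = fh.read(n)
    if len(data) != n:
        raise EOFError("truncated BAM file")
    return data


def bam_reads(path: str) -> Iterator[Tuple[str, str, str]]:
    """Yield (name, sequence, quality) for each primary record, in sequenced orientation."""
    # BGZF is a series of gzip members, which gzip.open decompresses transparently.
    with gzip.open(path, "rb") as fh:
        if _read_exact(fh, 4) != b"BAM\x01":
            raise ValueError("not a BAM file")
        (l_text,) = struct.unpack("<i", _read_exact(fh, 4))
        _read_exact(fh, l_text)  # plain-text SAM header, not needed here
        (n_ref,) = struct.unpack("<i", _read_exact(fh, 4))
        for _ in range(n_ref):
            (l_name,) = struct.unpack("<i", _read_exact(fh, 4))
            _read_exact(fh, l_name + 4)  # reference name plus its int32 length
        while True:
            head = fh.read(4)
            if not head:
                break  # clean end of file
            (block_size,) = struct.unpack("<i", head)
            rec = _read_exact(fh, block_size)
            (_ref, _pos, l_name, _mapq, _bin, n_cigar, flag,
             l_seq, _mref, _mpos, _tlen) = struct.unpack("<iiBBHHHiiii", rec[:32])
            if flag & (FLAG_SECONDARY | FLAG_SUPPLEMENTARY):
                continue  # extra alignments of a read that is emitted elsewhere
            off = 32
            name = rec[off:off + l_name - 1].decode("ascii")  # drop the trailing NUL
            off += l_name + 4 * n_cigar  # skip the name and the CIGAR operations
            n_packed = (l_seq + 1) // 2
            packed = rec[off:off + n_packed]
            off += n_packed
            # Two bases per byte, high nibble first.
            seq = "".join(SEQ_CODES[(packed[i // 2] >> (4 * (1 - i % 2))) & 0xF]
                          for i in range(l_seq))
            raw_qual = rec[off:off + l_seq]
            if l_seq and raw_qual[0] == 0xFF:
                qual = "I" * l_seq  # arbitrary placeholder when qualities are absent
            else:
                qual = "".join(chr(q + 33) for q in raw_qual)
            if flag & FLAG_REVERSE:
                # Reverse-strand alignments store the reverse complement of the read.
                seq = seq.translate(COMPLEMENT)[::-1]
                qual = qual[::-1]
            yield name, seq, qual


def write_fastq(bam_path: str, fastq_path: str) -> int:
    """Write every primary read as a FASTQ record; return the number written."""
    count = 0
    with open(fastq_path, "w") as out:
        for name, seq, qual in bam_reads(bam_path):
            out.write(f"@{name}\n{seq}\n+\n{qual}\n")
            count += 1
    return count
```

The resulting FASTQ is what an assembler such as Trinity or an aligner would take as input. If the BAM comes from WGS rather than RNA-seq, this still recovers the reads, but they will not assemble into a transcriptome.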

6.8 years ago by GenoMax

Hello mmfansler, thanks for suggesting mmseq. I found a genome for my mouse strain and tried opening it in IGV. I had to gunzip it and make a .genome file as described on the IGV website (https://software.broadinstitute.org/software/igv/LoadGenome), but IGV does not read the FASTA genome file. Do you have any advice? Thanks.

6.7 years ago by regulatinggeneexpression

Please use ADD COMMENT/ADD REPLY when responding to existing posts to keep threads logically organized. This comment belongs under @mmfansler's answer.

IGV does not read the fasta genome file.

Are you getting an error? Are there just fasta sequences of transcripts in the file?

6.7 years ago by GenoMax

Just to test, I downloaded a transcriptome from there, unzipped, then indexed (igvtools index transcriptome.fa). This loaded fine in IGV. As mentioned by @genomax, we'll need more details to help further.
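IGV typically expects a `.fai` index next to a FASTA file, which is what `igvtools index` (or `samtools faidx`) writes. As a sketch of what that index holds, the function below computes the five `.fai` columns (name, sequence length, byte offset of the first base, bases per line, bytes per line including the newline) from FASTA bytes. `fai_entries` and `fai_text` are hypothetical names. The check for uneven line lengths is there because this kind of index only works when every line of a sequence except the last has the same width, which is one thing worth checking when a FASTA refuses to load.

```python
from typing import List, Tuple

FaiEntry = Tuple[str, int, int, int, int]


def fai_entries(fasta: bytes) -> List[FaiEntry]:
    """Compute (name, length, offset, line_bases, line_width) for each FASTA record."""
    entries: List[FaiEntry] = []
    name = None
    length = offset = line_bases = line_width = 0
    saw_short = False
    pos = 0  # byte offset of the current line within the file
    for line in fasta.splitlines(keepends=True):
        stripped = line.rstrip(b"\r\n")
        if line.startswith(b">"):
            if name is not None:
                entries.append((name, length, offset, line_bases, line_width))
            name = stripped[1:].split()[0].decode()  # ID is the first word of the header
            length, offset = 0, pos + len(line)  # sequence starts right after the header
            line_bases = line_width = 0
            saw_short = False
        elif name is not None and stripped:
            if line_bases == 0:
                line_bases, line_width = len(stripped), len(line)
            elif saw_short or len(stripped) > line_bases:
                raise ValueError(f"{name}: uneven line lengths, cannot be indexed")
            if len(stripped) < line_bases:
                saw_short = True  # only the final line of a record may be shorter
            length += len(stripped)
        pos += len(line)
    if name is not None:
        entries.append((name, length, offset, line_bases, line_width))
    return entries


def fai_text(fasta: bytes) -> str:
    """Render the entries as tab-separated .fai lines."""
    return "".join("\t".join(map(str, e)) + "\n" for e in fai_entries(fasta))
```

With the offset and line geometry stored, a viewer can jump straight to any base of any sequence without reading the whole file, which is why the index is needed for large genomes.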

6.7 years ago by mmfansler

score 0 · Answer 1 · 2018-02-25

6.8 years ago

grant.hovhannisyan

It depends on what kind of analysis you want to do with your data next. In the general case, though, you have two main options:

De novo transcriptome assembly, using, for example, Trinity.
Reference-guided transcriptome assembly, using, for example, StringTie.
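For intuition about what a reference-guided assembler such as StringTie starts from: reads aligned with a spliced aligner carry introns as `N` operations in their CIGAR strings, and counting how many reads support each junction is one ingredient of building candidate transcripts. The toy sketch below does only that counting step; it is not StringTie's algorithm, which goes on to build splice graphs and estimate transcript abundances. Alignments are passed as hypothetical `(chrom, pos, cigar)` tuples using SAM's 1-based `POS`, and `junctions` / `count_junctions` are illustrative names.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # operations that advance along the reference


def junctions(pos: int, cigar: str) -> List[Tuple[int, int]]:
    """Return 1-based inclusive (intron_start, intron_end) for each N operation."""
    ref = pos  # reference coordinate of the next aligned base
    out = []
    for length, op in CIGAR_OP.findall(cigar):
        n = int(length)
        if op == "N":
            out.append((ref, ref + n - 1))
        if op in REF_CONSUMING:
            ref += n
    return out


def count_junctions(alignments: Iterable[Tuple[str, int, str]]) -> Counter:
    """Count supporting reads per (chrom, intron_start, intron_end)."""
    counts: Counter = Counter()
    for chrom, pos, cigar in alignments:
        for start, end in junctions(pos, cigar):
            counts[(chrom, start, end)] += 1
    return counts
```

Junctions supported by many reads are good candidates for real introns; a reference-guided assembler then chains exons across such junctions into transcript models.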

In general, reference-based assembly is more accurate than de novo assembly, and since the genomes of different mouse strains are probably not very different (this is my guess), I think reference-based assembly is more suitable here.

6.8 years ago by grant.hovhannisyan

I should note that Trinity also has a reference-guided mode. I agree re: reference-guided assemblies, with one caveat: this depends on the quality of the reference genome assembly. I wouldn't recommend it with a poor-quality genome assembly, where genes may be fragmented. With mouse, this isn't an issue.

6.7 years ago by Chris Fields

Thank you. I moved my question to a more relevant thread, although it is not a recent one.

6.7 years ago by zizigolu

Did you post this in the wrong thread by mistake?

6.7 years ago by GenoMax

Sorry, no. My read counts come from RNA-seq with the RSEM method, and since this thread is about transcriptomes and Trinity, I thought I would ask here. Sorry if it was irrelevant. I searched recent threads but found nothing about transcriptomes, so I decided not to create a new post for this question.

6.7 years ago by zizigolu

score 0 · Answer 2 · 2018-02-25

6.8 years ago

mmfansler

mmseq is a project that has transcriptomes assembled for the Sanger mouse strains, as well as tools to do your own assembly if needed (e.g., mouse_strain_transcriptome.sh).
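To illustrate the general idea behind a strain-specific transcriptome, here is a sketch that substitutes a strain's single-nucleotide variants into a reference sequence and then splices a transcript out of exon coordinates. This is an assumption about the general approach, not a description of what `mouse_strain_transcriptome.sh` actually does. Indels, which shift downstream coordinates, are deliberately not handled, and `apply_snvs` / `transcript_sequence` are hypothetical names that use 1-based inclusive coordinates.

```python
from typing import List, Tuple

REVCOMP = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def apply_snvs(ref_seq: str, snvs: List[Tuple[int, str, str]]) -> str:
    """Apply (1-based position, ref base, alt base) substitutions; indels are rejected."""
    seq = list(ref_seq)
    for pos, ref, alt in snvs:
        if len(ref) != 1 or len(alt) != 1:
            raise ValueError("only single-nucleotide variants are handled")
        if not 1 <= pos <= len(seq):
            raise ValueError(f"position {pos} is outside the sequence")
        if seq[pos - 1].upper() != ref.upper():
            # Guards against applying variants called on a different reference build.
            raise ValueError(f"reference mismatch at {pos}: expected {ref}, found {seq[pos - 1]}")
        seq[pos - 1] = alt
    return "".join(seq)


def transcript_sequence(genome: str, exons: List[Tuple[int, int]], strand: str) -> str:
    """Concatenate 1-based inclusive exons; reverse-complement minus-strand transcripts."""
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    seq = "".join(genome[start - 1:end] for start, end in sorted(exons))
    return seq.translate(REVCOMP)[::-1] if strand == "-" else seq
```

Applying the variants to the genome before splicing keeps the variant coordinates in genome space, so the same variant list works for every transcript that overlaps a position.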

6.8 years ago by mmfansler