Question

Simulating mRNA-seq data using VG - theoretical question

12 months ago

AshleeThomson ▴ 130

Hi everyone,

I'm attempting to simulate mRNA-seq data to map back to my spliced genome graph for a baseline comparison (graph vs linear reference). To summarise, I'm using a real mRNA-seq data set to match the error profile and RSEM to estimate expression, then running vg sim to simulate the reads. With all the intermediate steps and files, it's becoming a pain and I'm running into a lot of issues. I had an idea and wanted to put it out there for some feedback.

Currently, my graph represents the full genome (introns, exons, etc.) to which I have added splice junctions using vg rna.

In theory, if I were to run the vg rna step again but remove any non-gene regions (-d, --remove-non-gene), which results in an exon-only graph, and then run vg sim on this version of the graph, would this produce mRNA-like data?
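For concreteness, the two steps could be sketched roughly as below. This is only a dry run that prints the command lines rather than executing them. All file names are hypothetical, and apart from -d (mentioned above) the option letters are written from memory of the vg rna and vg sim help text, so they should be checked against `vg rna --help` and `vg sim --help` for the installed version. Further options may be needed, for example to keep transcript paths in the graph so that vg sim can use the expression profile.

```python
import shlex

# Hypothetical file names; substitute your own.
GRAPH = "genome.pg"                    # full genome graph
ANNOTATION = "genes.gtf"               # transcript annotation
EXON_GRAPH = "exon_only.pg"            # output of the vg rna step
REAL_FASTQ = "real_reads.fq"           # real mRNA-seq reads (error profile)
RSEM_EXPR = "rsem.isoforms.results"    # RSEM expression estimates


def vg_rna_exon_only(graph, annotation):
    """argv for a vg rna run that adds splice junctions and removes
    non-gene regions (-d). Option letters are assumptions to verify."""
    return ["vg", "rna", "-n", annotation, "-d", graph]


def vg_sim_reads(graph, fastq, expr, num_reads):
    """argv for a vg sim run that matches the error profile of `fastq`
    (-F) and follows the expression profile in `expr` (-T).
    Option letters are assumptions to verify."""
    return ["vg", "sim", "-x", graph, "-F", fastq, "-T", expr,
            "-n", str(num_reads), "-a"]


rna_cmd = vg_rna_exon_only(GRAPH, ANNOTATION)
sim_cmd = vg_sim_reads(EXON_GRAPH, REAL_FASTQ, RSEM_EXPR, 1000000)

# Print the commands with their output redirections instead of running them.
print(shlex.join(rna_cmd), ">", EXON_GRAPH)
print(shlex.join(sim_cmd), ">", "sim.gam")
```

Printing the commands first makes it easy to compare the exon-only run against the full-graph run before committing to either.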

I may be completely off the mark but there's no harm in asking.

mrna-seq vg sim • 568 views

updated 11 months ago by Jordan M Eizenga ▴ 740 • written 12 months ago by AshleeThomson ▴ 130

Yes, I believe simulating from the exon-only graph should be equivalent to simulating from the full graph. Note that vg sim still doesn't model some features of real RNA-seq data, such as intron retention in nascent mRNA, stochastic transcription of non-genic sequences, and expression of many ncRNAs. However, these limitations apply to both the full and the exon-only graph.
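One way to picture why the two graphs might behave the same is the toy sketch below. It is not how vg works internally. It assumes reads are drawn along annotated transcript paths (as when an expression profile is supplied), and the sequence, exon coordinates and helper names are all made up for illustration. Under that assumption, removing intronic sequence changes coordinates but not the transcript that reads are sampled from.

```python
import random

# Made-up linear "genome": exons in upper case, introns in lower case.
genome = "ACGTACGT" + "gtaaggtttag" + "TTGACCAT" + "gtccctag" + "GGCATCGA"
exons = [(0, 8), (19, 27), (35, 43)]  # half-open exon coordinates


def transcript_sequence(sequence, exon_coords):
    """Concatenate the exon intervals, i.e. walk the spliced transcript."""
    return "".join(sequence[start:end] for start, end in exon_coords)


def exon_only(sequence, exon_coords):
    """Drop all non-exon sequence and remap the exon coordinates."""
    pieces, remapped, offset = [], [], 0
    for start, end in exon_coords:
        pieces.append(sequence[start:end])
        remapped.append((offset, offset + (end - start)))
        offset += end - start
    return "".join(pieces), remapped


def simulate_reads(transcript, n_reads, read_length, seed):
    """Sample error-free reads uniformly along the transcript."""
    rng = random.Random(seed)
    last_start = len(transcript) - read_length
    starts = [rng.randrange(last_start + 1) for _ in range(n_reads)]
    return [transcript[s:s + read_length] for s in starts]


# Transcript walked through the full sequence (introns skipped via splicing).
full_tx = transcript_sequence(genome, exons)

# Transcript walked through the pruned, exon-only sequence.
pruned_genome, pruned_exons = exon_only(genome, exons)
pruned_tx = transcript_sequence(pruned_genome, pruned_exons)

reads_full = simulate_reads(full_tx, n_reads=5, read_length=10, seed=42)
reads_pruned = simulate_reads(pruned_tx, n_reads=5, read_length=10, seed=42)

print(full_tx == pruned_tx)
print(reads_full == reads_pruned)
```

The unmodeled features listed above (retained introns, non-genic transcription, many ncRNAs) are absent from both versions of this toy, which mirrors the point that the limitations are shared.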

11 months ago by Jordan M Eizenga ▴ 740