RNA sequencing and vg pan-transcriptome build
20 months ago
kcarey • 0

Hello!

I am new to building a pan-transcriptome. I plan to use the assemblies produced by the Human Pangenome Reference Consortium from its year 1 data (47 samples), with all of the FASTA files serving as my reference genomes.

I then plan to add VCF files from TCGA for 422 cancer patients, to account for structural variants of the molecular subtypes I would like to explore. Lastly, I want to include RNA sequencing data from 30 samples (from another dataset), using vg.

I want to understand whether this is possible from a memory and functionality standpoint. I am new to vg and have not found literature that explores this explicitly in a way I can understand. I was going to use minigraph instead, but I did not see any way to include RNA sequencing data (which is important for my project).

If you have any references or links, please feel free to include them so that I can read more.

Thanks

fasta pan-transcriptome pan-genome vg • 1.4k views

I am not sure about the data input format, but HISAT2 is able to 'align RNA-seq to a population reference'. This may be a little different from aligning directly to the human pangenome graph, but it may be a point of reference. You can also come up with various links by searching the vg repository; one link I found is https://github.com/vgteam/vg/wiki/Transcriptomic-analyses. I have not done a lot of work with graph genomes, but I'd say it is likely a challenging endeavor compared with a reference-based approach, though it could be interesting :)

Thank you! I found this as well and am trying to understand it now.

The vg mpmap subcommand has features for mapping RNA-seq data to a graph. We describe them in more detail in this publication, which includes a comparison to some other tools you might consider. cmdcolin is correct that HISAT2 is also capable of aligning to a graph that is constructed from a VCF.

If you plan to use VCF data, minigraph isn't really an appropriate tool. It's designed for building a graph from multiple genome assemblies. For VCF input, HISAT2 and vg both have internal graph construction algorithms. I can't speak much for HISAT2, but the easiest entry point for vg's graph construction (for most people) is the vg autoindex subcommand.
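
To make the autoindex and mpmap route concrete, here is a minimal sketch in Python that only assembles the two command lines and prints them; it does not run vg. The flags and the index file suffixes follow the Transcriptomic-analyses wiki page linked earlier in this thread as best I recall it, so please check them against `vg autoindex --help` and `vg mpmap --help` for your vg version. All file names are hypothetical placeholders, and the transcript annotation (GTF/GFF) is an input that this workflow assumes but that the question does not mention.

```python
import shlex


def autoindex_cmd(ref_fasta, vcf, annotation, prefix):
    """Build a `vg autoindex` command for the spliced (mpmap) workflow.

    Flags as recalled from the vg wiki; verify against your vg version.
    """
    return [
        "vg", "autoindex",
        "--workflow", "mpmap",
        "--prefix", prefix,
        "--ref-fasta", ref_fasta,
        "--vcf", vcf,
        "--tx-gff", annotation,
    ]


def mpmap_cmd(prefix, fastq_1, fastq_2, threads=4):
    """Build a `vg mpmap` command for paired-end RNA-seq reads.

    The index suffixes (.spliced.xg/.gcsa/.dist) are an assumption based
    on the wiki example; check what autoindex actually writes.
    """
    return [
        "vg", "mpmap",
        "-n", "rna",
        "-t", str(threads),
        "-x", f"{prefix}.spliced.xg",
        "-g", f"{prefix}.spliced.gcsa",
        "-d", f"{prefix}.spliced.dist",
        "-f", fastq_1,
        "-f", fastq_2,
    ]


# Hypothetical file names, for illustration only.
index = autoindex_cmd("reference.fa", "variants.vcf.gz", "genes.gtf", "vg_rna")
align = mpmap_cmd("vg_rna", "sample_1.fq", "sample_2.fq")

print(shlex.join(index))
# mpmap writes alignments to standard output, so redirect it to a file.
print(shlex.join(align) + " > sample.gamp")
```

Printing the commands first, instead of executing them, makes it easy to review the inputs before committing to a long indexing run.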

Thank you so much! This is where I keep being led back to as well. I don't have much insight into HISAT2. I am going to give it a go this week and will comment when I figure it out.
