Hello!
I am new to building a pan-transcriptome. I plan to use the assemblies produced by the Human Pangenome Reference Consortium from its year 1 data (47 samples), with all of the FASTA files as my reference genomes...
I then planned to add VCF files from TCGA for 422 cancer patients, to account for structural variants of the molecular subtypes I would like to explore. Lastly, I wanted to include RNA sequencing data from 30 samples (from another dataset), using VG.
I wanted to understand whether this is possible from a memory and functionality standpoint. I am new to using VG and have not found literature that explicitly explores this in a way I can understand. I was going to use Minigraph instead; however, I did not see any way to include RNA sequencing data (which is important for my project).
If you have any references/links, please feel free to include them so that I can read more about this.
Thanks
I am not sure about the data input format, but 'hisat2' is able to 'align rna-seq to a population reference'. This may be a little different from aligning directly to the human pangenome graph, but it may be a point of reference. You can also come up with various links by searching the vg repository; one link I found is here: https://github.com/vgteam/vg/wiki/Transcriptomic-analyses I have not done a lot of work with graph genomes, but I'd say it is likely a challenging endeavor compared with a reference-based approach, but could be interesting :)
Thank you! I found this as well and am trying to understand it now.
The vg mpmap subcommand has features for mapping RNA-seq data to a graph. We describe it in more detail in this publication, including a comparison to some other tools you might consider. cmdcolin is correct that HISAT2 is also capable of aligning to a graph that is constructed from a VCF.
If you plan to use VCF data, minigraph isn't really an appropriate tool. It's designed for building a graph from multiple genome assemblies. For VCF input, HISAT2 and vg both have internal graph construction algorithms. I can't speak much for HISAT2, but the easiest entry point for vg's graph construction (for most people) is the vg autoindex subcommand.
Thank you so much! This is where I keep being led back to as well! I don't have much insight into HISAT2. I am going to give it a go this week and will comment when I figure it out.
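For anyone landing here later, the vg autoindex and vg mpmap steps mentioned in the answer might look roughly like the sketch below. It only builds and prints the two command lines and does not run vg. The flags and the index file suffixes follow the vg wiki's Transcriptomic-analyses page as I understand it, so please check them against `vg autoindex --help` and `vg mpmap --help` for your version. All file names (ref.fa, variants.vcf.gz, genes.gtf, the FASTQ names) and the vg_rna prefix are placeholders, not files from this thread.

```python
import shlex


def autoindex_cmd(prefix, ref_fasta, vcf, tx_gff):
    """Build a `vg autoindex` command that makes spliced-graph indexes for mpmap.

    Assumption: one reference FASTA, one VCF, and one transcript annotation
    (GTF/GFF) whose sequence names match the FASTA.
    """
    return [
        "vg", "autoindex",
        "--workflow", "mpmap",   # build the indexes that vg mpmap needs
        "--prefix", prefix,      # prefix for all output index files
        "--ref-fasta", ref_fasta,
        "--vcf", vcf,
        "--tx-gff", tx_gff,      # transcript annotation adds splice junctions
    ]


def mpmap_cmd(prefix, fastq_1, fastq_2):
    """Build a `vg mpmap` command for paired-end RNA-seq reads.

    Assumption: autoindex wrote <prefix>.spliced.xg/.gcsa/.dist.
    mpmap writes alignments to stdout, so redirect it when running.
    """
    return [
        "vg", "mpmap",
        "-n", "rna",                     # use RNA-seq presets
        "-x", f"{prefix}.spliced.xg",
        "-g", f"{prefix}.spliced.gcsa",
        "-d", f"{prefix}.spliced.dist",
        "-f", fastq_1,
        "-f", fastq_2,
    ]


# Placeholder inputs; substitute your own files.
index = autoindex_cmd("vg_rna", "ref.fa", "variants.vcf.gz", "genes.gtf")
align = mpmap_cmd("vg_rna", "sample_1.fq", "sample_2.fq")

print(shlex.join(index))
print(shlex.join(align) + " > sample.gamp")
```

Printing the commands first makes it easy to inspect them before committing memory and time on a full human-scale graph; they can then be pasted into a shell or passed to `subprocess.run`.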