5.0 years ago
Picasa
Hi,
I am studying allele-specific expression and I am stuck on the step of getting two versions of each transcript (one from haplotype 1 and one from haplotype 2).
To get genes/transcripts from a GTF I use gffread (from Cufflinks); this is not a problem.
Now I want to make version 2 (i.e. haplotype 2, or alternative) of these transcripts from a .vcf I have. If the variants are SNPs, there are no issues, but if there are indels, the coordinates in the GTF are no longer valid.
Does anyone have an idea how I can create the haplotype 2 transcripts?
Thanks a lot.
What's your end objective? If I understand correctly, you have a VCF file with variants (SNPs/indels) and you'd like to generate transcripts that reflect both haplotypes you're studying. Is haplotype 1 the reference and haplotype 2 the variant?
Or are you after something else?
Yes exactly, sorry for not being clear.
It's easy to integrate the alternative SNPs (haplotype 2) because they don't shift the coordinates of my transcripts (in the GTF), but for indels... I don't have a solution.
I'm slightly confused. Do you already have the reads, or how was the VCF file generated? Or are you after integrating the indels into a reference FASTA file?
So basically I had WGS data that I used to call variants (SNPs and indels), which gave me a .vcf file at the end.
I have a reference genome fasta file and want to create an alternative genome fasta file (with SNPs and indels) -> This is easy.
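For illustration, here is a minimal sketch of that step on a toy sequence. It assumes the variants for one haplotype have already been parsed into (pos, ref, alt) tuples with 1-based VCF positions and that they do not overlap. The function name `apply_variants` and the example data are hypothetical, not taken from any tool mentioned in this thread.

```python
def apply_variants(ref_seq, variants):
    """Return ref_seq with non-overlapping variants applied.

    variants: iterable of (pos, ref, alt); pos is 1-based, as in a VCF.
    """
    pieces = []
    cursor = 0  # 0-based index of the next unconsumed reference base
    for pos, ref, alt in sorted(variants):
        start = pos - 1
        if start < cursor:
            raise ValueError(f"overlapping variant at position {pos}")
        if ref_seq[start:start + len(ref)] != ref:
            raise ValueError(f"REF allele mismatch at position {pos}")
        pieces.append(ref_seq[cursor:start])  # unchanged stretch
        pieces.append(alt)                    # alternative allele
        cursor = start + len(ref)             # skip the REF allele
    pieces.append(ref_seq[cursor:])
    return "".join(pieces)


# Toy example: one SNP, one 1-bp deletion, one 2-bp insertion.
toy_variants = [(2, "C", "T"), (4, "TA", "T"), (8, "T", "TGG")]
print(apply_variants("ACGTACGTAC", toy_variants))  # ATGTCGTGGAC
```

The result is one base longer than the reference (-1 from the deletion, +2 from the insertion), which is exactly why downstream GTF coordinates stop matching.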
But the difficulty is that I want to extract the transcripts (I have an annotation GTF file) from this alternative genome (what I call haplotype 2) with SNPs (easy) AND indels (difficult, as the coordinates of my exons in the GTF file are no longer valid).
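To make the coordinate problem concrete: every indel that lies entirely upstream of an exon boundary shifts that boundary by len(ALT) - len(REF). The sketch below shows that bookkeeping under some assumptions: variants are (pos, ref, alt) tuples with 1-based VCF positions, they do not overlap, and no exon boundary falls inside a multi-base REF allele (the function raises an error in that case rather than guessing). The name `lift` and the toy data are hypothetical.

```python
def lift(pos, variants):
    """Map a 1-based reference position to the alternative genome.

    variants: iterable of non-overlapping (pos, ref, alt) tuples,
    1-based positions as in a VCF.
    """
    shift = 0
    for v_pos, ref, alt in sorted(variants):
        v_end = v_pos + len(ref) - 1  # last reference base of the REF allele
        if v_end < pos:
            # Variant is entirely upstream: its length change shifts pos.
            shift += len(alt) - len(ref)
        elif v_pos < pos <= v_end:
            # pos lies inside a multi-base REF allele (e.g. a deleted base).
            raise ValueError(f"position {pos} falls inside variant at {v_pos}")
        else:
            break  # this and all later variants start at or after pos
    return pos + shift


# Toy example: one SNP, one 1-bp deletion, one 2-bp insertion.
toy_variants = [(2, "C", "T"), (4, "TA", "T"), (8, "T", "TGG")]
# An exon at 3..9 on the reference becomes 3..10 on the alternative genome.
print(lift(3, toy_variants), lift(9, toy_variants))  # 3 10
```

With the shifted start and end of each exon, the GTF could be rewritten for the alternative genome and the transcripts extracted from the alternative FASTA in the same way as for the reference.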