Hi,
I am trying to do ribosome profiling, and to that end I have been trying, without success, to align reads to the human transcriptome. My aligner is TopHat. The hg19 transcriptome FASTA file looks like this:
>uc001aaa.3
cttgccgtcagccttttctttgacctcttc
I need a GTF file for the run (at least, I assume I do), but the GTF file I downloaded from UCSC looks like this:
chr1 hg19_knownGene exon 11874 12227 0.000000 + . gene_id "uc001aaa.3"; transcript_id "uc001aaa.3";
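A quick way to see how these two files relate is to compare the FASTA record names with the GTF's first column (seqname) and its transcript_id values. Below is a minimal Python sketch. It assumes the files can be read line by line as plain text, and the function names are hypothetical. It collects the FASTA IDs and reports which GTF seqnames have no matching FASTA record.

```python
import re

# Matches the transcript_id attribute in GTF column 9, e.g. transcript_id "uc001aaa.3";
TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def fasta_ids(fasta_lines):
    """Return the first whitespace-delimited token of every FASTA header."""
    ids = set()
    for line in fasta_lines:
        if line.startswith(">") and line[1:].strip():
            ids.add(line[1:].split()[0])
    return ids


def gtf_fields(gtf_lines):
    """Yield (seqname, transcript_id or None) for each non-comment GTF line."""
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        # Column 1 is the seqname; split() handles tabs and the spaces of a pasted line.
        seqname = line.split()[0]
        match = TRANSCRIPT_ID.search(line)
        tid = match.group(1) if match else None
        yield seqname, tid


def compare(fasta_lines, gtf_lines):
    """Report how GTF seqnames and transcript_ids relate to the FASTA record IDs."""
    ids = fasta_ids(fasta_lines)
    seqnames, transcripts = set(), set()
    for seqname, tid in gtf_fields(gtf_lines):
        seqnames.add(seqname)
        if tid is not None:
            transcripts.add(tid)
    return {
        "seqnames_missing_from_fasta": seqnames - ids,
        "transcript_ids_in_fasta": transcripts & ids,
    }
```

For the two snippets shown, the transcript IDs agree (`uc001aaa.3`), but the GTF seqname `chr1` is not a record in the transcriptome FASTA, because the GTF coordinates (11874-12227) are genomic.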
When I run TopHat, I get this error message:
[2020-02-04 15:15:08] Building Bowtie index from ucsc_hg19.fa
[FAILED]
Looking through the posts here, I think the problem is that my GTF does not match the transcriptome. I have tried to find a transcriptome-based GTF on UCSC, but I couldn't find one. Or am I going about this the wrong way, and should I have used a differently built reference? I have downloaded the genomic data that includes the protein-coding genes. The sequence names in this file look like this:
>hg19_knownGene_uc001aaa.3 range=chr1:11874-14409 5'pad=0 3'pad=0 strand=+ repeatMasking=none
cttgccgtcagccttttctttgacctcttctttctgttcatgtgtatttg
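These UCSC-style headers carry the knownGene ID behind a `hg19_knownGene_` prefix, followed by the genomic range and strand. If these records were used as a transcript reference, the headers would have to be trimmed to the bare ID before they could match the GTF's transcript_id values. Below is a minimal sketch. It assumes every header follows exactly the pattern shown above. The regex and the function names are hypothetical, not part of any UCSC tool.

```python
import re

# Assumed header shape: >NAME range=CHROM:START-END ... strand=+/- ...
HEADER = re.compile(
    r"^>(?P<name>\S+)\s+range=(?P<chrom>[^:\s]+):(?P<start>\d+)-(?P<end>\d+)"
    r".*?\bstrand=(?P<strand>[+-])"
)


def parse_ucsc_header(header, prefix="hg19_knownGene_"):
    """Split a UCSC-style FASTA header into ID, range and strand."""
    m = HEADER.match(header)
    if m is None:
        raise ValueError(f"unrecognised header: {header!r}")
    name = m.group("name")
    # Strip the track prefix so the ID matches the GTF transcript_id.
    if name.startswith(prefix):
        name = name[len(prefix):]
    return {
        "transcript_id": name,
        "chrom": m.group("chrom"),
        "start": int(m.group("start")),
        "end": int(m.group("end")),
        "strand": m.group("strand"),
    }


def renamed_header(header, prefix="hg19_knownGene_"):
    """Return a header line reduced to the bare transcript ID."""
    return ">" + parse_ucsc_header(header, prefix)["transcript_id"]
```

For the header above, this yields the ID `uc001aaa.3` and the range `chr1:11874-14409` on the `+` strand.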
The end goal is to look only at the transcriptome data, compute RPKM values and check expression.
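For reference, RPKM for a transcript is its read count × 10^9 divided by (transcript length in bp × total mapped reads). Below is a minimal Python sketch of that arithmetic. It assumes per-transcript read counts are already available, and it takes transcript lengths from the transcriptome FASTA. If no total is passed, it falls back to the sum of the given counts, which is a simplification because the true total of mapped reads may also include reads not assigned to any transcript. Function names are hypothetical.

```python
def fasta_lengths(fasta_lines):
    """Sequence length in bp for each FASTA record (first header token as ID)."""
    lengths, current = {}, None
    for line in fasta_lines:
        line = line.strip()
        if line.startswith(">"):
            current = line[1:].split()[0]
            lengths[current] = 0
        elif current is not None:
            lengths[current] += len(line)
    return lengths


def rpkm(counts, lengths_bp, total_mapped=None):
    """RPKM = reads * 1e9 / (length_bp * total_mapped) for each transcript."""
    if total_mapped is None:
        # Simplification: use the sum of the supplied counts as the library size.
        total_mapped = sum(counts.values())
    if total_mapped <= 0:
        raise ValueError("total_mapped must be positive")
    return {
        tid: n * 1e9 / (lengths_bp[tid] * total_mapped)
        for tid, n in counts.items()
    }
```

As a worked example, 500 reads on a 2,000 bp transcript in a library of 10 million mapped reads give 500 × 10^9 / (2,000 × 10^7) = 25 RPKM.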
Any help would be greatly appreciated,