6.5 years ago
k.kathirvel93
Hi everyone,
I am using STAR, HISAT2 and two more tools for RNA-seq data analysis. My question is: should I align my draft genome against the GRCh38 reference annotation (.gtf) or the reference genome (.fa)? Which one is best?
Draft genome sequence and transcript level? Can you elaborate more on the situation? If you are talking about alignment in general, a Gene Transfer Format (GTF) file does not contain any sequence information, so for aligning a draft genome sequence the reference genome (.fa) is the right file. Some tools use the GTF file as a guide to restrict the analysis to known transcripts only, which reduces the time required for the analysis.
Could you elaborate on the draft genome and the other genome (.fa) reference? Usually there will be only one genome, one GTF and one or more FASTQ files.
I am doing transcriptome analysis. Draft genome: sequenced from a patient sample. genome (.fa): the GRCh38 reference genome FASTA file. STAR clearly takes both the .gtf and the genome.fa file for reference mapping, but HISAT2 takes only genome.fa as reference. Which one is correct? Thanks
That is not quite right. HISAT2 can also take advantage of the annotation: you generate a file of known splice junctions from a reference GTF file and supply it to HISAT2. You can run HISAT2 without that file and it will still perform spliced alignment, but providing known junctions helps with RNA-seq reads that span exon-exon junctions. You can use both tools. My recommendation is the following: look into the usage of both tools and choose the one you feel more comfortable with. Both tools are well accepted, well tested and produce meaningful results. Much more important than the alignment is the downstream analysis, which you should focus on. What exactly is your final goal?
My aim is differential gene expression analysis. I want to use both tools for comparison. Can I get the code for HISAT2 indexing with both the genome (.fa) and annotation (.gtf) references? Thanks.
./hisat2-build
will give you the information on the indexing.
./hisat2_extract_splice_sites.py
extracts the splice sites from the GTF.
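For what it's worth, here is a minimal sketch of how those two scripts could be combined into an annotation-aware index build. The file names (GRCh38.fa, GRCh38.gtf), the index prefix and the thread count are placeholders I made up, not anything from this thread, and the script only prints the commands unless DRY_RUN=0 is set. It also uses hisat2_extract_exons.py, which ships alongside the splice-site script.

```shell
#!/usr/bin/env bash
# Sketch: build a HISAT2 index that includes known splice sites and exons.
# All file names and the index prefix are placeholders (assumptions).
set -eu

GENOME_FA="GRCh38.fa"         # reference genome FASTA (placeholder)
ANNOTATION_GTF="GRCh38.gtf"   # reference annotation GTF (placeholder)
INDEX_BASE="grch38_tran"      # prefix for the index files (placeholder)
THREADS=4                     # threads for hisat2-build (-p)

# Dry run by default: print each command instead of executing it.
# Set DRY_RUN=0 once HISAT2 is on PATH and the paths point to real files.
DRY_RUN="${DRY_RUN:-1}"
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$1"
    else
        eval "$1"
    fi
}

# 1) Extract known splice sites and exons from the GTF.
run "hisat2_extract_splice_sites.py $ANNOTATION_GTF > splice_sites.txt"
run "hisat2_extract_exons.py $ANNOTATION_GTF > exons.txt"

# 2) Build the index from the genome FASTA plus the extracted annotation.
run "hisat2-build -p $THREADS --ss splice_sites.txt --exon exons.txt $GENOME_FA $INDEX_BASE"
```

Building with --ss and --exon needs considerably more memory than a plain index for a human-sized genome. If that is a problem, an alternative is to build the index from genome.fa alone and pass splice_sites.txt at alignment time with hisat2 --known-splicesite-infile.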