Hello All,
I know this question has been answered a couple of times, but I am still confused about how the indexing should be done.
I have RNA-seq data from two conditions. I plan to identify both differentially expressed (DE) mRNAs and lncRNAs, using HISAT2 for the alignment.
To identify DE lncRNAs from RNA-seq data, I know that I should use a GTF file from the GENCODE website. Below is what I did, in order.
I have two GTF files:
- known_lncRNA.gtf (obtained from GENCODE)
- gencode.v35.annotation.gtf (obtained from GENCODE)
To identify known DE lncRNAs, I performed the steps below:
- First, extract the splice sites from the known_lncRNA.gtf file:
hisat2_extract_splice_sites.py known_lncRNA.gtf > known_lncRNA_splicSite.ss
- Then extract the exons from the whole GTF file (or should I use known_lncRNA.gtf here instead of gencode.v35.annotation.gtf?):
hisat2_extract_exons.py gencode.v35.annotation.gtf > genome.exon
- Then build the index:
hisat2-build -p 16 --exon genome.exon --ss known_lncRNA_splicSite.ss genome.fa ./genome_tran
Is this the correct way to build an index specifically for lncRNAs?
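For reference, here are the same indexing steps collected into one shell function, so that the choice of GTF for each step is explicit. This is only a sketch of what I ran: the function name build_hisat2_index, its argument order, the DRY_RUN switch and the derived output names (input name with .gtf replaced by .ss or .exon) are my own assumptions, while the three HISAT2 commands and their options are the ones shown above.

```shell
# Sketch only: build_hisat2_index, its argument order and DRY_RUN are
# hypothetical names, not part of HISAT2.
# Arguments: <GTF for splice sites> <GTF for exons> <genome FASTA> <index basename> [threads]
build_hisat2_index() {
  local ss_gtf="$1" exon_gtf="$2" genome_fa="$3" index_base="$4" threads="${5:-16}"
  # Output names are derived from the input GTF names (an assumption).
  local ss_file="${ss_gtf%.gtf}.ss"
  local exon_file="${exon_gtf%.gtf}.exon"

  # With DRY_RUN=1, only print the commands so they can be checked first.
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "hisat2_extract_splice_sites.py $ss_gtf > $ss_file"
    echo "hisat2_extract_exons.py $exon_gtf > $exon_file"
    echo "hisat2-build -p $threads --exon $exon_file --ss $ss_file $genome_fa $index_base"
    return 0
  fi

  # Extract splice sites and exons, then build the index with both.
  hisat2_extract_splice_sites.py "$ss_gtf" > "$ss_file" || return 1
  hisat2_extract_exons.py "$exon_gtf" > "$exon_file" || return 1
  hisat2-build -p "$threads" --exon "$exon_file" --ss "$ss_file" "$genome_fa" "$index_base"
}
```

With my files, the call would be: DRY_RUN=1 build_hisat2_index known_lncRNA.gtf gencode.v35.annotation.gtf genome.fa ./genome_tran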
I then ran: 1. read QC and adapter removal, 2. alignment with HISAT2, 3. featureCounts, 4. DESeq2 or edgeR.
Also, for the featureCounts step, should I use the combined GTF file (known_lncRNA.gtf + gencode.v35.annotation.gtf) or just known_lncRNA.gtf?
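One check that might help with this question is to see how many gene IDs the two GTF files share before combining them. The sketch below is not something I have settled on: the helper names gene_ids_in_gtf and count_shared_gene_ids are made up for illustration, and it assumes every feature line carries a gene_id "..." attribute in the usual GTF layout.

```shell
# Sketch only: both function names are hypothetical helpers.

# Print the unique gene_id values found in a GTF file (comment lines skipped).
gene_ids_in_gtf() {
  grep -v '^#' "$1" | sed -n 's/.*gene_id "\([^"]*\)".*/\1/p' | sort -u
}

# Print how many gene_id values occur in both GTF files.
count_shared_gene_ids() {
  local a b
  a="$(mktemp)"
  b="$(mktemp)"
  gene_ids_in_gtf "$1" > "$a"
  gene_ids_in_gtf "$2" > "$b"
  # comm -12 keeps only the lines common to the two sorted lists.
  comm -12 "$a" "$b" | wc -l | tr -d ' '
  rm -f "$a" "$b"
}
```

For example, count_shared_gene_ids known_lncRNA.gtf gencode.v35.annotation.gtf can be compared with gene_ids_in_gtf known_lncRNA.gtf | wc -l. If the two numbers are equal, every lncRNA gene ID is already present in the full annotation, and simply concatenating the files would list those genes twice.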
I would really appreciate any hints, as I am stuck at this step.
The Tutorial tag is reserved for actual tutorials that show users how to do something. You are asking a question about what you need to do, so please don't use that tag.