How to check RNAseq support for annotated genes?
14 months ago
BioinfoBee • 0

Hello all, I have a set of annotated genes in GFF3 format and the corresponding RNA-seq data. What is the recommended approach for determining the percentage of genes supported by the RNA-seq data, and are there specific tools and parameters for this?

Regards, B

gene RNA-seq annotation • 1.3k views
14 months ago
Michael 55k

To check for support of gene models by RNAseq, there are several proven workflows. I recommend using them all:

  • Standard RNAseq analysis based on alignments: reference-based alignment with HISAT2 or STAR -> featureCounts/HTSeq/RSEM -> check the coverage of the gene models of interest. There is no universally accepted cut-off for the number of reads or coverage per gene.
  • If you are interested in a computationally fast approach that also quantifies transcript isoforms (relevant if more than a single isoform is annotated in your GFF), use either Salmon or kallisto.
  • Use reference-guided transcriptome assembly (e.g. in Trinity) and BLAST the transcripts. Check for overlaps of transcripts with gene models.
  • Visually inspect genes of interest for anomalies like shadowing, overlapping genes, duplicated sequences, inhomogeneous coverage, etc.
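For the alignment-based route, the "percentage of genes supported" can be read off the count table once a cut-off is chosen. The following is a minimal sketch, not a recommended standard: it assumes a featureCounts-style tab-separated table (an optional `#` comment line, a header row starting with `Geneid`, and the read count of the first sample in column 7). The function name `pct_supported` and the default cut-off of 10 reads are illustrative assumptions, since there is no agreed threshold.

```shell
#!/usr/bin/env bash
# Sketch: share of genes with at least MIN_READS reads in a featureCounts-style table.
# Assumed input, e.g. from: featureCounts -a genes.gtf -o counts.txt aln.bam
# (file names are placeholders; a GFF3 annotation may need suitable -t/-g settings).
pct_supported() {
    local table=$1 min_reads=${2:-10}   # 10 is an arbitrary, illustrative cut-off
    awk -F'\t' -v min="$min_reads" '
        /^#/ || $1 == "Geneid" { next }            # skip comment and header lines
        { total++; if ($7 + 0 >= min) supported++ } # column 7 = first sample
        END {
            if (total == 0) { print "no genes found"; exit 1 }
            printf "%d/%d genes supported (%.1f%%)\n", supported, total, 100 * supported / total
        }' "$table"
}
```

Called as `pct_supported counts.txt 10`, it prints one summary line. Trying a few cut-offs (for example 1, 5 and 10 reads) shows how sensitive the percentage is to the threshold.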

Michael, thanks. I tried it using STAR to map the transcript to the annotation gtf file, and was planning to use featureCounts or STAR's GeneCounts afterwards to check the read coverage for each gene. With the following STAR command I got empty output with no transcripts mapped. Is this due to the length of the transcripts (>200 bp) being used?

STAR --genomeDir index_directory/ \
     --runThreadN 33 \
     --runMode alignReads \
     --readFilesCommand zcat \
     --readFilesIn ${INPUT}/transcriptome.fasta \
     --outFileNamePrefix hybrid_star \
     --outSAMtype BAM SortedByCoordinate \
     --quantMode GeneCounts

Please advise.

Regards, B


I tried it using STAR to map the transcript to the annotation gtf file

That is not what STAR is for. STAR is for aligning short reads in fastq format to a reference genome. If you have assembled transcripts already (from where?) use e.g. GMAP, BBMap, or maybe exonerate.


Michael, thanks again! Just curious: is it possible to align transcripts to the genic regions only, using the gene annotation GTF/GFF file? I couldn't find such an option in GMAP, unlike STAR, which can use a GTF annotation. Please advise.


I am not sure you can restrict the regions of alignment in STAR. The GTF file provides information on splice sites and can also be used for read counting. During read counting, only reads in genic/exonic regions are counted.

14 months ago
Juke34 8.9k

You could also do a transcriptome assembly and then map the transcripts onto the annotation with MAKER, which adds an AED (Annotation Edit Distance) score for the concordance between the evidence (here, the transcripts) and the annotation. An AED of 0 is perfect concordance; an AED of 1 is complete absence of support.
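To make the 0-to-1 scale concrete, here is a small sketch of the AED arithmetic as it is usually defined: AED = 1 - (SN + SP)/2, where SN and SP are the overlap expressed as a fraction of the evidence and of the annotation, respectively. MAKER computes this internally from the exon-level overlap. The helper name `aed` and its three base-pair inputs are assumptions made for illustration, not part of MAKER.

```shell
#!/usr/bin/env bash
# Sketch: AED from three base-pair counts (hypothetical helper, not a MAKER command).
#   $1 = bases shared by annotation and evidence
#   $2 = annotated (exonic) length of the gene model
#   $3 = length of the aligned evidence
aed() {
    local overlap=$1 annot_len=$2 evid_len=$3
    awk -v o="$overlap" -v a="$annot_len" -v e="$evid_len" 'BEGIN {
        if (a <= 0 || e <= 0) { print "NA"; exit }   # undefined without both lengths
        sn = o / e      # fraction of the evidence overlapped by the annotation
        sp = o / a      # fraction of the annotation overlapped by the evidence
        printf "%.2f\n", 1 - (sn + sp) / 2
    }'
}
```

For example, `aed 800 1000 800` describes a 1000 bp gene model whose 800 bp transcript lies entirely inside it, and gives an AED of 0.10; `aed 0 1000 500` gives 1.00, i.e. no support.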


Juke34, thanks for the suggestion. I wonder whether bedtools coverage or bedtools intersect can be used to check the transcript support or coverage for each gene?
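As a rough sketch of how that idea could be summarised: `bedtools coverage -a genes.bed -b transcripts.bed` appends, for each feature in `-a`, the number of overlapping `-b` features, the bases covered, the feature length and the covered fraction as the last column. The function below only post-processes such a table. The name `pct_covered`, the file names and the 0.5 default fraction are assumptions for illustration, and gene-level intervals include introns, so exon-level features may give a fairer picture.

```shell
#!/usr/bin/env bash
# Sketch: share of genes whose covered fraction (last column of bedtools coverage
# output) reaches MIN_FRAC. Assumed input, e.g. from:
#   bedtools coverage -a genes.bed -b transcripts.bed > gene_cov.tsv
pct_covered() {
    local cov=$1 min_frac=${2:-0.5}    # 0.5 is an arbitrary, illustrative cut-off
    awk -F'\t' -v min="$min_frac" '
        { total++; if ($NF + 0 >= min) covered++ }   # $NF = fraction of the gene covered
        END {
            if (total == 0) { print "no genes found"; exit 1 }
            printf "%d/%d genes covered (%.1f%%)\n", covered, total, 100 * covered / total
        }' "$cov"
}
```

Usage would be `pct_covered gene_cov.tsv 0.5`.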
