Hello all, I have a set of annotated genes in GFF3 format and the corresponding RNA-seq data. What is the recommended approach, and are there specific tools and parameters, to determine the percentage of genes supported by the RNA-seq data?
Regards, B
To check for support of gene models by RNAseq, there are several proven workflows. I recommend using them all:
You could also assemble a transcriptome and then use MAKER to align the transcripts against the annotation. MAKER adds an AED (Annotation Edit Distance) score that measures the concordance between the evidence (here, the transcripts) and the annotation: AED = 0 means perfect concordance, and AED = 1 means a complete absence of support.
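Once MAKER has run, the percentage of supported genes can be read straight from its GFF3 output. AED is defined as 1 - (SN + SP)/2, where SN and SP are the nucleotide-level sensitivity and specificity of the annotation against the evidence. Below is a minimal Python sketch, assuming MAKER wrote an `_AED=` attribute on each mRNA feature and that `Parent` links each mRNA to its gene. A gene counts as supported if its best isoform falls below a threshold. The function names and the example cutoffs (AED < 1 for "any support", AED < 0.5 for "good support") are my own illustrative choices, not MAKER defaults.

```python
from urllib.parse import unquote


def best_aed_per_gene(gff3_lines):
    """Map each gene ID to the lowest _AED among its mRNA features."""
    best = {}
    for line in gff3_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comments, directives and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "mRNA":
            continue  # only mRNA features carry the _AED tag
        attrs = dict(f.split("=", 1) for f in cols[8].split(";") if "=" in f)
        if "_AED" not in attrs:
            continue
        aed = float(attrs["_AED"])
        # GFF3 allows several comma-separated parents; fall back to the mRNA ID.
        for gene in attrs.get("Parent", attrs.get("ID", "")).split(","):
            gene = unquote(gene)
            best[gene] = min(aed, best.get(gene, aed))
    return best


def percent_supported(gff3_lines, threshold=1.0):
    """Percentage of genes whose best AED is strictly below threshold."""
    best = best_aed_per_gene(gff3_lines)
    if not best:
        return 0.0
    hits = sum(1 for aed in best.values() if aed < threshold)
    return 100.0 * hits / len(best)
```

Usage, with a hypothetical file name: `with open("maker_output.gff3") as fh: print(percent_supported(fh, 0.5))`. Reporting both the AED < 1 and AED < 0.5 figures shows how much of the "support" is only partial overlap.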
Michael, thanks. I tried using STAR to map the transcripts against the annotation GTF file, and was planning to use featureCounts/GeneCounts afterwards to check the read coverage of each gene. For STAR, I used the following command, but it produced empty output with no transcripts mapped. Is this due to the length of the transcripts (>200 bp)?
```shell
STAR --genomeDir index_directory/ \
     --runThreadN 33 \
     --runMode alignReads \
     --readFilesCommand zcat \
     --readFilesIn ${INPUT}/transcriptome.fasta \
     --outFileNamePrefix hybrid_star \
     --outSAMtype BAM SortedByCoordinate \
     --quantMode GeneCounts
```
Kindly suggest!
Regards, B
That is not what STAR is for. STAR is for aligning short reads in FASTQ format to a reference genome. If you already have assembled transcripts (from where?), use e.g. GMAP, BBMap, or maybe exonerate.
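After aligning the transcripts to the genome with one of those tools, a simple, lenient support measure is the fraction of annotated genes overlapped by at least one aligned transcript. The Python sketch below assumes both the annotation and the alignments are available as GFF3, for example `gene` features from your annotation and `exon` features from GMAP's GFF3 output (check `gmap --help` for the exact output format name). The helper names are hypothetical. Strand is ignored, and a 1 bp overlap counts as support, so treat the result as an upper bound.

```python
from bisect import bisect_right
from collections import defaultdict
from itertools import accumulate


def read_intervals(gff3_lines, feature_type):
    """Collect (seqid, start, end) for GFF3 features of one type (1-based, inclusive)."""
    out = []
    for line in gff3_lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) == 9 and cols[2] == feature_type:
            out.append((cols[0], int(cols[3]), int(cols[4])))
    return out


def percent_genes_overlapped(genes, alignments):
    """Percentage of gene intervals overlapped by at least one alignment interval."""
    by_seq = defaultdict(list)
    for seqid, start, end in alignments:
        by_seq[seqid].append((start, end))
    index = {}
    for seqid, ivs in by_seq.items():
        ivs.sort()
        starts = [s for s, _ in ivs]
        # Running maximum of end coordinates over alignments sorted by start.
        max_ends = list(accumulate((e for _, e in ivs), max))
        index[seqid] = (starts, max_ends)
    if not genes:
        return 0.0
    hits = 0
    for seqid, g_start, g_end in genes:
        if seqid not in index:
            continue
        starts, max_ends = index[seqid]
        # Only alignments starting at or before the gene end can overlap it;
        # one of them overlaps iff the furthest-reaching end passes the gene start.
        k = bisect_right(starts, g_end)
        if k and max_ends[k - 1] >= g_start:
            hits += 1
    return 100.0 * hits / len(genes)
```

Sorting once per sequence and using a running maximum of end coordinates keeps each gene lookup to a single binary search, which matters for genome-scale annotations. A stricter variant would require a minimum fraction of each gene's exonic length to be covered.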
Michael, thanks again! Just curious: is it possible to align transcripts only to the genic regions using the gene annotation GTF/GFF file? I couldn't find any such option in GMAP, unlike STAR, which can use a GTF annotation. Kindly suggest!
I am not sure you can restrict the regions of alignment in STAR. The GTF file is there to provide information on annotated splice sites and, optionally, for read counting. During read counting, only reads falling in genic/exonic regions are counted.
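When RNA-seq reads (not assembled transcripts) are aligned with STAR using `--quantMode GeneCounts`, STAR writes `<prefix>ReadsPerGene.out.tab`. Its first four rows are summary counters (`N_unmapped`, `N_multimapping`, `N_noFeature`, `N_ambiguous`), followed by one row per gene with unstranded counts in column 2 and the two strand-specific counts in columns 3 and 4. The Python sketch below turns that file into a percentage of genes with at least `min_reads` reads. The 10-read cutoff is an arbitrary example, not a recommended standard, and the right column depends on your library's strandedness.

```python
# Summary rows STAR places at the top of ReadsPerGene.out.tab.
STAR_SUMMARY_ROWS = {"N_unmapped", "N_multimapping", "N_noFeature", "N_ambiguous"}


def percent_genes_with_reads(tab_lines, min_reads=10, column=1):
    """Percentage of genes in a STAR ReadsPerGene.out.tab with >= min_reads reads.

    column is a 0-based field index: 1 = unstranded counts,
    2 = counts for the 1st read strand, 3 = counts for the 2nd read strand.
    """
    total = supported = 0
    for line in tab_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 4 or fields[0] in STAR_SUMMARY_ROWS:
            continue  # skip malformed lines and the summary counters
        total += 1
        if int(fields[column]) >= min_reads:
            supported += 1
    return 100.0 * supported / total if total else 0.0
```

Usage, with a hypothetical prefix: `with open("sampleReadsPerGene.out.tab") as fh: print(percent_genes_with_reads(fh, min_reads=10, column=1))`. featureCounts output can be summarized the same way once its count column is selected.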