Hi all,
I am using the StringTie-Ballgown pipeline from https://ccb.jhu.edu/software/stringtie/index.shtml?t=manual#de and I am running into an issue with the outputs.
In my resulting Ballgown object, the reference genes are present but have 0 coverage in every sample, while the 'new' transcripts (those with StringTie IDs) do have coverage. Crucially, these new transcripts do not overlap the reference genes.
I know the reference transcripts are present because BUSCO can identify the expected single-copy orthologues from the GTF file. I do not understand why StringTie is not accepting them or detecting any overlap.
I have read other StringTie annotation issues, but the lack of overlap between the identified transcripts and the reference doesn't seem to have come up before. At this point I am lost: is there some step I am missing or have overlooked? I would greatly appreciate any advice.
Code is below:
- for each RNA-Seq sample, map the reads to the genome with HISAT2 using the --dta option (I used an index built from the reference genome (.fa) with HISAT2)
hisat2 -p 10 --no-discordant --no-mixed --dta --rna-strandness RF --mp 4,2 --rdg 5,3 -x hisat2_gen_index -q -1 1_1.Q20.fastq -2 1_2.Q20.fastq -S 1_AN_.sam 2> 1_AN_report.txt
samtools sort -@ 8 -o 1_AN_.sorted.bam 1_AN_.sam
- for each RNA-Seq sample, run StringTie to assemble the read alignments obtained in the previous step (I used an annotation file with the same chromosome naming convention as the reference genome. I have tried this file as both GFF and GTF; neither worked)
stringtie 1_AN_.sorted.bam -G gtf_anno.gtf -A 1_gtf_stringtie_assembly_abundances.tab -f 0.005 -p 10 -o 1_gtf_stringtie_assembly.gtf
- run StringTie with --merge to generate a non-redundant set of transcripts observed in any of the RNA-Seq samples assembled previously (I used the annotation file again)
stringtie --merge -m 20 -p 10 -f 0.005 -G gtf_anno.gtf -o merged_gtf_stringtie_assembly.gtf gtf_mergelist.txt
- for each RNA-Seq sample, run StringTie using the -B/-b options in order to estimate transcript abundances and generate read coverage tables for Ballgown.
stringtie -B -f 0.005 -p 10 1_AN_loomismatch_defgap.sorted.bam -G merged_gtf_stringtie_assembly.gtf -A ballgown_gtf/1_gtf/1_gtf_stringtie_merged_assembly_abundances.tab -o ballgown_gtf/1_gtf/1_gtf_stringtie_merged_assembly.gtf
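Two sanity checks that might help narrow this down are (a) whether every sequence name in the annotation also appears in the alignment header, and (b) which feature types and strands the annotation actually contains. Below is a minimal sketch of both. The function names are hypothetical helpers written for this post, not part of StringTie, HISAT2 or samtools, and the sketch assumes a tab-separated GTF/GFF with 9 columns and a SAM header saved as plain text. Whether a mismatch in either check explains the missing overlap is an assumption I have not verified.

```shell
# Hypothetical helpers for inspecting the inputs to StringTie.
# Assumption: annotation files are tab-separated with 9 columns.

# Print the unique sequence names (column 1) used in a GTF/GFF file.
gtf_seqnames() {
    awk -F'\t' '!/^#/ && NF >= 9 { print $1 }' "$1" | sort -u
}

# Print the unique sequence names from SAM header text (@SQ lines) on stdin.
sam_header_seqnames() {
    awk -F'\t' '$1 == "@SQ" {
        for (i = 2; i <= NF; i++) if ($i ~ /^SN:/) print substr($i, 4)
    }' | sort -u
}

# Print sequence names that occur in the annotation but not in the SAM header.
# Usage: seqnames_only_in_gtf annotation.gtf header.sam
seqnames_only_in_gtf() {
    tmp=$(mktemp)
    sam_header_seqnames < "$2" > "$tmp"
    # First file: names seen in the header; stdin: names from the annotation.
    gtf_seqnames "$1" |
        awk 'FILENAME == ARGV[1] { seen[$0] = 1; next } !($0 in seen)' "$tmp" -
    rm -f "$tmp"
}

# Tally records per feature type (column 3) and strand (column 7).
# Output columns: feature, strand, count.
gtf_feature_strand_counts() {
    awk -F'\t' '!/^#/ && NF >= 9 { n[$3 "\t" $7]++ }
        END { for (k in n) print k "\t" n[k] }' "$1" | sort
}

# Example with the files from this post (requires samtools):
#   samtools view -H 1_AN_.sorted.bam > 1_AN_.header.sam
#   seqnames_only_in_gtf gtf_anno.gtf 1_AN_.header.sam
#   gtf_feature_strand_counts gtf_anno.gtf
#   gtf_feature_strand_counts merged_gtf_stringtie_assembly.gtf
```

If the first check prints any names, the annotation and the alignments disagree on at least some sequence names. The second check can be run on both the reference annotation and the merged assembly to compare how their records are distributed across feature types and strands.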