Entering edit mode
7.7 years ago
lakhujanivijay
5.9k
I am trying to run stringtie for transcript abundance estimation using below command:
for i in *.sorted.bam; do sample_name=`echo $i| awk -F "." '{print $1}'`; stringtie -e -B -p 60 -G stringtie-merged.gtf -o $sample_name.gtf $i;done
Warning encountered:
WARNING: no reference transcripts were found for the genomic sequences where reads were mapped! Please make sure the -G annotation file uses the same naming convention for the genome sequences.
The assembly was done using below commands:
#stringtie assembly
for i in *sorted.bam; do
sample_name=`echo $i| awk -F "." '{print $1}'`
date && time stringtie -p 60 -G reference.gff3 -o $sample_name.gtf $i -A $sample_name.gene.adundance -C $sample_name.fullycovered
done
#merging assemblies
date && time stringtie --merge -p 60 -G reference.gff3 -o stringtie-merged.gtf merge_list.txt
where merge_list.txt contains the list of the gtf files produces as an output from stringtie asembly.
Have you checked your gff3 file looks valid? Also was this the
reference.gff3
used in your original alignment?Additionally, no problem found with the stringtie-merged.gtf file
Alignment was ran using hisat2 without using the gff file.
Are the entries in the gff file synonymous with the annotation of the fasta files you used to build the HISAT index?
Yes, absolutely! I confirmed the same by fetching the sequence from the reference fasta file from the gf file using
bedtools getfasta
Are your GFF3 transcript lines noted as mRNA or as transcript?
"transcript"
They're actually noted as mRNA:
In the 9th field you have:
I don't think that the transcript names are correct in the ID field. They're like
transcript:VIT_01s0011g00010.t01
but I assume they should be onlyVIT_01s0011g00010.t01
, because the ":" could disrupt the program's analysis. The same can be said for the gene.Basically, what you have in the Transcript_ID field should be in the ID field, because that is what these programs read.
Should I then give it a try by removing the ":" ? Or the whole thing along with the colon sign?
I suggest you edit genes and transcripts in their ID and Parent fields removing the
gene:
or thetranscript:
part, like:Become:
Looking at the code here, the error is seen when:
where guided
where no_ref_used