Issue with creating table counts using StringTie *without* using a reference annotation file (GTF)
6.6 years ago
catglen012 ▴ 10

Hello,

I have been following the procedure given in the paper "Transcript-level expression analysis of RNA-seq experiments with HISAT, StringTie and Ballgown", using my own samples.

1) I converted the SAM files to sorted BAM files with no errors.

2) Then I assembled transcripts for each of my samples following the example:

stringtie -p 8 -G chrX_data/genes/chrX.gtf -o ERR188044_chrX.gtf -l ERR188044 ERR188044_chrX.bam

Below is what I used for my samples:

 stringtie GG1_hisat.sorted.bam -o GG1_hisat.sorted.gtf -m 300 -p5

I did not use the -G option because I don't have a reference annotation file yet.

3) Then I merged the transcripts of all my samples:

 stringtie --merge -o stringtie_merged.gtf gtf_files.txt

4) Now I want to create a table of counts so that I can move on to DESeq2, but I am unable to do so and I am not sure why. Maybe it is because I have not provided the -G option?

1st try:

stringtie –B -p8 -G stringtie_merged.gtf -o ballgown_Ceratodon.gtf

Error:

input file –B cannot be found!

2nd try:

stringtie -p8 -G stringtie_merged.gtf -o ballgown_Ceratodon.gtf

Error:

no input file provided!

rna-seq Assembly assembly • 5.2k views
6.6 years ago

If your goal is to create a new, merged transcriptome reference and to perform differential expression analysis on the transcripts identified in it, then use stringtie_merged.gtf and restart your analysis with Ballgown and StringTie, running StringTie on each sample's BAM file and specifying the merged file with the -G option. When running StringTie this way, ensure that you also use the -e option. To create a counts matrix from StringTie that is suitable for DESeq2, you should then use the prepDE.py script, as described in the StringTie manual section "Using StringTie with DESeq2 and edgeR".
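As a rough illustration of the advice above, the sketch below builds the per-sample StringTie call with -e, -B and -G, and also checks for the two problems visible in the question: the "–B" in the first attempt starts with an en dash rather than a hyphen, so StringTie read it as an input file name, and the second attempt names no BAM file at all. The helper names (check_dashes, reestimate_command) and the ballgown/SAMPLE/SAMPLE.gtf output layout are assumptions made for this example, not part of StringTie.

```python
import shlex

TYPOGRAPHIC_DASHES = ("\u2013", "\u2014")  # en dash, em dash


def check_dashes(argv):
    """Raise if any argument starts with a typographic dash instead of '-'."""
    bad = [a for a in argv if a.startswith(TYPOGRAPHIC_DASHES)]
    if bad:
        raise ValueError(f"replace typographic dashes with '-': {bad}")
    return argv


def reestimate_command(sample, bam, merged_gtf="stringtie_merged.gtf", threads=8):
    """Build the StringTie call that re-estimates abundances for one sample."""
    # Hypothetical output layout: one directory per sample.
    out_gtf = f"ballgown/{sample}/{sample}.gtf"
    return check_dashes([
        "stringtie", "-e", "-B", "-p", str(threads),
        "-G", merged_gtf, "-o", out_gtf,
        bam,  # the input BAM file that was missing from both attempts
    ])


# Print the command for one sample; repeat for every sample in the experiment.
print(shlex.join(reestimate_command("GG1", "GG1_hisat.sorted.bam")))
```

The printed line can be pasted into a shell, or the list can be passed to subprocess.run. Running it once per sample should leave one GTF per sample directory, which is the kind of input prepDE.py expects.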

If you need a FASTA file relating to your merged GTF, you can produce that with the gffread program that comes bundled with StringTie. You'll also need a reference genome in FASTA.

Kevin


Hello Kevin! I tried to use the command prepDE.py and it seems to not exist. I made sure to use Python 2.7, but it just does not work.

$ python prepDE.py
python: can't open file 'prepDE.py': [Errno 2] No such file or directory

Is there a way to go about this?


Hello again. Can you try the following BASH command (execute it from the StringTie root directory):

find . -name "prepDE.py"

Does that find it?


yes! that solved the problem. Thank you Kevin.


Never mind, I tried finding more info about the command and the same thing appeared again.

$ module load python
$ module load stringtie
$ find . -name "prepDE.py"
$ python prepDE.py -h
python: can't open file 'prepDE.py': [Errno 2] No such file or directory


But, what is the output of this command?

find . -name "prepDE.py"

It just gives me a fresh line with a "$ " prompt. Usually when I encounter errors I don't get a new line that starts with $; I get the "python: can't open file" message instead.


Oh, I see what is happening. You are using a cluster environment and loading StringTie and Python via module commands. However, you have to run the prepDE.py script by explicitly referencing the file with Python. It will be stored where the system administrator has stored StringTie. If you know where that is, you could look for it there, or you may have to submit a request to IT services about it.

Does that make sense?

So, the eventual future command would have to be something like:

python /shared/apps/stringtie/prepDE.py

It depends on where it was stored by the system administrator though.
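Since find . only searches from the current directory downward, an empty result just means the script is not under the directory you ran it from. As a rough sketch, the search could instead start from wherever the stringtie executable on your PATH actually lives. The function name find_script is made up for this example, and it assumes the module system puts stringtie on PATH; if the executable sits in a very general location such as /usr/bin, the walk over its parent directory may take a while.

```python
import os
import shutil


def find_script(name="prepDE.py", extra_roots=()):
    """Return paths to `name` found near the stringtie executable or in extra_roots."""
    roots = list(extra_roots)
    exe = shutil.which("stringtie")  # None if stringtie is not on PATH
    if exe:
        # Resolve symlinks, then search the install directory and its parent.
        bin_dir = os.path.dirname(os.path.realpath(exe))
        roots += [bin_dir, os.path.dirname(bin_dir)]
    hits = set()
    for root in roots:
        for dirpath, _dirnames, filenames in os.walk(root):
            if name in filenames:
                hits.add(os.path.join(dirpath, name))
    return sorted(hits)


# Print a ready-to-run command for every copy found (nothing if none is found).
for path in find_script():
    print("python", path, "-h")
```

If this prints nothing, the script is probably not installed alongside StringTie on your cluster.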


Of course, you can possibly just download it to your home directory and run the script from there:

To download:

wget https://ccb.jhu.edu/software/stringtie/dl/prepDE.py

Recent versions of StringTie (e.g., the latest version, 1.3.4d) do not contain the prepDE.py script. I don't know if it is a bug, or if the script has been dropped intentionally. I think it has to be downloaded separately.


Thank you Kevin! I will download it using the command you provided.


Hey Kevin! I did the following:

$ wget https://ccb.jhu.edu/software/stringtie/dl/prepDE.py
--2018-06-27 10:20:20-- https://ccb.jhu.edu/software/stringtie/dl/prepDE.py
Resolving ccb.jhu.edu... 128.220.233.225
Connecting to ccb.jhu.edu|128.220.233.225|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 11095 (11K) [text/plain]
Saving to: 'prepDE.py'

prepDE.py 100%[===========================================================>] 10.83K --.-KB/s in 0s

2018-06-27 10:20:20 (26.6 MB/s) - 'prepDE.py' saved [11095/11095]

$ python prepDE.py
File "prepDE.py", line 32
print "Error: Text file with sample ID and path invalid (%s)" % (line.strip())
^
SyntaxError: invalid syntax

I was wondering if you knew why this is happening and how to fix it?


You will likely have to pass some arguments to the python prepDE.py command. Please take a look here: http://ccb.jhu.edu/software/stringtie/index.shtml?t=manual#deseq
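One more thing worth checking: a SyntaxError on a line of the form print "..." is what a Python 3 interpreter reports for Python 2 code, so it is possible that module load python gave you Python 3. In that case, running the script with a Python 2.7 interpreter (for example python2, if your cluster provides it) should get past that line. As for the arguments, the error text in the traceback suggests the script accepts a text file with a sample ID and a path on each line. The sketch below prepares such a list; the sample IDs and paths are hypothetical, and the function name sample_list_text is made up for this example.

```python
import sys


def sample_list_text(samples):
    """Return the text of a sample list: one 'sample_id path' pair per line."""
    return "".join(f"{sid} {gtf}\n" for sid, gtf in samples.items())


# Hypothetical sample IDs and GTF paths, for illustration only.
samples = {
    "GG1": "ballgown/GG1/GG1.gtf",
    "GG2": "ballgown/GG2/GG2.gtf",
}
listing = sample_list_text(samples)
print(listing, end="")

# The downloaded prepDE.py uses Python 2 print statements, so it needs a
# Python 2 interpreter; report which major version is running this sketch.
print(f"this interpreter is Python {sys.version_info.major}")
```

Saving the printed list to a file and passing that file to prepDE.py with its -i option, under Python 2, would be the next step; please check the manual page linked above for the exact options.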


I was able to obtain the output from prepDE.py and I was able to use this in RStudio:

> countData <- as.matrix(read.csv(file.choose(), row.names = "gene_id"))
> colData <- read.table(text = readLines(file.choose(), warn = FALSE), header = TRUE, sep = "," )

But when I check that all sample IDs in colData are also in countData and are in the same order, I obtain the following:

> all(rownames(colData) %in% colnames(countData))

[1] FALSE

the column names are in MSTRG instead of their respective IDs....

Even when I try to ignore this, I obtain the following:

> dds <- DESeqDataSetFromMatrix(countData = countData,  colData = colData, design = ~ CHOOSE_FEATURE)

Error in DESeqDataSet(se, design = design, ignoreRank) : all variables in design formula must be columns in colData
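Two likely causes, offered as guesses rather than a diagnosis: colData was read without telling R which column holds the sample IDs, so its row names are probably just 1, 2, 3, ...; and CHOOSE_FEATURE looks like a placeholder that has to be replaced with the name of a real column in colData. The same checks can be sketched outside R on the two CSV files. The file contents below are made up for illustration, and check_samples is a hypothetical helper.

```python
import csv
import io


def check_samples(count_csv, pheno_csv, design_var):
    """Compare sample IDs between a count matrix and a phenotype table."""
    # Header of the count matrix, minus the leading gene_id column.
    count_samples = next(csv.reader(count_csv))[1:]
    pheno = csv.DictReader(pheno_csv)
    rows = list(pheno)
    id_col = pheno.fieldnames[0]  # assume the first column holds sample IDs
    pheno_samples = [row[id_col] for row in rows]
    return {
        "same_ids": set(pheno_samples) == set(count_samples),
        "same_order": pheno_samples == count_samples,
        "design_var_present": design_var in pheno.fieldnames[1:],
    }


# Made-up file contents standing in for the prepDE.py output and the phenotype table.
counts = io.StringIO("gene_id,GG1,GG2\nMSTRG.1,10,12\nMSTRG.2,0,3\n")
pheno = io.StringIO("ids,condition\nGG1,control\nGG2,stress\n")
print(check_samples(counts, pheno, "condition"))
```

If same_ids or same_order comes out False for the real files, the phenotype table needs fixing before DESeq2 will accept it; if design_var_present is False, the design formula names a column that does not exist.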


Could you open a new question for this, perhaps? I think that we have at least solved the StringTie / prepDE.py issue. I also ask because I will now be away for a couple of days, so, I am thinking that it would be better to get others to help too.

PS - also link to this old thread, too, in order to give others some context. I will nevertheless check back in a couple of days.


Hello Kevin! I am doing differential expression (DE) calculations for miRNAs present in control and stress conditions. I completed the steps up to HISAT2 and got output from HISAT2, but StringTie is not working: when I enter the StringTie command, only an empty .gtf document is created. I have short-read data and no reference GTF file. The command I used:

stringtie exp.sorted.bam -o exp.gtf

The output I received:

# stringtie exp.sorted.bam -o exp.gtf
# StringTie version 2.0.4

Thank you in advance; I am waiting for your reply!
