ballgown and pData

Asked 5.5 years ago by Morris_Chair

Hello everyone, I'm analyzing data with the new Tuxedo pipeline (HISAT, StringTie, and Ballgown), but I'm having trouble getting ballgown to work:

pf_rna<-ballgown(dataDir="ballgown/", samplePattern = sample, pData=pheno_data)

I get this output, which ends with a warning:

Sat Jul  6 00:55:39 2019
Sat Jul  6 00:55:39 2019: Reading linking tables
Sat Jul  6 00:55:40 2019: Reading intron data files
Sat Jul  6 00:55:41 2019: Merging intron data
Sat Jul  6 00:55:43 2019: Reading exon data files
Sat Jul  6 00:55:45 2019: Merging exon data
Sat Jul  6 00:55:47 2019: Reading transcript data files
Sat Jul  6 00:55:48 2019: Merging transcript data
successfully rearranged!
Wrapping up the results
Sat Jul  6 00:55:50 2019
Warning message:
In ballgown(dataDir = "ballgown/", samplePattern = sample, pData = pheno_data) :

Rows of pData did not seem to be in the same order as the columns of the expression data. Attempting to rearrange pData...

The sample names in pData are in the same order as the names of the folders containing the .ctab files. I've made several attempts to fix this, I'm exhausted, and I need help. Do you have any suggestions? I'd appreciate it.

Thank you
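To double-check that the pData rows really follow the folder order, you can compare the phenotype file with the sample folders that dataDir and samplePattern would pick up. Below is a minimal sketch in Python, the language of StringTie's prepDE.py script. It makes these assumptions:

- The phenotype file is tab-separated, has a header row, and holds the sample IDs in its first column.
- samplePattern is treated as a regular expression.
- ballgown takes the matching folders in sorted order.

R sorts according to the locale, so names that differ only in case or punctuation may sort differently from Python's sorted(). The function names are illustrative only.

```python
import csv
import os
import re


def sample_dirs(data_dir, pattern):
    """Sub-directories of data_dir whose names match the regex pattern, sorted by name."""
    return sorted(
        name
        for name in os.listdir(data_dir)
        if os.path.isdir(os.path.join(data_dir, name)) and re.search(pattern, name)
    )


def pheno_ids(pheno_path):
    """Sample IDs from the first column of a tab-separated file with a header row."""
    with open(pheno_path, newline="") as fh:
        rows = [row for row in csv.reader(fh, delimiter="\t") if row]
    return [row[0] for row in rows[1:]]


def compare_order(data_dir, pattern, pheno_path):
    """Report whether the phenotype IDs match the sample folders, and in the same order."""
    dirs = sample_dirs(data_dir, pattern)
    ids = pheno_ids(pheno_path)
    return {
        "same_order": dirs == ids,
        "missing_dirs": [i for i in ids if i not in dirs],  # IDs with no matching folder
        "extra_dirs": [d for d in dirs if d not in ids],    # folders with no pData row
    }
```

If same_order is False while both lists are empty, the same samples are present but in a different order. That is the situation the warning describes.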

RNA-Seq ballgown • 2.5k views

Written 5.5 years ago by Morris_Chair; updated 22 months ago by Pegasus

OK, I'll answer my own question :) I was able to run ballgown a different way (pData was causing the problem). In my opinion there must be a bug in this tool that prevents the analysis with the script above. After a lot of searching, studying and discouragement, here is the solution:

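Read the design matrix file:

pheno_data = read.table(file ="phonotype.txt", header = TRUE, sep = "\t")

Build the full path to each sample directory (file.path() adds the "/" separator itself, so "ballgown/" plus sep = '/' no longer produces a doubled slash):

sample_full_path <- file.path("ballgown", as.character(pheno_data[,1]))

Load the ballgown data structure, passing the sample directories explicitly in the same order as the rows of pheno_data:

bg = ballgown(samples=as.vector(sample_full_path),pData=pheno_data)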

All the best
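The workaround above presumably avoids the warning because the samples vector is built from the first column of the phenotype table. The expression columns therefore come out in the same order as the pData rows.

Before calling ballgown, it may also help to confirm that every sample path exists and contains the five tables StringTie writes with -B: e2t.ctab, e_data.ctab, i2t.ctab, i_data.ctab and t_data.ctab. Below is a minimal Python sketch. It assumes the ballgown/<sample ID>/ layout used in this thread, and the function name is illustrative.

```python
import os

# Tables written by StringTie's -B option into each sample directory.
CTAB_FILES = ("e2t.ctab", "e_data.ctab", "i2t.ctab", "i_data.ctab", "t_data.ctab")


def sample_paths(data_dir, ids):
    """Full sample paths in phenotype-table order; fail early if any table is missing."""
    paths = [os.path.join(data_dir, sid) for sid in ids]  # join adds exactly one separator
    missing = [
        os.path.join(p, f)
        for p in paths
        for f in CTAB_FILES
        if not os.path.isfile(os.path.join(p, f))
    ]
    if missing:
        raise FileNotFoundError(f"missing ballgown tables: {missing}")
    return paths
```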

Reply by Morris_Chair, 5.5 years ago

You could make your life a lot easier if you simply used salmon with tximport and then any of the common downstream tools, such as edgeR or DESeq2. The documentation is outstandingly comprehensive, and you do not have to mess around with this odd ballgown tool.

Reply by ATpoint, 5.5 years ago

Hello ATpoint,

I followed your advice in the past, and I have a pipeline that works perfectly fine. Yes, you are definitely right: it's a lot easier to use Salmon and DESeq2.

I'm doing this for three reasons. First, I want to try different tools for differential expression analysis (I'm aware that ballgown is not the best). Second, I want to be able to detect the splice variants of each gene. Lastly, I want to become familiar with these command lines and pipelines, since they might be useful for my next aim, a meta-analysis.

Thank you :)

Reply by Morris_Chair, 5.5 years ago

Hi Morris, I am using the HISAT2-StringTie-Ballgown pipeline. Could you please comment on the following error message? Thanks.

bg_chrX = ballgown(dataDir = "ballgown", samplePattern = "PC", pData=pheno_data)

Sat Jul 11 09:24:49 2020
Sat Jul 11 09:24:49 2020: Reading linking tables
Sat Jul 11 09:24:51 2020: Reading intron data files
Sat Jul 11 09:24:57 2020: Merging intron data
Sat Jul 11 09:24:58 2020: Reading exon data files
Sat Jul 11 09:25:09 2020: Merging exon data
Sat Jul 11 09:25:11 2020: Reading transcript data files
Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec, : line 686 did not have 12 elements
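The error is raised by R's scan() while the transcript tables are being read, so it probably comes from one of the t_data.ctab files. These have 12 tab-separated columns (t_id through FPKM), which matches the "12 elements" in the message. One way to find the offending line is to count the fields on each line yourself.

The Python sketch below flags two kinds of lines:

- lines whose field count differs from the header;
- lines containing quote or # characters, because R's read.table treats those as quotes and comment starts by default.

Whether ballgown's reader keeps those defaults is an assumption here. Line numbers count the header as line 1, so R's count may be offset; check the neighbouring lines too. The function name is illustrative.

```python
def suspicious_lines(path, sep="\t"):
    """Return (expected_fields, problems) for a tab-separated table with a header row.

    Each problem is (line_number, field_count, reasons); the header is line 1.
    """
    problems = []
    with open(path, encoding="utf-8", errors="replace") as fh:
        expected = len(fh.readline().rstrip("\r\n").split(sep))
        for lineno, line in enumerate(fh, start=2):
            if not line.strip():
                continue  # read.table skips blank lines by default
            fields = line.rstrip("\r\n").split(sep)
            reasons = []
            if len(fields) != expected:
                reasons.append("field count")
            if any(ch in line for ch in "'\"#"):
                reasons.append("quote or # character")
            if reasons:
                problems.append((lineno, len(fields), reasons))
    return expected, problems
```

For example, loop over glob.glob("ballgown/*/t_data.ctab") and print whatever suspicious_lines() reports for each file.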

Reply by Asad Prodhan, 4.5 years ago

Hi Asad, I'm sorry, but in the end I used a different pipeline for my analysis, so I can't say much about this error. I hope someone else can help you.

Reply by Morris_Chair, 4.4 years ago

Thanks, Morris. I am now using DESeq2 instead of Ballgown. My RNA-Seq workflow is HISAT2-StringTie-DESeq2, and it seems to be working fine.

To help others who run into the same issue, here is what I am doing.

I estimated transcript abundances with the following call:

stringtie -e -B -p 16 -G stringtie_merged.gtf -o ballgown/ERR188044/ERR188044_chrX.gtf ERR188044_chrX.bam
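With many samples this command is easy to mistype, for example when a word processor replaces "-" with an en-dash. Below is a minimal Python sketch that builds the same argument list for each sample. The flags are exactly those in the command above. The sample IDs you pass in and the <ID>_chrX naming pattern are assumptions extrapolated from the single example.

```python
import shlex


def stringtie_cmd(sample, merged_gtf="stringtie_merged.gtf", threads=16, suffix="_chrX"):
    """Argument list for re-estimating one sample against the merged GTF.

    -e: only estimate abundances for transcripts in the -G annotation
    -B: write the ballgown .ctab tables into the directory of the -o file
    The ballgown/<ID>/<ID><suffix>.gtf layout mirrors the example in the thread.
    """
    return [
        "stringtie", "-e", "-B", "-p", str(threads),
        "-G", merged_gtf,
        "-o", f"ballgown/{sample}/{sample}{suffix}.gtf",
        f"{sample}{suffix}.bam",
    ]


def shell_line(argv):
    """Render an argument list as a copy-pasteable shell command."""
    return shlex.join(argv)
```

To actually run it, call subprocess.run(stringtie_cmd(sample_id), check=True) for each of your own sample IDs. Passing a list rather than a single string avoids any shell quoting issues.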

Then I ran the prepDE.py script (python prepDE.py) inside the 'ballgown' directory above. It generated gene and transcript count matrices in CSV format.

I am analysing these CSV files with the DESeq2 package.
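DESeq2 expects the columns of the count matrix to be in the same order as the rows of the sample table (colData). This is the same ordering issue the ballgown warning was about.

Below is a minimal Python sketch that reorders the columns of a prepDE.py CSV to follow a given sample order. It assumes the first column holds the gene or transcript ID and every other header field is a sample ID. The function names are illustrative. The same reordering can also be done in R by indexing the count matrix columns with the sample IDs.

```python
import csv


def read_count_matrix(path):
    """Read a prepDE.py-style CSV: ID column first, then one column per sample."""
    with open(path, newline="") as fh:
        rows = list(csv.reader(fh))
    header, body = rows[0], rows[1:]
    counts = {row[0]: row[1:] for row in body}  # values kept as text, unchanged
    return header[0], header[1:], counts


def reorder_columns(samples, counts, wanted):
    """Reorder sample columns to follow `wanted` (e.g. the sample-table order)."""
    missing = [s for s in wanted if s not in samples]
    if missing:
        raise ValueError(f"samples not in the count matrix: {missing}")
    idx = [samples.index(s) for s in wanted]
    return list(wanted), {key: [row[i] for i in idx] for key, row in counts.items()}


def write_count_matrix(path, id_col, samples, counts):
    """Write the matrix back out in the same CSV layout."""
    with open(path, "w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow([id_col] + samples)
        for key, row in counts.items():
            writer.writerow([key] + row)
```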

Reply by Asad Prodhan, 4.4 years ago

Hi Asad Prodhan,

So can we use DESeq2 directly, without running ballgown?

Reply by Pegasus, 22 months ago