Question

ballgown has "." as Gene Names

6.0 years ago

Fawzi Yassine

Hi, I'm using StringTie and ballgown (in R) for a standard RNA-seq data analysis.

Both

texpr(ballgown_obj1, 'all')$gene_name

and

ballgown::geneNames(ballgown_obj1)

return all the gene names as “.”

How can I get my gene names?

Thanks,

Tags: ballgown, RNA-Seq, assembly

Written 6.0 years ago by Fawzi Yassine; updated 4.8 years ago by Sissi

Did you do a de novo assembly? If so, you will first have to add the gene names using getGenes() and a GTF file. Also, check the output of indexes(ballgown_obj1)$t2g.

Written 6.0 years ago by aditi.qamra

Hi Aditi, it is not a de novo assembly. The output of indexes(ballgown_obj1)$t2g is:

   t_id    g_id
4     4 MSTRG.5
7     7 MSTRG.5
9     9 MSTRG.5
10   10 MSTRG.5
16   16 MSTRG.2
17   17 MSTRG.2

Written 6.0 years ago by Fawzi Yassine; updated 5.9 years ago by GenoMax

Can you post your code here?

Written 5.9 years ago by aditi.qamra

Aditi, I am sorry for the late reply; I was fixing my PC. Below is the code for StringTie and ballgown. Here is the StringTie command:

stringtie -e -B -p 8 -G ./stringtie_merged.gtf -o ${BALLGOWNDIR}/SRR${A}/SRR${A}.gtf ${HISAT2DIR}/SRR${A}.bam

Here is the ballgown code:

library(ballgown)
library(genefilter)  # provides rowVars(), used in the variance filter below

#Read phenotype sample data
pheno_data = read.csv("data/frda_phenodata.csv", header = TRUE, colClasses = rep("character", 4))
pheno_data = pheno_data[order(pheno_data$ids), ]


# Read in expression data
ballgown_obj = ballgown(dataDir = "data/ballgown", samplePattern = "SRR", pData = pheno_data)

#Pre-filtering lowly expressed transcripts before differential expression analysis reduces the severity of the multiple-testing correction and may improve the power of detection.
#Keep only transcripts whose FPKM, summed across samples, is at least 10 (texpr() returns FPKM by default, not read counts).
ballgown_obj1 = subset(ballgown_obj, "rowSums(texpr(ballgown_obj)) >= 10", genomesubset=TRUE)

#Filter low-abundance genes: remove all transcripts whose variance across samples is one or less (i.e., keep variance > 1):
ballgown_obj2 = subset(ballgown_obj, "rowVars(texpr(ballgown_obj)) > 1", genomesubset=TRUE)

#DE by transcript
differ_transcripts1 = stattest(ballgown_obj1, feature = "transcript", covariate = "genotype", getFC = TRUE,  meas = "FPKM")

#DE by gene
differ_genes1 = stattest(ballgown_obj1, feature = "gene", covariate = "genotype", getFC = TRUE, meas = "FPKM")

Written 5.9 years ago by Fawzi Yassine; updated 5.9 years ago by GenoMax

I solved the problem: the annotation file did not have a gene_name column.

Written 5.9 years ago by Fawzi Yassine

Hi there,

I have the same "." problem. I read your code and used basically the same, but I can't find the solution. How did you solve it in the end?

Thanks

Written 4.8 years ago by Sissi

Accepted Answer · 2019-01-25

5.9 years ago

Fawzi Yassine

I solved the problem: the annotation file did not have a gene_name column.

Written 5.9 years ago by Fawzi Yassine
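For anyone landing here with the same symptom: a quick way to confirm this cause is to check whether the GTF given to StringTie with -G actually carries gene_name attributes. As far as I know, StringTie's -B output writes "." in the gene_name column of t_data.ctab when no name is available, and ballgown reports that value. Below is a minimal base-R sketch, assuming a standard tab-separated, nine-column GTF; the helper names get_gtf_attr() and gene_name_coverage() and the toy records are hypothetical, not part of ballgown or StringTie.

```r
# Sketch: check whether a GTF carries gene_name attributes.
# Assumes a standard tab-separated, 9-column GTF (attributes in column 9).

# Extract one attribute (e.g. "gene_name") from GTF attribute strings.
# The (^|;) anchor stops "gene_id" from also matching "ref_gene_id".
# Returns NA where the attribute is absent.
get_gtf_attr <- function(attr_strings, key) {
  pattern <- paste0('(^|;)\\s*', key, ' "([^"]*)"')
  m <- regmatches(attr_strings, regexec(pattern, attr_strings, perl = TRUE))
  vapply(m, function(x) if (length(x) == 0) NA_character_ else x[3],
         character(1))
}

# Count transcript records and how many of them have a gene_name.
gene_name_coverage <- function(gtf_lines) {
  records <- gtf_lines[!startsWith(gtf_lines, "#")]   # drop header/comment lines
  fields <- strsplit(records, "\t", fixed = TRUE)
  fields <- Filter(function(f) length(f) == 9 && f[3] == "transcript", fields)
  attrs <- vapply(fields, function(f) f[9], character(1))
  gn <- get_gtf_attr(attrs, "gene_name")
  c(transcripts = length(attrs), with_gene_name = sum(!is.na(gn)))
}

# Toy records (hypothetical): one transcript with a name, one without.
gtf <- c(
  'chr1\tStringTie\ttranscript\t100\t900\t1000\t+\t.\tgene_id "MSTRG.1"; transcript_id "MSTRG.1.1"; gene_name "GENE_A";',
  'chr1\tStringTie\texon\t100\t300\t1000\t+\t.\tgene_id "MSTRG.1"; transcript_id "MSTRG.1.1"; exon_number "1";',
  'chr1\tStringTie\ttranscript\t2000\t2500\t1000\t-\t.\tgene_id "MSTRG.2"; transcript_id "MSTRG.2.1";'
)
gene_name_coverage(gtf)
# transcripts = 2, with_gene_name = 1
```

On a real file, something like gene_name_coverage(readLines("stringtie_merged.gtf")) should work if the file fits in memory. If with_gene_name comes back as 0, there are no names for StringTie to pass on, so the fix is to build the merged GTF from an annotation that does contain gene_name and re-run the StringTie -e -B step.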

Hi Fawzi,

It’s perfectly OK to answer your own questions, but if you could endeavour to make the answers as thorough as possible for people who may come across this issue in the future, that would be good.

Written 5.9 years ago by Joe

Hi, I'm facing the same issue. I'm getting the gene names for my ballgown object as "." when I run geneNames(bgControl) and

texpr(bgControl, 'all')$gene_name.

Can you please elaborate on how you fixed this issue? Did you add the gene names column to the annotation file?

Written 4.2 years ago by lakshmi9c
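If re-running StringTie with a name-carrying annotation is not an option, names can also be attached to the results afterwards by matching gene IDs. This is a minimal base-R sketch under stated assumptions: the results table is only shaped roughly like stattest(feature = "gene") output, and the lookup table, the add_gene_names() helper, and all IDs and names are hypothetical. In practice the lookup would have to come from an annotation that does contain gene names.

```r
# Hypothetical gene-level results, shaped roughly like stattest(feature = "gene") output
results <- data.frame(
  feature = "gene",
  id = c("MSTRG.2", "MSTRG.5", "MSTRG.9"),
  pval = c(0.01, 0.20, 0.03),
  stringsAsFactors = FALSE
)

# Hypothetical gene_id -> gene_name lookup (one row per gene_id)
lookup <- data.frame(
  gene_id = c("MSTRG.2", "MSTRG.5"),
  gene_name = c("GENE_B", "GENE_A"),
  stringsAsFactors = FALSE
)

# Attach names by ID; IDs missing from the lookup get "." (the placeholder seen in this thread)
add_gene_names <- function(res, lookup, id_col = "id") {
  idx <- match(res[[id_col]], lookup$gene_id)   # NA where the ID is not in the lookup
  res$gene_name <- ifelse(is.na(idx), ".", lookup$gene_name[idx])
  res
}

named <- add_gene_names(results, lookup)
named$gene_name
# "GENE_B" "GENE_A" "."
```

One caveat: a single StringTie gene (MSTRG.x) can overlap more than one reference gene, so a real lookup may need to be reduced to one row per gene_id before matching; match() simply takes the first hit.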