ballgown has "." as Gene Names
6.0 years ago

Hi, I'm using stringtie and ballgown (in R) for the standard RNA-seq data analysis.

Both

texpr(ballgown_obj1, 'all')$gene_name

and

ballgown::geneNames(ballgown_obj1)

return all the gene names as “.”

How can I get my gene names?

Thanks,

ballgown RNA-Seq assembly • 2.7k views
ADD COMMENT

Did you do a de novo assembly? If so, you will first have to add the gene names using getGenes() and a GTF file. Also check the output of indexes(ballgown_obj1)$t2g

ADD REPLY

Hi Aditi, it is not a de novo assembly. The output of indexes(ballgown_obj1)$t2g is

t_id    g_id
4     4 MSTRG.5
7     7 MSTRG.5
9     9 MSTRG.5
10   10 MSTRG.5
16   16 MSTRG.2
17   17 MSTRG.2
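
The MSTRG.* values above are gene IDs assigned by StringTie, not gene names. As a rough sketch of how one might build a gene ID to gene name lookup from a merged GTF, the snippet below scans transcript records for both attributes. The three GTF lines are invented for illustration, and it is an assumption that your merged GTF carries gene_name on at least some transcript records; if it does not, the lookup will simply come out empty.

```shell
#!/bin/sh
# Made-up merged-GTF lines for illustration; the attribute layout is an assumption.
gtf=$(mktemp)
printf 'chr1\tStringTie\ttranscript\t100\t900\t1000\t+\t.\tgene_id "MSTRG.5"; transcript_id "MSTRG.5.1"; gene_name "ABC1";\n' > "$gtf"
printf 'chr1\tStringTie\ttranscript\t150\t900\t1000\t+\t.\tgene_id "MSTRG.5"; transcript_id "MSTRG.5.2";\n' >> "$gtf"
printf 'chr1\tStringTie\ttranscript\t2000\t2900\t1000\t-\t.\tgene_id "MSTRG.2"; transcript_id "MSTRG.2.1";\n' >> "$gtf"

# Emit one "gene_id<TAB>gene_name" line for each pair seen on a transcript record.
map=$(awk -F'\t' '$3 == "transcript" {
  gid = ""; gname = ""
  n = split($9, attr, "; *")
  for (i = 1; i <= n; i++) {
    if (attr[i] ~ /^gene_id "/)   { gid = attr[i];   gsub(/^gene_id "|"$/, "", gid) }
    if (attr[i] ~ /^gene_name "/) { gname = attr[i]; gsub(/^gene_name "|"$/, "", gname) }
  }
  if (gid != "" && gname != "") print gid "\t" gname
}' "$gtf" | sort -u)

echo "$map"
rm -f "$gtf"
```

Gene IDs that never appear with a gene_name (MSTRG.2 in this toy input) are left out of the lookup, which is what you would expect for loci with no named reference match.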
ADD REPLY

Can you post your code here?

ADD REPLY

Aditi, I am sorry for the late reply; I was fixing my PC. Below is the code for stringtie and ballgown.

Here is the code for stringtie:

stringtie -e -B -p 8 -G ./stringtie_merged.gtf -o ${BALLGOWNDIR}/SRR${A}/SRR${A}.gtf ${HISAT2DIR}/SRR${A}.bam

Here is the code for ballgown

library(ballgown)
library(genefilter)  # provides rowVars(), used in the variance filter below

#Read phenotype sample data
pheno_data = read.csv("data/frda_phenodata.csv", header = TRUE, colClasses = rep("character", 4))
pheno_data = pheno_data[order(pheno_data$ids), ]


# Read in expression data
ballgown_obj = ballgown(dataDir = "data/ballgown", samplePattern = "SRR", pData = pheno_data)

#Filtering out transcripts that are expressed at low levels before differential expression analysis reduces the severity of the multiple-testing correction and may improve the power of detection.
#Pre-filter to keep only rows whose summed expression across samples is at least 10 (texpr() returns FPKM by default, not read counts).
ballgown_obj1 = subset(ballgown_obj, "rowSums(texpr(ballgown_obj)) >= 10", genomesubset=TRUE)

#Filter low-abundance genes. Here we keep only transcripts whose variance across the samples is greater than one:
ballgown_obj2 = subset(ballgown_obj, "rowVars(texpr(ballgown_obj)) > 1", genomesubset=TRUE)

#DE by transcript
differ_transcripts1 = stattest(ballgown_obj1, feature = "transcript", covariate = "genotype", getFC = TRUE,  meas = "FPKM")

#DE by gene
differ_genes1 = stattest(ballgown_obj1, feature = "gene", covariate = "genotype", getFC = TRUE, meas = "FPKM")
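
A quick way to narrow the problem down is to look at what stringtie -B actually wrote. The -B option produces the Ballgown table files next to the output GTF, and t_data.ctab has a gene_name column, which is where the "." values would come from. The sketch below counts how many transcripts have "." in that column; the file contents here are a made-up stand-in, so on real data you would point the awk commands at a path such as ${BALLGOWNDIR}/SRR${A}/t_data.ctab instead.

```shell
#!/bin/sh
# Made-up stand-in for one sample's t_data.ctab (tab-separated, with a header row).
ctab=$(mktemp)
printf 't_id\tchr\tstrand\tstart\tend\tt_name\tnum_exons\tlength\tgene_id\tgene_name\tcov\tFPKM\n' > "$ctab"
printf '4\tchr1\t+\t100\t900\tMSTRG.5.1\t2\t500\tMSTRG.5\t.\t3.2\t1.1\n' >> "$ctab"
printf '16\tchr1\t-\t2000\t2900\tMSTRG.2.1\t3\t700\tMSTRG.2\tABC1\t8.0\t2.7\n' >> "$ctab"

# Locate the gene_name column by its header, then count rows where it is ".".
missing=$(awk -F'\t' '
  NR == 1 { for (i = 1; i <= NF; i++) if ($i == "gene_name") col = i; next }
  $col == "." { n++ }
  END { print n + 0 }' "$ctab")

# Number of data rows (everything except the header).
rows=$(awk 'END { print NR - 1 }' "$ctab")

echo "$missing of $rows transcripts have gene_name \".\""
rm -f "$ctab"
```

If every row is "." in the ctab file, the names were already missing before ballgown read the data, which points at the GTF given to -G rather than at the R code.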
ADD REPLY

I solved the problem: the annotation file did not have a gene_name column.

ADD REPLY

Hi there,

I have the same "." problem. I read your code and used basically the same, but I can't find the solution. How did you solve it in the end?

Thanks

ADD REPLY
5.9 years ago

I solved the problem: the annotation file did not have a gene_name column.
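
For anyone hitting the same symptom, one way to check whether an annotation carries gene names is to count the transcript records that have a gene_name attribute. The sketch below does this on a tiny GTF whose file name, coordinates, and IDs are invented for illustration; on real data you would run the two awk commands against the file passed to stringtie -G.

```shell
#!/bin/sh
# Build a small, made-up GTF for illustration (not real annotation data).
gtf=$(mktemp)
printf 'chr1\tdemo\ttranscript\t100\t900\t.\t+\t.\tgene_id "G1"; transcript_id "T1"; gene_name "ABC1";\n' > "$gtf"
printf 'chr1\tdemo\ttranscript\t2000\t2900\t.\t-\t.\tgene_id "G2"; transcript_id "T2";\n' >> "$gtf"

# Count all transcript records.
total=$(awk -F'\t' '$3 == "transcript" { n++ } END { print n + 0 }' "$gtf")

# Count transcript records whose attribute field includes gene_name.
named=$(awk -F'\t' '$3 == "transcript" && $9 ~ /gene_name "/ { n++ } END { print n + 0 }' "$gtf")

echo "transcripts: $total, with gene_name: $named"
rm -f "$gtf"
```

A count of zero named transcripts would be consistent with the situation described in this answer, where ballgown has nothing to report except ".".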

ADD COMMENT

Hi Fawzi,

It’s perfectly OK to answer your own questions, but if you could endeavour to make the answers as thorough as possible for people who may come across this issue in future, that would be good.

ADD REPLY

Hi, I'm facing the same issues. I'm getting the gene names for my ballgown object as "." when I run the code geneNames(bgControl) and

texpr(bgControl, 'all')$gene_name.

Can you please elaborate on how you fixed this issue? Did you add the gene names column to the annotation file?

ADD REPLY
