How to equalize two vector lengths?
10.1 years ago
Parham

Hello,

I am running goseq and have two vectors for one line of code: de.genes has about 250 entries and lengthData has about 7000. As far as I understand the error below, I have to make lengthData match up with de.genes, but since I am not an expert in coding I cannot figure out how to do it. Can someone help me with that? Much appreciated!

> gene_pwf = nullp(de.genes, bias.data=lengthData)
Error in nullp(de.genes, bias.data = lengthData) : 
  bias.data vector must have the same length as DEgenes vector! 
10.1 years ago

Presuming that de.genes is a subset of the original 7000-gene vector, just subset lengthData in exactly the same way you subset the genes. We can't know exactly how you did that, since you didn't show us.
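A minimal base-R sketch of "subset both the same way", assuming (hypothetically) that de.genes holds gene IDs and that lengthData is a numeric vector named by the same IDs. Indexing lengthData by those IDs gives exactly one length per DE gene. All IDs and values below are toy data.

```r
# Hypothetical lengths for all genes, named by gene ID
lengthData <- c(geneA = 1200, geneB = 850, geneC = 2300, geneD = 640)
# Hypothetical DE subset: gene IDs only
de.genes <- c("geneC", "geneA")

# Subset lengthData with the same IDs, so both vectors have one entry per DE gene
de.lengths <- lengthData[de.genes]
```

Indexing by name also keeps de.lengths in the same order as de.genes, which a purely positional subset would not guarantee.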

You are right, Devon, I should have been more specific. I prepared de.genes from the DESeq2 results (deseq2res) the same way you showed me here. Then, for lengthData, I followed the goseq workflow:

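library(GenomicFeatures)
# In later Bioconductor releases this function is named makeTxDbFromBiomart()
txdb <- makeTranscriptDbFromBiomart(biomart="fungal_mart", dataset="spombe_eg_gene", host="fungi.ensembl.org")
txsByGene <- transcriptsBy(txdb, "gene")
# One median transcript length per gene, named by gene ID
lengthData <- median(width(txsByGene))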

Maybe I am misinterpreting the source of the problem. If you need any other information, please let me know.
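For readers unfamiliar with GenomicFeatures: lengthData ends up as one number per gene (the median length of that gene's transcripts), named by gene ID. That is why it has one entry per annotated gene, about 7000 here, rather than one per DE gene. A base-R sketch of the same reduction, with a hypothetical list of widths standing in for width(txsByGene):

```r
# Hypothetical stand-in for width(txsByGene): transcript widths grouped by gene ID
tx_widths <- list(
  geneA = c(1000, 1400),        # two transcripts
  geneB = 850,                  # one transcript
  geneC = c(2000, 2300, 2600)   # three transcripts
)

# One median width per gene, keeping the gene IDs as names
lengthData <- vapply(tx_widths, median, numeric(1))
```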

Sorry, I missed the reply. Something along the lines of de.lengths <- lengthData[which(d$padj<0.05)] should solve the problem. Then just use de.lengths.
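The which() wrapper in that line does real work: DESeq2 can report NA adjusted p-values, and indexing with a logical vector that contains NA returns NA elements instead of dropping them. A toy comparison with made-up values (this assumes, as the line above does, that the lengths are in the same order as the rows of the results table):

```r
# Made-up adjusted p-values; NA mimics genes DESeq2 did not test
padj <- c(0.01, NA, 0.03, 0.50)
gene_len <- c(1200, 850, 2300, 640)

# Logical indexing keeps a slot (filled with NA) for every NA comparison
by_logical <- gene_len[padj < 0.05]

# which() returns only the positions that are TRUE, so NAs are dropped
by_which <- gene_len[which(padj < 0.05)]
```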

No worries! That worked, but the two objects have different data types, and I get this error: "Error in sum(y[ww][1:size]) : invalid 'type' (character) of argument". Can one be converted to the other?

According to help(nullp), de.genes should be "A named binary vector where 1 represents DE, 0 not DE and the names are gene IDs." I recall that being different in a previous version of goseq, though perhaps I'm misremembering. So, something like:

d <- read.csv("deseq2res.csv", header=TRUE, row.names=1)
deGenes <- c(rep(0, nrow(d)))
deGenes[which(d$padj<0.05)] <- 1
row.names(deGenes) <- row.names(d)

Then just use lengthData and deGenes as is (as long as they have the same order).
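An equivalent construction as a self-contained sketch, using a toy data frame in place of read.csv("deseq2res.csv") (hypothetical gene IDs and p-values). NA padj values are treated as "not DE", which matches what which() does in the lines above, and the gene IDs are attached with names():

```r
# Toy stand-in for read.csv("deseq2res.csv", row.names = 1) (hypothetical values)
d <- data.frame(padj = c(0.01, NA, 0.03, 0.50),
                row.names = c("geneA", "geneB", "geneC", "geneD"))

# 1 = DE (padj < 0.05), 0 = not DE; genes with NA padj count as not DE
deGenes <- as.integer(!is.na(d$padj) & d$padj < 0.05)

# Attach the gene IDs as names
names(deGenes) <- row.names(d)
```

With goseq loaded, this named 0/1 vector is what goes into nullp(deGenes, bias.data = lengthData).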

ADD REPLY
0
Entering edit mode

Thanks, Devon, this looks like it will work except for a minor issue in the last line: it gives an error I don't know how to deal with (below). I also have a question: what's the difference between the second line you wrote and degenes <- rep(0, nrow(d))? I created both and they seem to contain the same data! Thanks again for your help.

> row.names(deGenes) <- row.names(deseqres)
Error in `rownames<-`(x, value) : 
  attempt to set 'rownames' on an object with no dimensions
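On the side question: for a plain numeric vector, wrapping rep() in c() returns the same object, so the two lines are equivalent. A quick base-R check with a toy length:

```r
n <- 5
a <- c(rep(0, n))  # c() around a single plain vector adds nothing
b <- rep(0, n)
same <- identical(a, b)
```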

Try names(deGenes) <- row.names(deseqres) instead. deGenes is a plain vector with no dimensions, so it has names rather than row names.
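A small base-R sketch of the difference, with toy IDs: row.names<- on a dimensionless vector raises the error shown above, while names<- attaches one ID per element.

```r
deGenes <- c(0, 1, 0)
ids <- c("geneA", "geneB", "geneC")

# A plain vector has no dim attribute, so it has no rows to name
no_dims <- is.null(dim(deGenes))

# row.names<- fails on it; capture the error message instead of stopping
err <- tryCatch({
  row.names(deGenes) <- ids
  NULL
}, error = function(e) conditionMessage(e))

# names<- works on plain vectors
names(deGenes) <- ids
```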

The problem I had at the beginning is back! de.lengths <- lengthData[which(deseqres$padj<0.05)] has length 237, but deGenes has length 6089. I guess I should use de.lengths to filter deGenes. Is that correct?

Please reread my comment from 13 hours ago. Apparently, in the most recent versions of goseq you don't subset anything: you pass every gene, with DE status encoded as 1 or 0.

Right, now I understand! Sorry for asking the same thing twice. I am learning, and it is not easy for me to keep all aspects in mind at once.

However, the lengthData that I create from txdb holds the full gene list (7019 genes), while the deGenes created from deseqres holds only 6089, since DESeq removes rows that sum to zero during its calculations. So I have to remove the rows that are not present in both, to make the two vectors the same length. Is my thinking right?

Correct, you'll need to use %in% to see which entries of txsByGene are in deseqres.
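A toy sketch of that %in% test in base R, assuming (hypothetically) a lengthData vector named by gene ID and a character vector holding the row names of the DESeq2 results:

```r
# Toy stand-ins (hypothetical IDs): a length for every annotated gene,
# plus the gene IDs that appear as row names of the DESeq2 results
lengthData <- c(geneA = 1200, geneB = 850, geneC = 2300, geneD = 640)
deseq_ids  <- c("geneC", "geneA", "geneD")

# TRUE where the annotated gene also appears in the results
in_results <- names(lengthData) %in% deseq_ids

# Annotated genes that have no row in the results
not_in_results <- names(lengthData)[!in_results]
```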

Can I just remove the rows in lengthData that are not present in deGenes? Could you show me how?

Whether you subset lengthData or txsByGene is up to you. You'll need to use %in% either way. You should be able to figure out how to do this yourself.

OK, it took me a long time to come up with something that might do the job, but I would like to check with you whether it is correct, if you could have a glance. First I make a logical vector marking the genes of lengthData that are also present in deseqres, then I use it to subset lengthData into new_lengthData. I can't even express myself very well, but here is how I did it:

> select_genes <- as.vector(names(lengthData)%in%row.names(deseqres))
> new_lengthData <- lengthData[select_genes]
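One thing worth checking after this step, since nullp() expects lengthData and deGenes in the same order: the %in% subset keeps the genes in lengthData's own order, which need not match the order of deGenes. A toy sketch (hypothetical IDs) that also indexes by name, so the result follows deGenes exactly:

```r
lengthData <- c(geneA = 1200, geneB = 850, geneC = 2300, geneD = 640)
# Hypothetical deGenes whose order follows the results table, not lengthData
deGenes <- c(geneC = 1L, geneA = 0L, geneD = 1L)

# The %in% subset keeps lengthData's own order: geneA, geneC, geneD
select_genes   <- names(lengthData) %in% names(deGenes)
new_lengthData <- lengthData[select_genes]

# Indexing by name returns the lengths in deGenes' order: geneC, geneA, geneD
aligned_lengths <- lengthData[names(deGenes)]
```

Name-based indexing assumes every gene in deGenes has a length; an ID missing from lengthData would come back as NA.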

Looks correct.

Devon, did you see my reply here? I would appreciate any help you can give with this.

Thanks!
