3.2 years ago
BlueSky
I'm trying to merge several RNA-seq datasets. The datasets are already preprocessed (uniformly processed), and because they were already filtered they contain different numbers of probe ids. When I use the merge() function, I get only the probe ids they have in common, so I lose information from the datasets that contain more probe ids. How do I deal with that?
In the merge() function, add the argument all = TRUE.
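A minimal sketch of what this changes, assuming the datasets are R data frames that share a key column. The column name probe_id, the sample columns, and all values below are made up for illustration:

```r
# Toy expression tables (hypothetical data) keyed by a shared "probe_id" column.
ds1 <- data.frame(probe_id = c("p1", "p2", "p3"), sampleA = c(5.1, 7.3, 2.0))
ds2 <- data.frame(probe_id = c("p2", "p3", "p4"), sampleB = c(6.8, 1.9, 4.4))
ds3 <- data.frame(probe_id = c("p1", "p4", "p5"), sampleC = c(5.0, 4.1, 3.3))

# Default merge keeps only the probe ids present in both tables.
inner <- merge(ds1, ds2, by = "probe_id")

# all = TRUE keeps every probe id from either table;
# values missing from one table are filled with NA.
outer <- merge(ds1, ds2, by = "probe_id", all = TRUE)

# merge() takes two tables at a time, so several datasets
# can be combined pairwise with Reduce().
merged_all <- Reduce(
  function(x, y) merge(x, y, by = "probe_id", all = TRUE),
  list(ds1, ds2, ds3)
)
```

Here inner contains only the shared probe ids, while outer and merged_all keep the union of probe ids, with NA wherever a dataset did not report that probe.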
Thanks for answering, but what do I then do with the cells that are filled with NA? Do I just set them to 0, or would that be wrong?
The answer depends on the questions you will be asking of the data set. Your merged data set will be misleading for any question where 0 might be a meaningful answer, whereas NA might prevent you from asking questions about certain genes. The caveats are yours to compose, as long as you can carry them along, communicate them, and not mislead yourself or others with them.
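As a small illustration of how the two choices can diverge, here is a sketch with made-up expression values for a single gene, where the last two samples come from a dataset in which the gene was filtered out:

```r
# Hypothetical values for one gene across six samples;
# NA marks samples from a dataset that did not report this gene.
expr <- c(8.2, 7.9, 8.5, 8.1, NA, NA)

# Keeping NA: the summary uses only the samples that were actually measured.
mean_ignoring_na <- mean(expr, na.rm = TRUE)

# Replacing NA with 0: the missing samples are treated as "not expressed",
# which pulls the summary down even though nothing was measured.
expr_zero <- expr
expr_zero[is.na(expr_zero)] <- 0
mean_with_zeros <- mean(expr_zero)

# Number of samples that really contribute information for this gene.
n_measured <- sum(!is.na(expr))
```

With zeros filled in, the gene looks much less expressed on average than the measured samples suggest, which is the kind of misleading answer described above. With NA kept, the summary is honest but rests on fewer samples.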
I want to do PCA and hierarchical clustering (preferably with the top DEGs; I suppose the 0s will cause problems in the heatmap here?). Would it be better to use only the common genes and drop the ones that are not shared?
Hello there,
could you please tell us a little more about the circumstances? For example, show an example of your problem and clarify which software you use (R, Python, Java, etc.).