I remember the first day I joined Biostars. I felt desperate about how to conduct my master's thesis: it was my first research experience and, as a biostatistician, I was not familiar with genetics and bioinformatics. I made an account on Biostars and asked for help here. You helped me a great deal, and without your help I would have found it very difficult to conduct my research, as I did not know anyone around me who could help. I had to learn everything online. You can see my activity and questions on Biostars as well.
Now I have happily defended my thesis and received full marks. I want to share my happiness with you and thank you for all your support.
I have also written a paper based on my thesis. I asked for permission to share it here: C: Posting a preprint paper on biostars.com
Would you please help me with it? I would be grateful for your comments, as this is my first experience writing a paper. What do you think could improve this paper? Are there additional analyses I could conduct? You can also comment on bioRxiv.
I drew some heatmaps and spaghetti plots for my thesis, but I have not used them in the paper as they did not seem appropriate for it.
That looks like great work. Congratulations again. I'm afraid the paper is a bit too statistics-heavy for me to comment on in depth, but I have some thoughts I can share. These are potential improvements rather than mistakes or things I feel strongly about, so I think I'll post them here. I will post a link to this thread on bioRxiv, though.
You are using the genes common to DESeq2, edgeR and limma-voom. By taking the intersection of their DEG lists you aim for a high-confidence set, which is certainly not a wrong approach, but it comes at the cost of sensitivity. You could consider a similar comparison in which your benchmark DEGs are either the union of those three methods or the genes called by at least two of them, and see whether that changes anything.
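A minimal Python sketch of the three benchmark definitions, assuming each tool's DEG calls have already been exported as a set of gene IDs. The function name benchmark_sets and the gene IDs g1 to g6 are hypothetical placeholders, not data or code from the paper.

```python
from collections import Counter


def benchmark_sets(deg_calls, min_tools=2):
    """Build intersection, union and 'called by >= min_tools' DEG sets.

    deg_calls maps a tool name (e.g. "DESeq2") to the set of gene IDs it
    calls differentially expressed. Inputs here are hypothetical.
    """
    sets = [set(genes) for genes in deg_calls.values()]
    intersection = set.intersection(*sets)  # called by every tool (strict)
    union = set.union(*sets)                # called by any tool (permissive)
    # Count how many tools call each gene, then keep genes reaching the threshold.
    votes = Counter(gene for s in sets for gene in s)
    at_least_k = {gene for gene, n in votes.items() if n >= min_tools}
    return intersection, union, at_least_k


# Hypothetical toy input: three tools, six genes.
calls = {
    "DESeq2": {"g1", "g2", "g3", "g4"},
    "edgeR": {"g1", "g2", "g5"},
    "voom": {"g1", "g3", "g6"},
}
strict, loose, majority = benchmark_sets(calls)
print(sorted(strict))    # ['g1']
print(sorted(majority))  # ['g1', 'g2', 'g3']
print(len(loose))        # 6
```

The at-least-two set sits between the strict intersection and the permissive union, which is exactly the confidence versus sensitivity trade-off described above.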
Your Figure 1b is a non-proportional Venn diagram with four sets, which is visually hard to interpret. I agree it looks pretty, but if a region with 6971 genes is drawn as large as one with only 4, it's hard to draw conclusions from the figure visually. Instead, I would suggest trying UpSetR plots for comparing overlaps among multiple sets. The same applies to Figure 2: although there are fewer sets to compare, the Venn diagram is non-proportional and therefore hard to interpret visually. It requires me to read all the numbers and compare them in my head to judge the overlaps.
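An UpSet plot typically replaces the Venn regions with bars whose heights are the sizes of the exclusive intersections, i.e. the genes found in exactly one given combination of sets. The sketch below computes those counts in plain Python as a way to check what such bars would show. It is not UpSetR itself (an R package), and the function name, set names and gene IDs are hypothetical.

```python
from collections import Counter


def exclusive_intersections(named_sets):
    """Count genes by the exact combination of sets they belong to.

    Returns a dict mapping a tuple of set names to the number of genes that
    are in all of those sets and in none of the others, largest first.
    """
    all_genes = set().union(*named_sets.values())
    counts = Counter()
    for gene in all_genes:
        # A gene's membership signature is the sorted tuple of set names containing it.
        signature = tuple(sorted(name for name, genes in named_sets.items() if gene in genes))
        counts[signature] += 1
    # Largest combinations first, as in an UpSet bar chart.
    return dict(sorted(counts.items(), key=lambda item: item[1], reverse=True))


# Hypothetical toy input with four sets, as in a four-set Venn diagram.
sets = {
    "A": {"g1", "g2", "g3", "g4", "g5"},
    "B": {"g1", "g2", "g3", "g6"},
    "C": {"g1", "g2", "g7"},
    "D": {"g1", "g8"},
}
result = exclusive_intersections(sets)
for combo, size in result.items():
    print("&".join(combo), size)
```

Because each bar's height is its count, a combination with thousands of genes is immediately distinguishable from one with a handful, which is what a non-proportional Venn diagram hides.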
I see you use a real dataset rather than a simulated one, which is i) better because it's real, but ii) less convenient because you have no ground truth, except for the one created using the popular algorithms listed above. I don't know of any good RNA-seq simulation tools, though. Something else that comes to mind is that your model is lentiviral transduction of cells, which is a very aggressive treatment and therefore yields thousands of differentially expressed genes. You could perhaps also look at another dataset in which fewer genes are differentially expressed: overlaps are then more significant and less likely to arise by chance alone, and sensitivity becomes more important.
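To make the point about chance overlaps concrete: if two tools picked their DEGs independently and at random from the same tested genes, the overlap would follow a hypergeometric distribution. The sketch below uses hypothetical round numbers (20000 tested genes), not the paper's. It shows that two lists of 7000 DEGs are expected to share 2450 genes by chance alone, whereas two lists of 200 are expected to share only 2. The function names are illustrative.

```python
from math import comb


def expected_overlap(n_genes, size_a, size_b):
    """Mean overlap of two lists drawn independently at random from n_genes."""
    return size_a * size_b / n_genes


def overlap_pvalue(n_genes, size_a, size_b, observed):
    """One-sided hypergeometric P(overlap >= observed) for random, independent lists."""
    total = comb(n_genes, size_b)
    tail = sum(
        comb(size_a, k) * comb(n_genes - size_a, size_b - k)
        for k in range(observed, min(size_a, size_b) + 1)
    )
    return tail / total  # exact integer arithmetic, converted to float at the end


# Hypothetical numbers: 20000 tested genes.
print(expected_overlap(20000, 7000, 7000))  # 2450.0
print(expected_overlap(20000, 200, 200))    # 2.0
print(overlap_pvalue(20000, 200, 200, 20))  # tiny: 20 shared genes is very unlikely by chance
```

This null model ignores that real tools analyse the same data and are far from independent, so it only illustrates why a large overlap between very long DEG lists says little on its own.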
Thank you very much for your great comments. That is very kind of you.
1- Yes, that is a good idea. I can use the DEGs common to at least two tools, and I can conduct more analyses based on this idea.
2- Oh, yes, I hadn't considered that, and I agree with you. My figures look more like tables than figures, whereas figures should help the reader understand the data visually. The multiDE paper used the same kind of figures, which is why I drew mine that way.
3- So I guess I need to find a dataset with a less aggressive treatment. I will do some research.
I highly appreciate your kindness and useful comments. Thanks for your time and consideration.
Looks like an interesting paper, thanks for sharing! I'll provide feedback when/if I have anything constructive to say.
Thank you very much. I am happy to hear that. Your comments will be very helpful.