Question

Looking for help and comments for my preprint paper

2


6.2 years ago

statfa ▴ 790

Hi Biostars.

I remember the first day I joined Biostars. I felt desperate about how to conduct my master's thesis: it was my first research experience and, as a biostatistician, I was not familiar with genetics and bioinformatics. I made an account on Biostars and asked for help here. You helped me a great deal; without your help I would have found it very difficult to conduct my research, as I did not know anyone around me who could help. I had to study everything online. You can see my activity and questions on Biostars as well.

I have now happily defended my thesis and received full marks. I want to share my happiness with you and thank you for all your support.

I have also written a paper based on my thesis. I asked for permission to post it here (C: Posting a preprint paper on biostars.com). Would you please help me with it? I would be grateful for your comments, as this is my first experience writing a paper. What do you think could improve it? Can you suggest any further analyses I could conduct? You can comment on bioRxiv too.

I drew some heatmaps and spaghetti plots for my thesis, but I have not used them in the paper as they did not seem appropriate for it.

Here is the link to my paper: https://www.biorxiv.org/content/early/2018/10/22/448886

Would you also tell me in what journal I can publish it?

Thanks a great deal.

preprint paper Differential Expression RNA-seq • 1.5k views

ADD COMMENT • link updated 6.2 years ago by WouterDeCoster 47k • written 6.2 years ago by statfa ▴ 790

1


Looks like an interesting paper, thanks for sharing! I'll provide feedback when/if I have anything constructive to say.

ADD REPLY • link 6.2 years ago by Devon Ryan 105k

0


Thank you a great deal. I am happy to hear it. Your comments will be very helpful.

ADD REPLY • link 6.2 years ago by statfa ▴ 790

score 1 · Answer 1 · 2018-10-27

That looks like great work. Congratulations again. I'm afraid the paper is a bit too statistics-heavy for me to comment on in depth, but I have some thoughts I can share. These are potential improvements rather than mistakes or things I feel very strongly about, so I think I'll post them here. I will post a link to this thread on bioRxiv though.

You are using the common set between DESeq2, edgeR and (limma?) voom. By taking the intersection of the three results you aim for a high-confidence set, which is certainly not a wrong approach, but it comes at the cost of sensitivity. You could consider a similar comparison in which your benchmark DEGs are either the union of those three methods or the genes shared by at least two of them, and see whether that changes anything.
Your figure 1b is a non-proportional Venn diagram with 4 sets, which is visually hard to interpret. I agree it looks pretty, but if a section with 6971 genes is drawn as large as one with only 4, it's hard to draw conclusions from the figure by eye. Instead I would suggest playing around with UpSetR plots for multi-dataset comparisons and overlaps. The same can be said about figure 2: while there are fewer sets to compare, the Venn diagram is non-proportional and therefore hard to interpret visually. I have to read all the numbers and compare them in my head to judge the overlaps.
I see you use a real data set rather than a simulated one, which is i) better because it's real, but ii) less convenient because you don't have a real truth set, except for the one created by using the popular algorithms listed above. I don't know of any useful RNA-seq simulation tools, though. Something else that comes to mind is that your model is lentiviral transduction of cells, which is a very aggressive treatment and hence gives thousands of differentially expressed genes. You could perhaps also look at another dataset in which fewer genes are differentially expressed: overlaps are then more significant and less likely to arise by chance alone, and sensitivity becomes more important.
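
To make the first two points concrete, here is a minimal sketch of how the alternative benchmark sets (intersection, at least two methods, union) and the exclusive overlap counts that an UpSet-style plot displays could be computed. It is written in plain Python rather than with UpSetR, and the gene IDs and the `calls` dictionary are made-up toy data, not results from the paper; the function names are hypothetical.

```python
from collections import Counter


def consensus_sets(calls):
    """Build benchmark DEG sets from per-method calls.

    calls: dict mapping a method name to the set of gene IDs it calls DE.
    """
    # Count how many methods call each gene.
    votes = Counter(gene for genes in calls.values() for gene in genes)
    n_methods = len(calls)
    return {
        "intersection": {g for g, v in votes.items() if v == n_methods},
        "at_least_two": {g for g, v in votes.items() if v >= 2},
        "union": set(votes),
    }


def exclusive_overlaps(calls):
    """Count genes by the exact combination of methods that call them.

    These counts correspond to the bars of an UpSet-style plot.
    """
    counts = Counter()
    for gene in set().union(*calls.values()):
        combo = tuple(sorted(m for m, genes in calls.items() if gene in genes))
        counts[combo] += 1
    return counts


# Toy example (assumed data, for illustration only).
calls = {
    "DESeq2": {"g1", "g2", "g3", "g4"},
    "edgeR": {"g1", "g2", "g3", "g5"},
    "voom": {"g1", "g2", "g6"},
}

sets = consensus_sets(calls)
overlaps = exclusive_overlaps(calls)

for name, genes in sets.items():
    print(name, len(genes))
for combo, n in sorted(overlaps.items(), key=lambda kv: -kv[1]):
    print(" & ".join(combo), n)
```

Rerunning the downstream comparison once per benchmark definition would show how much the conclusions depend on choosing the strict intersection.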