I am working on RNA-Seq data analysis and am trying to reproduce the data from the paper "Transcriptional comparison of the filamentous fungus Neurospora crassa growing on three major monosaccharides D-glucose, D-xylose and L-arabinose" (https://biotechnologyforbiofuels.biomedcentral.com/articles/10.1186/1754-6834-7-31). I started with glucose and arabinose as the first conditions and followed the steps the authors describe, but my read counts differ from the original read counts.
Actual data                    My_output
locus      arabinose_Rep1      locus      arabinose_Rep1
NCU00003   1541                NCU00003   831
NCU00004   2392                NCU00004   1164
NCU00005   916                 NCU00005   466
NCU00006   4916                NCU00006   2757
NCU00007   1783                NCU00007   887
NCU00008   1645                NCU00008   835
NCU00009   0                   NCU00009   0
The number of genes matches exactly. The problem is that the read counts differ: I get roughly half of the reads they report. I believe I ran everything correctly. Please share your comments.
Unless you are using identical versions of the software, the same genome build, and the same command-line options, the chances of replicating the results exactly are small. In addition, most NGS aligners produce non-deterministic output (it may differ slightly between runs), so expect some small variation.
Thanks for your answer. Yes, some small variation is to be expected. But compared to my output, their counts are about twofold higher for all genes. Is that possible? What could be the problem? One more question: could an additional parameter cause this kind of variation?
Small differences in your command lines (and/or any other part of the pipeline) can result in differences. Pay special attention to how multi-mappers have been handled and counted. If you are 100% sure that your results are correct, you could try contacting the authors to see what may be going on.
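Before contacting the authors, it may help to check whether the difference is a uniform scale factor or varies from gene to gene. Here is a minimal Python sketch using only the seven arabinose_Rep1 values posted in the question; the dictionary names and the `count_ratios` helper are made up for illustration, and genes with a zero count in your output are skipped to avoid dividing by zero.

```python
import statistics

# Counts copied from the table in the question (arabinose_Rep1).
published = {
    "NCU00003": 1541, "NCU00004": 2392, "NCU00005": 916, "NCU00006": 4916,
    "NCU00007": 1783, "NCU00008": 1645, "NCU00009": 0,
}
mine = {
    "NCU00003": 831, "NCU00004": 1164, "NCU00005": 466, "NCU00006": 2757,
    "NCU00007": 887, "NCU00008": 835, "NCU00009": 0,
}


def count_ratios(reference, observed):
    """Return {locus: reference / observed} for loci with a non-zero observed count."""
    return {
        locus: reference[locus] / observed[locus]
        for locus in reference
        if observed.get(locus, 0) > 0
    }


ratios = count_ratios(published, mine)

# Print one ratio per gene, then a summary of the overall scale factor.
for locus in sorted(ratios):
    print(f"{locus}\t{ratios[locus]:.2f}")
print("median ratio:", round(statistics.median(ratios.values()), 2))
```

If the ratios cluster tightly around one value for every gene, as they appear to for these seven loci, that points to a systematic difference in how reads were counted rather than to alignment noise. Possible candidates include the multi-mapper setting mentioned above, or counting individual reads versus read pairs if the library is paired-end; these are guesses that the thread itself does not confirm.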
Thanks for your suggestion.