Hello, I was wondering if anyone could help. I am interested in reproducing the work published in the paper SPRITE-publication for benchmarking purposes. I accessed the raw files published here: Raw-data.
For the analysis, I merged the 11 technical replicates into a single paired-end sample (forward and reverse) and ran the pipeline as published on GitHub (SPRITE-Pipeline). The configuration file I used, Config.txt, contains the barcodes used for SPRITE tagging in the GM12878 DNA-SPRITE experiment.
According to the publication, "a successful SPRITE experiment typically achieves >75% of total reads tagged with all five identified barcodes, corresponding to a ligation efficiency of approximately 95% per round (0.95^5 rounds = 0.75)". However, my results deviate from the reported figures despite using identical data, barcodes, and scripts. Below are the percentages I observed:
5995219 (0.4%) reads found with 0 barcodes.
37246361 (2.3%) reads found with 1 barcode.
101531585 (6.4%) reads found with 2 barcodes.
304877984 (19.2%) reads found with 3 barcodes.
161625469 (10.2%) reads found with 4 barcodes.
**974956107 (61.5%) reads found with 5 barcodes.**
Additionally, the distribution of barcodes across different positions is as follows:
1526493880 (96.2%) barcodes found in position 1.
1511746202 (95.3%) barcodes found in position 2.
1459052384 (92.0%) barcodes found in position 3.
**1141582366 (72.0%) barcodes found in position 4.**
**1037351062 (65.4%) barcodes found in position 5.**
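To put these numbers next to the 95%-per-round figure from the publication, here is a quick back-of-the-envelope sketch. It assumes all five ligation rounds are equally efficient and uses only the percentages listed above; the variable names are just mine for illustration.

```python
import math

# Fractions copied from the report above.
frac_all_five = 0.615                               # reads with all 5 barcodes
position_fracs = [0.962, 0.953, 0.920, 0.720, 0.654]  # positions 1..5
rounds = 5

# What the publication's rule of thumb predicts for 95% per round.
expected_at_95 = 0.95 ** rounds

# Per-round efficiency implied by my 61.5%, assuming equal rounds.
implied_per_round = frac_all_five ** (1 / rounds)

# Fraction of fully tagged reads expected if each position
# failed independently of the others.
product_if_independent = math.prod(position_fracs)

print(f"0.95^5                  = {expected_at_95:.3f}")
print(f"implied per-round eff.  = {implied_per_round:.3f}")
print(f"product of positions    = {product_if_independent:.3f}")
```

So 61.5% corresponds to roughly 91% per round rather than 95%. The product of the per-position fractions (about 40%) is also well below the observed 61.5%, which would be consistent with the failures at positions 4 and 5 mostly occurring in the same reads, although that is only my reading of the numbers.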
I can't find a reason why the benchmark isn't giving me the same result. Any suggestions?
Thank you very much for your response. However, in my case, I used the publication's data, pipeline, and barcodes, but I didn't achieve the same results. What could be causing these differences in ligation efficiency? Is there anything I can do to improve my results?
Thank you again! I will do as you suggested and run other SPRITE data from GEO. Could you please point me to other human DNA-SPRITE data, other than the data uploaded to GEO and the data in 4DNF?
Please use "Add comment" rather than "Add your answer" unless it's an answer :)
There is some DNA SPRITE data available at GSE247833 and at GSE151515.
If you followed the steps (i.e. the first three steps: adapter trimming, barcode identification, and efficiency calculation), your results are fine. Honestly, don't worry about it; maybe a different set of reads was used for that figure. If you want to recover more reads, just edit the config.txt file to be more error tolerant.
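If you want to sanity-check the efficiency numbers yourself, for example after changing the error tolerance, something like the following rough sketch could help. It only assumes the report lines look like the ones pasted in the question; `parse_barcode_counts` is a hypothetical helper written for this post, not part of the SPRITE pipeline.

```python
import re

# Report lines copied from the question (bold markers removed).
REPORT = """\
5995219 (0.4%) reads found with 0 barcodes.
37246361 (2.3%) reads found with 1 barcode.
101531585 (6.4%) reads found with 2 barcodes.
304877984 (19.2%) reads found with 3 barcodes.
161625469 (10.2%) reads found with 4 barcodes.
974956107 (61.5%) reads found with 5 barcodes.
"""

# Matches e.g. "37246361 (2.3%) reads found with 1 barcode."
LINE_RE = re.compile(r"(\d+) \([\d.]+%\) reads found with (\d+) barcodes?\.")


def parse_barcode_counts(text):
    """Return {number_of_barcodes: read_count} from the report text."""
    return {int(m.group(2)): int(m.group(1)) for m in LINE_RE.finditer(text)}


counts = parse_barcode_counts(REPORT)
total_reads = sum(counts.values())
max_barcodes = max(counts)

# Fraction of reads carrying the full set of barcodes.
fully_tagged = counts[max_barcodes] / total_reads

print(f"total reads: {total_reads}")
print(f"fully tagged: {100 * fully_tagged:.1f}%")
print("meets >75% guideline:", fully_tagged > 0.75)
```

Running the same check on the report from each config lets you compare how many reads a looser tolerance actually recovers.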
Thank you.