Hi all,
I have a question regarding a batch effect and RNA counts.
We completed two 5' 10x Genomics single-cell runs at our genomics facility. In the first batch (n=4) none of the samples had warnings in their libraries, but in the second batch 3/4 samples had warnings such as 'low fraction valid barcodes' and 'high fraction of reads mapped antisense to genes'. In addition, the average feature RNA was around 1500 in the first batch compared with 300 in the second, and the count RNA was around 3000 in the first batch compared with 1000 in the second.
I think there was a problem with the sequencing in the second batch, but the genomics facility have told me they have checked the sequencing parameters and think that the run was okay. I have also spoken to 10x Genomics, who initially thought sequencing could have been an issue.
It has been difficult to integrate these batches, and this has affected my differentially expressed genes because there were more patient samples in the second run.
Has anyone else encountered a batch effect of this size? We still have access to the libraries so in theory we could resequence, but I am not sure whether this would help.
Thank you!
That's a problem of poor library quality, not of sequencing. Resequencing will likely not change anything; you might need to redo the entire library.
Hi, thank you. Can I ask why you think it is a library issue?
Because you get low cell numbers. What else other than a poor library would be the reason? If the sequencing had failed, you would see that in the FastQC per-base quality, but this is unlikely given how mature Illumina machines are, and the core would not release data like that.
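For anyone who wants to check the per-base quality argument directly, here is a minimal Python sketch of the same idea as FastQC's per-base quality plot. It is not a replacement for FastQC. It assumes standard four-line FASTQ records with Phred+33 encoding, and the function name `per_base_mean_quality` is made up for illustration.

```python
def per_base_mean_quality(fastq_lines, phred_offset=33):
    """Return the mean Phred score at each read position.

    Assumes 4-line FASTQ records (header, sequence, '+', quality)
    and Phred+33 encoding, which is an assumption about the input.
    """
    sums = []    # running sum of quality scores per position
    counts = []  # number of reads that cover each position
    for i, line in enumerate(fastq_lines):
        if i % 4 != 3:
            continue  # only the 4th line of each record holds qualities
        qual = line.rstrip("\n")
        for pos, ch in enumerate(qual):
            if pos == len(sums):
                sums.append(0)
                counts.append(0)
            sums[pos] += ord(ch) - phred_offset
            counts[pos] += 1
    return [s / c for s, c in zip(sums, counts)]


# Tiny in-memory example: two reads of different length.
example = ["@r1", "ACGT", "+", "IIII", "@r2", "ACG", "+", "5II"]
means = per_base_mean_quality(example)
```

On real data you could pass an open file handle instead of a list, for example from `gzip.open(path, "rt")` on the R1 file, since the valid-barcode warning points at R1. A sharp drop in mean quality over the barcode positions of the second batch, but not the first, would point towards sequencing rather than the library.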
The cell numbers are similar between the batches, but the count and feature RNA metrics are significantly lower in the second batch.
These are the warnings I get for one of the samples in the second batch:
Low Fraction Valid Barcodes (61.42%) Ideal > 75%. This may indicate a quality issue with the R1 read for Single Cell 3' v2/v3 and Single Cell 5', or either R1 or R2 for Fixed RNA Profiling. Application performance may be affected.
High Fraction of Reads Mapped Antisense to Genes (28.10%) Ideal < 20%. Rates of up to 40% are common for single nuclei samples. Higher fraction of antisense reads may indicate use of an incorrect chemistry type, or an issue with the reference transcriptome.
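To compare all eight samples against the same cut-offs, a small Python sketch like the one below could be used. The thresholds (75% and 20%) are taken from the warnings above. The metric names are an assumption meant to mirror the column headers in a Cell Ranger summary file and should be checked against your own output, and `flag_metrics` is a hypothetical helper.

```python
# Thresholds quoted in the warnings: ("min", x) means the value should be
# above x, ("max", x) means it should be below x. Metric names are assumed.
THRESHOLDS = {
    "Valid Barcodes": ("min", 75.0),
    "Reads Mapped Antisense to Gene": ("max", 20.0),
}


def flag_metrics(metrics):
    """Return the names of metrics (in percent) that miss their threshold."""
    flagged = []
    for name, (kind, limit) in THRESHOLDS.items():
        value = metrics.get(name)
        if value is None:
            continue  # metric not reported for this sample
        if (kind == "min" and value < limit) or (kind == "max" and value > limit):
            flagged.append(name)
    return flagged


# Values from the batch 2 sample quoted above.
batch2_sample = {
    "Valid Barcodes": 61.42,
    "Reads Mapped Antisense to Gene": 28.10,
}
flags = flag_metrics(batch2_sample)
```

Running this per sample gives a quick table of which libraries fail which check, which makes it easier to see whether the problem is confined to the second batch.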