2.2 years ago
seal
Hi, I've been looking over some of the data available at my lab and noticed that, even within the same project, different library preparation kits were used. I read in a paper that different library prep kits yield different quantities and qualities of reads even when run on the same sequencing platform (the paper used HiSeq).
I was wondering whether there are methods to combine sequencing data from two library prep kits (for example, TruSeq Nano and Nextera Flex) with minimal bias. So far I haven't been able to find any.
For what kind of data? If it's just genome sequencing you can probably combine reads fairly easily (assuming the kits have the same 'parameters', e.g. paired-end, insert size, etc.). It depends on what your chosen assembler can deal with.
If it's for something quantitative like RNA-seq, your batch effects, even between preps made with the same kit, will likely be so big as to make it pointless.
I'm currently working on metagenomics: specifically, human metagenomic data sampled on two separate occasions, once using TruSeq Nano and once using Illumina DNA Prep (aka Nextera Flex). Both datasets were sequenced on a NovaSeq (I have one other sequenced on a NextSeq). But how can I combine them with minimal bias? I'm really new to this field, so I'm clueless.
I'm not sure you really can. I would perhaps treat each set of libraries as a separate dataset and analyse their diversity (I assume this is gut microbiome metagenomics or something?). See if you get similar results for both independently, then perhaps consider combining them.
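As a rough illustration of that "analyse each kit separately, then compare" idea, here is a minimal Python sketch that computes a Shannon diversity index per sample and averages it per kit. The count vectors and the kit labels (`truseq_nano`, `dna_prep`) are made-up placeholders, not real data, and Shannon diversity is just one possible metric; use whatever your profiler and study design call for.

```python
import math
from statistics import mean


def shannon_diversity(counts):
    """Shannon index H = -sum(p * ln p) over taxa with non-zero counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


# Hypothetical per-sample taxon counts, grouped by library prep kit.
# In practice these would come from your taxonomic profiler's output.
samples_by_kit = {
    "truseq_nano": [[120, 80, 40, 10], [100, 90, 50, 5]],
    "dna_prep": [[110, 85, 45, 12], [95, 100, 40, 8]],
}

# Compute diversity for each sample, keeping the two kits separate.
diversity_by_kit = {
    kit: [shannon_diversity(sample) for sample in samples]
    for kit, samples in samples_by_kit.items()
}

# Summarise each kit independently before deciding whether to pool them.
mean_diversity = {kit: mean(values) for kit, values in diversity_by_kit.items()}

for kit, h in mean_diversity.items():
    print(f"{kit}: mean Shannon H = {h:.3f}")
```

If the per-kit summaries look very different for samples you would expect to be similar, that is a hint that the kit itself is contributing to the signal.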
I'm not clear what you think you'll gain by combining them though?
It would help me create a taxonomic profile for certain groups of people. Yes, it is gut microbiome metagenomics. I know there are journal articles where the researchers take data from public databases to conduct large-scale research, but I don't know how they normalize their samples. I'm sure not every group of samples they used came from the same library prep kit, if they used a commercial kit at all.
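For reference, one simple normalization is total-sum scaling: convert each sample's taxon counts to relative abundances so that samples with different sequencing depths are on the same scale. The sketch below does that and then computes a Bray-Curtis dissimilarity between two samples. This is only an assumption about one possible approach, not a description of what any particular published study does, and it only corrects for depth; it does not remove kit-specific bias. The sample names and counts are hypothetical.

```python
def relative_abundance(counts):
    """Total-sum scaling: divide each taxon count by the sample total."""
    total = sum(counts)
    if total == 0:
        raise ValueError("sample has no reads assigned to any taxon")
    return [c / total for c in counts]


def bray_curtis(p, q):
    """Bray-Curtis dissimilarity between two abundance vectors (0 = identical)."""
    numerator = sum(abs(a - b) for a, b in zip(p, q))
    denominator = sum(a + b for a, b in zip(p, q))
    return numerator / denominator if denominator else 0.0


# Hypothetical counts for the same taxa in two samples prepared with different kits.
sample_truseq = [120, 80, 40, 10]
sample_dna_prep = [330, 255, 135, 36]  # deeper sequencing, similar composition

# Scale both samples so that depth differences do not dominate the comparison.
ra_truseq = relative_abundance(sample_truseq)
ra_dna_prep = relative_abundance(sample_dna_prep)

print(f"Bray-Curtis dissimilarity: {bray_curtis(ra_truseq, ra_dna_prep):.3f}")
```

Comparing dissimilarities within a kit against those between kits is one way to see how much of the variation tracks the library prep rather than the biology.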