Merging two sequencing datasets
6 weeks ago

Hi all,

I sequenced some samples years ago on a NovaSeq 6000. I now want to explore some other analyses that need higher depth, so we repeated the library prep and resequenced the samples on a NovaSeq X. I'd like to combine both datasets, but I'm unsure at what stage to do this.

Should I merge the fastq files and then put them through the pipeline together, or should I map and remove duplicates first, and then merge the clean BAM files? Any insight into which would be better would be much appreciated!

Thanks!

NovaSeq genomics • 457 views
6 weeks ago
rfran010 ★ 1.3k

I'd map and QC them separately first to check for batch differences, then decide whether to merge them (you can merge the BAMs). The details may depend on the type of library and your analysis goals.
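If you do end up merging the BAMs, it may help to give each run its own read group at mapping time, so the two batches stay distinguishable after the merge and library-aware duplicate marking can tell the two preps apart. Here is a minimal sketch of building the `@RG` header lines. The sample, run, library, and platform-unit names are all made up for illustration, and how you pass the line to your aligner (for example through `bwa mem`'s `-R` option, with tabs escaped as needed) depends on your pipeline.

```python
def read_group_line(sample, run, library, platform_unit):
    """Build a SAM @RG header line that keeps one sequencing run identifiable."""
    fields = [
        "@RG",
        f"ID:{sample}.{run}",    # unique per sample and run
        f"SM:{sample}",          # same sample name in both runs
        f"LB:{library}",         # separate library preps get separate LB values
        "PL:ILLUMINA",           # platform, as defined in the SAM spec
        f"PU:{platform_unit}",   # hypothetical flowcell/lane label
    ]
    return "\t".join(fields)


# Hypothetical names: one sample, sequenced in the old and the new run
old_rg = read_group_line("sampleA", "run_old", "libA_old", "novaseq6000.lane1")
new_rg = read_group_line("sampleA", "run_new", "libA_new", "novaseqX.lane1")
print(old_rg)
print(new_rg)
```

Keeping `SM` identical while `ID` and `LB` differ means downstream tools still see one sample, but you can split or compare by run later if a batch effect shows up.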


We're planning to look at load and do some (recent) demographic analyses. If we QC them separately, are there specific things we should look for, e.g. in mapping quality or the distribution of reads?

And if it looks like there are no pronounced batch effects, I assume we would not gain much by remapping the merged fastq files? I had wondered whether mapping would improve with more reads, but the individual datasets should already be at 7-15x depth. In that case, could we just as well merge the BAMs?

Thanks again!


Look at the results from both dates on a PCA. Ideally, the results from each date should virtually overlap each other, or at least there should still be more difference between treatments than between dates.
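To put a rough number on "overlap", one option is to compare how far apart the two runs of the same sample sit on the PCA versus how far apart different samples sit. This is a minimal sketch that assumes you already have PC coordinates from whatever PCA tool you use. The function name `batch_check`, the sample labels, and the coordinate values are all made up for illustration.

```python
import math
from itertools import combinations


def batch_check(coords):
    """coords maps (sample, batch) -> tuple of PC coordinates.

    Returns (mean distance between runs of the same sample,
             mean distance between different samples).
    Assumes at least one same-sample pair and one different-sample pair.
    """
    within, between = [], []
    for (key_a, pc_a), (key_b, pc_b) in combinations(coords.items(), 2):
        distance = math.dist(pc_a, pc_b)  # Euclidean distance, Python 3.8+
        if key_a[0] == key_b[0]:
            within.append(distance)       # same sample, different run
        else:
            between.append(distance)      # different samples
    return sum(within) / len(within), sum(between) / len(between)


# Made-up PC1/PC2 values, for illustration only
coords = {
    ("A", "old"): (0.10, 0.20),
    ("A", "new"): (0.12, 0.19),
    ("B", "old"): (-0.30, 0.05),
    ("B", "new"): (-0.28, 0.07),
}
same_sample, different_sample = batch_check(coords)
print(f"same sample, different run: {same_sample:.3f}")
print(f"different samples:          {different_sample:.3f}")
```

If the same-sample distance is not clearly smaller than the different-sample distance, that would be a hint that the sequencing date is driving part of the structure.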

6 weeks ago

I would be wary of combining data from two separate library preps.
