Hi,
Background: We had the Fusarium oxysporum genome sequenced.
- Genome size: ~60 Mb
- Coverage: 100x
- Platform: Illumina NovaSeq X (paired-end, 150 bp)
Goal: De novo genome assembly
Problem 1:
The company gave us multiple files per sample (ideally there should be just one forward and one reverse read file). When we discussed this, they said their main concern was delivering at least 6 Gb of data per sample. They did not achieve this in the first run, so they re-ran the samples and got >6 Gb of data per sample in the second run. They have now given us the files from both runs, so for sample ILL_02 I have 4 files, as follows:
ILL_02_MKDN240005763-1A_227NJ5LT4_L4_1.fq.gz
ILL_02_MKDN240005763-1A_227NJ5LT4_L4_2.fq.gz
ILL_02_MKDN240005763-1A_227NJMLT4_L8_1.fq.gz
ILL_02_MKDN240005763-1A_227NJMLT4_L8_2.fq.gz
Solution I got: The solution suggested to me was to simply merge the files using zcat:
zcat ILL_02_MKDN240005763-1A_227NJ5LT4_L4_1.fq.gz ILL_02_MKDN240005763-1A_227NJMLT4_L8_1.fq.gz > ILL_02_merged_1.fq.gz
zcat ILL_02_MKDN240005763-1A_227NJ5LT4_L4_2.fq.gz ILL_02_MKDN240005763-1A_227NJMLT4_L8_2.fq.gz > ILL_02_merged_2.fq.gz
Question: Is this really a good approach? Is there any other method to merge the datasets from the two runs?
Was the same library re-run, or was a new set of libraries made? Did they run a single pool of libraries on all lanes? If it is the same library re-run on two flowcells, then these are technical sequencing replicates. There should be little, if any, batch effect unless a different chemistry was used for the two runs.
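One way to see which flowcell and lane each file came from is to look at the first read header. The sketch below assumes the standard Illumina header layout (@instrument:run:flowcell:lane:tile:x:y) and uses a made-up header in a throwaway file, not the real ILL_02 data; the file name and header values are hypothetical.

```shell
# Sketch only: demo_1.fq.gz and its header are invented stand-ins.
set -eu
tmp=$(mktemp -d)

# One fake FASTQ record with an Illumina-style header.
printf '@A00001:12:FLOWCELLX:4:1101:1000:2000 1:N:0:ACGT\nACGT\n+\nIIII\n' \
  | gzip -c > "$tmp/demo_1.fq.gz"

# Colon-separated fields 3 and 4 of the header are the flowcell ID and lane.
flowcell_lane=$(gzip -dc "$tmp/demo_1.fq.gz" | head -n 1 | cut -d: -f3,4)
echo "$flowcell_lane"
```

Running the same pipeline on each of the four delivered files should show whether they come from two different flowcells, which the file names here (227NJ5LT4 vs 227NJMLT4, L4 vs L8) already suggest.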
You can use plain cat. No need to use zcat here.
The representative of the sequencing company gave me this response when I asked about the multiple files per sample, and I quote here:
I think they just re-ran the same library.
As long as the lower yield first time around was not because of a problem of some sort with the software/hardware it should be fine to merge the data.
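To illustrate the plain cat suggestion above: gzip files can be concatenated byte for byte and the result is still a valid gzip file. By contrast, zcat a.gz b.gz > merged.fq.gz writes uncompressed text into a file with a .gz name. The sketch below uses tiny stand-in files with hypothetical names and reads, not the real data, and checks the merged file afterwards.

```shell
# Sketch only: run1/run2 files and their reads are invented stand-ins.
set -eu
tmp=$(mktemp -d)
cd "$tmp"

# Two fake "runs" of the same library, one 4-line FASTQ record each.
printf '@read1/1\nACGT\n+\nIIII\n' | gzip -c > run1_1.fq.gz
printf '@read2/1\nTTGA\n+\nIIII\n' | gzip -c > run2_1.fq.gz

# Concatenate the compressed files directly; no decompression needed.
cat run1_1.fq.gz run2_1.fq.gz > merged_1.fq.gz

# Integrity check, then count records (4 lines per FASTQ record).
gzip -t merged_1.fq.gz
merged_reads=$(( $(gzip -dc merged_1.fq.gz | wc -l) / 4 ))
echo "merged reads: $merged_reads"
```

For the real data this would be, for the forward reads:
cat ILL_02_MKDN240005763-1A_227NJ5LT4_L4_1.fq.gz ILL_02_MKDN240005763-1A_227NJMLT4_L8_1.fq.gz > ILL_02_merged_1.fq.gz
and the same for the _2 files, listing the runs in the same order so that read pairs stay in sync. The merged forward and reverse files should then report the same read count.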