8.5 years ago
adityabandla
Hi,
I have been using a generic pipeline for processing a recent metagenomics dataset that I received.
Samples were sequenced on two lanes of the HiSeq. Thus, I have 4 sets of reads per sample, i.e. Reads 1 and 2 from Lane 1 and Reads 1 and 2 from Lane 2.
My workflow has been the following:
1. Adapter and quality trimming of all four files per sample
2. Concatenate all four files into a single file
3. Run diamond on this single file
4. Then use Megan6 for further processing
Can anyone please advise if Step 2 is appropriate in my case or is it better to process each lane separately?
It is appropriate to concatenate the respective R1 and R2 files from the two lanes if the same sample was run in both lanes. It would not be appropriate to cat all 4 files together into one, unless diamond (which I am not familiar with) requires it.
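A minimal sketch of the per-read concatenation in Python, in case it helps. The file naming pattern (`<sample>_<lane>_<read>.fastq.gz`) and the function names `concat_lanes` and `merge_sample` are assumptions made up for illustration, not something from the original post. It relies on the fact that FASTQ files, plain or gzip-compressed, can be joined byte for byte, and it uses the same lane order for R1 and R2 so that mates stay in the same position in both merged files.

```python
import shutil
from pathlib import Path


def concat_lanes(lane_files, merged_path):
    """Append the lane files, in the given order, into one merged file.

    Works byte-wise, so it is valid for plain FASTQ and for gzip files
    (concatenated gzip members still form a valid gzip stream).
    """
    with open(merged_path, "wb") as out:
        for lane_file in lane_files:
            with open(lane_file, "rb") as src:
                shutil.copyfileobj(src, out)


def merge_sample(sample, lanes, workdir="."):
    """Merge R1 files across lanes and R2 files across lanes, separately.

    Assumes the hypothetical naming <sample>_<lane>_<read>.fastq.gz.
    Returns a dict mapping "R1"/"R2" to the merged file path.
    """
    workdir = Path(workdir)
    merged = {}
    for read in ("R1", "R2"):
        # Same lane order for both reads keeps the mates in sync.
        inputs = [workdir / f"{sample}_{lane}_{read}.fastq.gz" for lane in lanes]
        merged[read] = workdir / f"{sample}_{read}.fastq.gz"
        concat_lanes(inputs, merged[read])
    return merged
```

For example, `merge_sample("sampleA", ["L001", "L002"])` would write `sampleA_R1.fastq.gz` and `sampleA_R2.fastq.gz` in the current directory, leaving the per-lane files untouched.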
As the alignment is anyway done for each sequence separately, how does it matter if the fastq/fasta files are combined or alternatively broken into smaller pieces?
If you are treating them as single end data then it is fine. Was there a reason to do PE sequencing then?
Even if you used just the R1 data, you are going to get the exact same answer (if you used R1 and R2 reads as separate queries), since the same fragment is sampled by the two reads. (Diamond is a fast "blastx" alternative.)