Processing Metagenomics Reads from Multiple Lanes

8.5 years ago

adityabandla ▴ 30

Hi,

I have been using a generic pipeline for processing a recent metagenomics dataset that I received.

Samples were sequenced on two lanes of the HiSeq, so I have four sets of reads per sample: Reads 1 and 2 from Lane 1, and Reads 1 and 2 from Lane 2.

My workflow has been the following:

1. Adapter and quality trimming of all four files per sample
2. Concatenate all four files into a single file
3. Run DIAMOND on this single file
4. Use MEGAN6 for further processing

Can anyone please advise if Step 2 is appropriate in my case or is it better to process each lane separately?

Metagenomics HiSeq DIAMOND • 2.2k views


It is appropriate to concatenate the respective R1 and R2 files from the two lanes if the same sample ran in both lanes. It would not be appropriate to cat all 4 files together into one, unless diamond requires it (I am not familiar with it).
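As a rough sketch of the per-read merge described above, the following Python appends the lane files for R1 into one file and the lane files for R2 into another, using the same lane order for both so that the two merged files stay in step. The file naming pattern `{sample}_{lane}_{read}.fastq` and the function names `merge_lanes` and `merge_sample` are assumptions made for illustration, not anything a sequencing provider or DIAMOND prescribes.

```python
import shutil
from pathlib import Path


def merge_lanes(lane_files, merged_path):
    """Append the per-lane FASTQ files, in the given order, into one file."""
    merged_path = Path(merged_path)
    with merged_path.open("wb") as out:
        for lane_file in lane_files:
            with open(lane_file, "rb") as src:
                # Byte-wise copy; FASTQ records are not parsed or altered.
                shutil.copyfileobj(src, out)
    return merged_path


def merge_sample(sample, lanes, workdir="."):
    """Merge R1 and R2 separately, using the same lane order for both.

    Assumes hypothetical input names of the form <sample>_<lane>_<read>.fastq.
    """
    workdir = Path(workdir)
    merged = {}
    for read in ("R1", "R2"):
        inputs = [workdir / f"{sample}_{lane}_{read}.fastq" for lane in lanes]
        merged[read] = merge_lanes(inputs, workdir / f"{sample}_{read}.fastq")
    return merged
```

Calling `merge_sample("sampleA", ["L001", "L002"])` would then write `sampleA_R1.fastq` and `sampleA_R2.fastq` in the working directory. The same byte-wise approach should also work on gzip-compressed files, since concatenated gzip members form a valid gzip stream; the file extensions in the sketch would need adjusting in that case.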

ADD REPLY • link 8.5 years ago by GenoMax 147k

As the alignment is done for each sequence separately anyway, how does it matter whether the fastq/fasta files are combined or, alternatively, broken into smaller pieces?

ADD REPLY • link 8.5 years ago by adityabandla ▴ 30

If you are treating them as single-end data then it is fine. Was there a reason to do PE sequencing, then?

Even if you used just the R1 data, you are going to get the exact same answer as you would with R1 and R2 reads as separate queries, since the same fragment is sampled by the two reads. (Diamond is a fast "blastx" alternative.)

ADD REPLY • link 8.5 years ago by GenoMax 147k
