Question

ABySS syntax for reads split across two lanes

9.4 years ago

kcamnairb ▴ 40

Hi, I have a paired-end library that was split across two sequencing lanes. The left and right reads are in separate files, so I have a total of 4 files for each library. How would I specify this library in ABySS, or do I need to concatenate the reads from the two lanes? For example, would this be correct:

abyss-pe k=64 name=ecoli lib='pe200 pe500' \
    pe200='pe200_lane1_1.fa pe200_lane1_2.fa pe200_lane2_1.fa pe200_lane2_2.fa' \
    pe500='pe500_lane1_1.fa pe500_lane1_2.fa pe500_lane2_1.fa pe500_lane2_2.fa'

abyss assembly • 2.0k views

ADD COMMENT • link updated 9.4 years ago by benv ▴ 730 • written 9.4 years ago by kcamnairb ▴ 40

I definitely think the easiest thing to do is to concatenate your reads: put all your left reads in one file and all your right reads in another, for each library that you have.
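As a rough sketch of that concatenation, the helper below joins per-lane files into one file per read direction. The function name `merge_lanes` is made up for illustration, and the file names in the usage comment are the ones from the question. Listing the lanes in the same order for both read files keeps mates at matching positions, which is a safe habit for tools that pair reads by position.

```shell
# merge_lanes OUT IN...
# Hypothetical helper: concatenate the per-lane files IN... into OUT.
# Refuses to write anything if one of the inputs is missing or unreadable.
merge_lanes() {
    out=$1
    shift
    for f in "$@"; do
        [ -r "$f" ] || { echo "missing input: $f" >&2; return 1; }
    done
    cat "$@" > "$out"
}

# Usage (file names from the question; lane1 before lane2 in BOTH calls):
#   merge_lanes pe200_1.fa pe200_lane1_1.fa pe200_lane2_1.fa
#   merge_lanes pe200_2.fa pe200_lane1_2.fa pe200_lane2_2.fa
```

The same `cat` approach should also work on gzipped inputs, since concatenated gzip files form a valid gzip stream.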

However, I have to say, I don't think I've ever heard of one library being split and sequenced across two lanes. I'd be careful with that data. If there are any disparities between the performance of the two lanes, your data might be a bit wacky. On the other hand, maybe it's not such a big deal. Just curious, is there any particular reason you sequenced the library in this fashion?

ADD REPLY • link 9.4 years ago by dbrowne.up ▴ 80

Thanks, I'll try concatenating the reads. I actually have two libraries, so I think pooling libraries across multiple lanes is supposed to reduce any lane bias.

ADD REPLY • link 9.4 years ago by kcamnairb ▴ 40

When we have N samples to run across M lanes, it's normal to pool them across all lanes to even out any lane performance disparities across the samples. I haven't seen a lane completely fail, but the sequencing people claim it used to happen, and this strategy makes sure no sample is completely lost to a failed lane.

ADD REPLY • link updated 5.4 years ago by Ram 45k • written 9.4 years ago by karl.stamm 4.1k

score 1 · Accepted Answer · 2015-12-10

Hi @kcamnairb,

The command line you have specified is correct.

If multiple files are specified, ABySS will assume that the files are ordered in the following way: dataset1_read1.fq, dataset1_read2.fq, dataset2_read1.fq, dataset2_read2.fq, ...
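To make that ordering concrete for the two-lane case, here is a small sketch that builds each library's file list lane by lane (read 1, then read 2) and prints the resulting `abyss-pe` command rather than running it. The helper `build_lib` and the `PREFIX_LANE_1.fa` / `PREFIX_LANE_2.fa` naming scheme are assumptions for illustration, not part of ABySS; the `k`, `name`, and `lib` values are the ones from the question.

```shell
# build_lib PREFIX LANE...
# Hypothetical helper: print the files of one library in the order
# lane1_read1 lane1_read2 lane2_read1 lane2_read2 ...
# Assumes files are named PREFIX_LANE_1.fa and PREFIX_LANE_2.fa.
build_lib() {
    prefix=$1
    shift
    files=
    for lane in "$@"; do
        files="$files ${prefix}_${lane}_1.fa ${prefix}_${lane}_2.fa"
    done
    # Strip the leading space left by the first loop iteration
    echo "${files# }"
}

pe200=$(build_lib pe200 lane1 lane2)
pe500=$(build_lib pe500 lane1 lane2)

# Print the command for inspection; copy the printed line into a shell to run it
echo abyss-pe k=64 name=ecoli "lib='pe200 pe500'" "pe200='$pe200'" "pe500='$pe500'"
```

Printing first makes it easy to check that every read 1 file is immediately followed by its read 2 partner before starting a long assembly.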

If you only specify a single file, ABySS will assume it contains both reads 1 and 2 (interleaved).

Btw, ABySS understands gzipped files and in addition to FASTQ you can also use FASTA, SAM, and BAM formats.