Group reads by sequence length

6.3 years ago

genya35 ▴ 50

Hello,

I have two Illumina FASTQ files (read 1 and read 2) that contain reads from three different frameworks. Each framework has a sequence length distribution that differs from the other two. Are there any existing tools that can split the reads from each framework into a separate file based on the sequence length distribution? Do you have any other suggestions for how to separate reads from the three frameworks?

Thanks

next-gen

What exactly do you mean by frameworks? Different sequencing platforms, or the same platform but different sequencers/run types (e.g. Illumina 2 x 100 bp, 2 x 50 bp, etc.)?

You could look at reformat.sh from the BBMap suite with its minlength= and maxlength= options, if the reads from each of the three platforms all have the same length (a distinct value for each platform).
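Before choosing minlength=/maxlength= values, it helps to check whether each platform really does produce a single distinct read length. The following is a minimal Python sketch, assuming an uncompressed FASTQ with exactly four lines per record; the function names are hypothetical and not part of BBMap. It simply counts how many reads have each length.

```python
from collections import Counter
from typing import Iterator, TextIO


def read_lengths(handle: TextIO) -> Iterator[int]:
    """Yield the length of every sequence in an uncompressed 4-line-per-record FASTQ."""
    for line_number, line in enumerate(handle):
        # The sequence is the second line of each 4-line record (lines 1, 5, 9, ...).
        if line_number % 4 == 1:
            yield len(line.rstrip("\r\n"))


def length_histogram(handle: TextIO) -> Counter:
    """Count how many reads have each length."""
    return Counter(read_lengths(handle))
```

For example, `sorted(length_histogram(open("reads_R1.fastq")).items())` (hypothetical file name) lists each observed length with its read count; a few isolated peaks suggest length windows for minlength=/maxlength=. For a gzipped file, `gzip.open(path, "rt")` gives a text handle that works the same way.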

6.3 years ago by GenoMax 151k

The same platform, but a different primer set for each framework (Illumina MiSeq; the average read length is 168 bp after merging read 1 and read 2). I won't know the exact size range for each framework, but the sequence lengths from the three frameworks will not overlap. I was hoping to find a tool that groups reads by sequence length using statistical methods. Thanks
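Because the three length ranges are unknown but do not overlap, one simple statistical way to find the boundaries is to sort the distinct observed lengths and cut at the two largest gaps between neighbours. In one dimension this is exactly single-linkage clustering into three groups. The sketch below is an illustration under that non-overlap assumption; the function name is hypothetical. Each cut point is placed at the midpoint of its gap.

```python
from typing import List, Sequence


def split_points_by_largest_gaps(lengths: Sequence[int], n_groups: int = 3) -> List[float]:
    """Return n_groups - 1 ascending cut points separating non-overlapping length groups.

    Cutting the sorted distinct lengths at the (n_groups - 1) largest gaps is
    equivalent to single-linkage clustering into n_groups clusters in 1-D.
    """
    values = sorted(set(lengths))
    if len(values) < n_groups:
        raise ValueError("fewer distinct lengths than requested groups")
    # Gap between each pair of consecutive distinct lengths, with its position.
    gaps = [(values[i + 1] - values[i], i) for i in range(len(values) - 1)]
    # Keep the largest gaps, then order the resulting cut points left to right.
    largest = sorted(gaps, reverse=True)[: n_groups - 1]
    return sorted((values[i] + values[i + 1]) / 2 for _, i in largest)
```

Since it only looks at distinct lengths, a handful of stray reads (sequencing errors, primer dimers) landing inside a gap can shift a cut. If the distribution is noisy, a method that uses the read counts, such as fitting a three-component Gaussian mixture (for example scikit-learn's `GaussianMixture`), may be more robust.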

6.3 years ago by genya35 ▴ 50

It's still not clear what you mean by framework. If the sizes are distinct, you may be able to use the program above, or you may need to roll your own solution that simply looks at the length of each read and bins the reads accordingly.
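A roll-your-own binner can be quite short. This is a minimal Python sketch, assuming the pairs have already been merged into a single uncompressed FASTQ (for example with bbmerge.sh from the same BBMap suite) and that you already have ascending cut points between the length groups, chosen from the length distribution or by any grouping method. The function names are hypothetical. Each read goes to the bin given by how many cut points lie at or below its length.

```python
import bisect
from typing import Iterator, List, Sequence, TextIO, Tuple


def fastq_records(handle: TextIO) -> Iterator[Tuple[str, str, str, str]]:
    """Yield (header, sequence, separator, quality) lines, newlines kept, from a 4-line FASTQ."""
    while True:
        header = handle.readline()
        if not header.strip():
            return  # end of file (or a trailing blank line)
        seq = handle.readline()
        sep = handle.readline()
        qual = handle.readline()
        yield header, seq, sep, qual


def bin_reads(handle: TextIO, cuts: Sequence[float], outputs: Sequence[TextIO]) -> List[int]:
    """Write each read to outputs[k], where k = number of cut points <= its length.

    Expects cuts sorted ascending and len(outputs) == len(cuts) + 1.
    Returns the number of reads written to each bin.
    """
    if len(outputs) != len(cuts) + 1:
        raise ValueError("need exactly one output per bin")
    counts = [0] * len(outputs)
    for header, seq, sep, qual in fastq_records(handle):
        k = bisect.bisect_right(cuts, len(seq.rstrip("\r\n")))
        outputs[k].write(header + seq + sep + qual)
        counts[k] += 1
    return counts
```

In practice you would pass the merged FASTQ opened with `open(...)` and three output files opened for writing (names of your choosing). The returned counts give a quick check that each bin received a plausible share of the reads.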

6.3 years ago by GenoMax 151k
