group reads by sequence length
5.7 years ago
genya35 ▴ 50

Hello,

I have two Illumina FASTQ files (read 1 and read 2) that contain reads from three different frameworks. Each framework has a sequence length distribution distinct from the other two. Are there any existing tools that can separate each framework into its own file based on the sequence length distribution? Do you have any other suggestions for how to separate reads from the three frameworks?

Thanks

next-gen • 940 views

What exactly does "frameworks" mean? Different sequencing platforms, or the same platform but different sequencers/run types (e.g. Illumina 2 x 100 bp, 2 x 50 bp, etc.)?

You could try reformat.sh from the BBMap suite with the minlength= and maxlength= options, provided all reads from a given platform have the same length and each of the three platforms has a distinct value.


The same platform, but different primers for each framework (Illumina MiSeq; the average read length is 168 after joining the two reads). I will not know the exact size range for each framework, but the sequence sizes from the three frameworks will not overlap. I was hoping to find a tool that would group reads by sequence length using statistical methods. Thanks


It is still not clear what you mean by framework. If the sizes are distinct, you may be able to use the program above, or you may need to roll your own custom solution that just looks at the length of each read and bins it.
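
A custom solution along those lines could look roughly like the Python sketch below. It is only an illustration, not an existing tool, and it rests on assumptions not confirmed in this thread: the read pairs have already been joined into a single uncompressed FASTQ with four lines per record, and the three length ranges really do not overlap. Under those assumptions no statistical model is needed: the two widest gaps between observed read lengths are taken as the boundaries. The function names and output file naming are made up for this example.

```python
import bisect
from itertools import islice


def read_fastq(handle):
    """Yield (header, sequence, plus, quality) from a 4-line-per-record FASTQ."""
    while True:
        record = [line.rstrip("\n") for line in islice(handle, 4)]
        if len(record) < 4:
            return
        yield tuple(record)


def find_length_cutoffs(lengths, n_groups=3):
    """Return n_groups - 1 cutoffs placed in the widest gaps between lengths.

    Assumption: the groups do not overlap in length, so the widest gaps
    between neighbouring observed lengths fall between the groups.
    """
    distinct = sorted(set(lengths))
    if len(distinct) < n_groups:
        raise ValueError("fewer distinct lengths than requested groups")
    # Gap between each pair of neighbouring distinct lengths.
    gaps = [(b - a, a, b) for a, b in zip(distinct, distinct[1:])]
    # Keep the n_groups - 1 widest gaps and cut in the middle of each.
    widest = sorted(gaps, reverse=True)[: n_groups - 1]
    return sorted((a + b) / 2 for _, a, b in widest)


def assign_group(length, cutoffs):
    """Return the 0-based group index for a read of the given length."""
    return bisect.bisect_left(cutoffs, length)


def bin_fastq_by_length(in_path, out_prefix, n_groups=3):
    """Write each read to <out_prefix>_group<N>.fastq according to its length."""
    # First pass: collect read lengths to locate the gaps.
    with open(in_path) as handle:
        lengths = [len(seq) for _, seq, _, _ in read_fastq(handle)]
    cutoffs = find_length_cutoffs(lengths, n_groups)

    # Second pass: route every record to the file for its group.
    outputs = [
        open(f"{out_prefix}_group{i + 1}.fastq", "w") for i in range(n_groups)
    ]
    try:
        with open(in_path) as handle:
            for record in read_fastq(handle):
                group = assign_group(len(record[1]), cutoffs)
                outputs[group].write("\n".join(record) + "\n")
    finally:
        for out in outputs:
            out.close()
    return cutoffs
```

For example, `bin_fastq_by_length("joined.fastq", "framework")` (hypothetical file names) would write `framework_group1.fastq` through `framework_group3.fastq`, shortest reads first, and return the two cutoffs it chose. It is worth checking the returned cutoffs against a length histogram, since a handful of truncated or chimeric reads could create a wide gap in the wrong place.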
