Finding Common Reads Across Multiple Fastq Files
12.9 years ago
Abhi

Hi All

We have some metagenome samples (multiple Illumina lanes). What I would like to do is find the percentage of reads that are common among these FASTQ files, allowing up to N mismatches.

I think I could take a subsample of the reads from each FASTQ file/bin and compare them, but I'm wondering whether there is a slicker approach to the comparison.
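Here is a minimal sketch of the subsampling step. It assumes plain, uncompressed, 4-line FASTQ records with no wrapped sequence lines. It uses reservoir sampling (Algorithm R), so each file is read only once and every read has an equal chance of being kept. The function names are hypothetical.

```python
import random
from typing import Iterable, Iterator, List, Tuple, TypeVar

T = TypeVar("T")


def read_fastq(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (name, sequence, quality) from 4-line FASTQ records.

    Assumes sequences are not wrapped over several lines; an incomplete
    trailing record is silently dropped by zip().
    """
    it = iter(lines)
    for header, seq, _plus, qual in zip(it, it, it, it):
        yield header[1:].rstrip(), seq.rstrip(), qual.rstrip()


def reservoir_sample(items: Iterable[T], k: int, seed: int = 0) -> List[T]:
    """Keep a uniform random sample of k items in a single pass (Algorithm R)."""
    rng = random.Random(seed)
    sample: List[T] = []
    for i, item in enumerate(items):
        if i < k:
            sample.append(item)
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                sample[j] = item
    return sample
```

For example, `reservoir_sample(read_fastq(open("lane1.fastq")), 10000)` would keep 10,000 reads from a hypothetical file `lane1.fastq`. A fixed seed makes the subsample reproducible across runs.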

Thanks! -Abhi

Do you also allow for differences in quality scores?

@Manu: I haven't thought about that yet. I was wondering whether we can compare the reads at the base level; allowing 2-4 mismatches between reads should cover differences in quality scores.
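Comparing reads at the base level while ignoring quality scores amounts to a Hamming-distance test. The sketch below is a brute-force version. It assumes equal-length reads and substitutions only (no indels), and an N counts as a mismatch like any other base. Because it compares every read against every other read, it is only practical on subsamples. The function names are hypothetical.

```python
from typing import Sequence


def within_mismatches(a: str, b: str, max_mm: int) -> bool:
    """True if equal-length reads differ at no more than max_mm positions.

    Reads of different length are treated as non-matching (indels are not modelled).
    """
    if len(a) != len(b):
        return False
    mm = 0
    for x, y in zip(a, b):
        if x != y:
            mm += 1
            if mm > max_mm:  # stop early once the limit is exceeded
                return False
    return True


def percent_shared(reads_a: Sequence[str], reads_b: Sequence[str], max_mm: int) -> float:
    """Percentage of reads in reads_a that match at least one read in reads_b.

    Brute force: O(len(reads_a) * len(reads_b)) comparisons.
    """
    if not reads_a:
        return 0.0
    hits = sum(
        1 for ra in reads_a if any(within_mismatches(ra, rb, max_mm) for rb in reads_b)
    )
    return 100.0 * hits / len(reads_a)
```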

12.9 years ago

I'd start by looking at the tools contained in vmatch. There are probably many ways to approach this problem, but it seems sensible to index the FASTQ files before doing the comparisons.
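To illustrate why indexing helps, here is a simple pigeonhole index. This is not what vmatch does internally. The idea: if two equal-length reads differ in at most k positions, then splitting both into k+1 non-overlapping pieces guarantees that at least one piece matches exactly. With those pieces indexed, only candidate reads that share a piece need the full mismatch check. This is a sketch that assumes substitutions only, and all names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Key = Tuple[int, int, str]  # (read length, piece index, piece sequence)


def segments(read: str, k: int) -> List[Tuple[int, str]]:
    """Split a read into k+1 non-overlapping pieces at fixed positions."""
    n = k + 1
    size = len(read) // n
    bounds = [i * size for i in range(n)] + [len(read)]
    return [(i, read[bounds[i]:bounds[i + 1]]) for i in range(n)]


def build_index(reads: Sequence[str], k: int) -> Dict[Key, List[int]]:
    """Map every piece of every read to the ids of reads containing it."""
    index: Dict[Key, List[int]] = defaultdict(list)
    for rid, read in enumerate(reads):
        for i, piece in segments(read, k):
            index[(len(read), i, piece)].append(rid)
    return index


def hamming_leq(a: str, b: str, k: int) -> bool:
    """Equal length and at most k substitutions."""
    return len(a) == len(b) and sum(x != y for x, y in zip(a, b)) <= k


def has_match(query: str, reads: Sequence[str], index: Dict[Key, List[int]], k: int) -> bool:
    """True if some indexed read is within k mismatches of query."""
    checked = set()
    for i, piece in segments(query, k):
        for rid in index.get((len(query), i, piece), ()):
            if rid not in checked:  # verify each candidate only once
                checked.add(rid)
                if hamming_leq(query, reads[rid], k):
                    return True
    return False
```

Keying the index by read length means reads of different lengths are never compared. That matches the substitution-only assumption above.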

Neat software, I did not know about it before.

12.9 years ago

cd-hit would do what you want. By the way, I just learned that you can use a FASTQ file directly as input. You could also take a look at uclust (part of usearch).
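cd-hit uses greedy incremental clustering. Sequences are sorted by decreasing length, and the first one becomes a cluster representative. Each later sequence joins the first representative it matches at or above the identity threshold, or otherwise starts a new cluster.

The toy sketch below mimics that scheme only to show how shared clusters translate into a percentage of common reads. It is not a substitute for the tool: it uses ungapped identity on equal-length reads instead of cd-hit's alignment and word filtering. Reads are given as (sample label, sequence) pairs, which is a hypothetical input format, and the function names are hypothetical too.

```python
from typing import List, Tuple

Read = Tuple[str, str]  # (sample label, sequence)


def identity(a: str, b: str) -> float:
    """Fraction of identical positions; ungapped, equal lengths only (a simplification)."""
    if len(a) != len(b) or not a:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_cluster(reads: List[Read], min_identity: float) -> List[List[Read]]:
    """Longest reads first; each read joins the first representative it matches."""
    clusters: List[List[Read]] = []  # cluster[0] is the representative
    for sample, seq in sorted(reads, key=lambda r: len(r[1]), reverse=True):
        for cluster in clusters:
            if identity(seq, cluster[0][1]) >= min_identity:
                cluster.append((sample, seq))
                break
        else:  # no representative matched: start a new cluster
            clusters.append([(sample, seq)])
    return clusters


def percent_reads_in_shared_clusters(clusters: List[List[Read]]) -> float:
    """Percentage of all reads that sit in clusters containing more than one sample."""
    total = sum(len(c) for c in clusters)
    if total == 0:
        return 0.0
    shared = sum(len(c) for c in clusters if len({s for s, _ in c}) > 1)
    return 100.0 * shared / total
```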
