I have two sets of paired-end sequencing files:
set1_r1.fastq and set1_r2.fastq for set 1
set2_r1.fastq and set2_r2.fastq for set 2.
The files in set 2 are considerably larger than the files in set 1.
I want to align the reads in set 1 to set 2, or at least determine how many reads in set 1 align to set 2.
How can I best do this?
I understand that bowtie aligns reads to an indexed reference genome, but here I want to align paired-end fastq files to another set of paired-end fastq files. Would it be possible to align both sets individually to the same reference genome and then somehow compare the output sam/bam files?
Is there any tool that can help me accomplish this?
You could try treating the set2 sequences as a "genome" and building an index from them to align set1 against.
To do this, you would convert set2 from fastq to fasta and then build an index from the resulting fasta file.
Once you have that index, you would align set1 to the set2 "genome".
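The steps above could be sketched roughly as follows. This is a minimal illustration, not the answerer's exact commands: the function names `fastq_to_fasta` and `count_aligned_reads` are hypothetical, and the commands in the comments assume Bowtie 2 (`bowtie2-build`, `bowtie2`), which the answer does not name. The `/1` and `/2` suffixes are an assumption to keep mate names unique once both files become reference sequences.

```python
def fastq_to_fasta(fastq_lines, suffix=""):
    """Yield FASTA lines from FASTQ lines (assumes strict 4-line records)."""
    it = (line.rstrip("\r\n") for line in fastq_lines)
    for header, seq, _plus, _qual in zip(it, it, it, it):
        if not header.startswith("@"):
            raise ValueError(f"unexpected FASTQ header: {header!r}")
        # Keep only the read name (text before the first whitespace).
        name = header[1:].split()[0]
        yield f">{name}{suffix}"
        yield seq


def count_aligned_reads(sam_lines):
    """Return (aligned, total) primary records from SAM text lines."""
    aligned = total = 0
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and SAM header lines
        flag = int(line.split("\t")[1])
        if flag & 0x100 or flag & 0x800:
            continue  # skip secondary and supplementary alignments
        total += 1
        if not flag & 0x4:  # 0x4 set means the read is unmapped
            aligned += 1
    return aligned, total


# Assumed workflow (Bowtie 2 is an assumption, file names follow the question):
#   1. write fastq_to_fasta(open("set2_r1.fastq"), "/1") to set2_r1.fasta
#      and fastq_to_fasta(open("set2_r2.fastq"), "/2") to set2_r2.fasta
#   2. bowtie2-build set2_r1.fasta,set2_r2.fasta set2_index
#   3. bowtie2 -x set2_index -U set1_r1.fastq,set1_r2.fastq -S set1_vs_set2.sam
#   4. count_aligned_reads(open("set1_vs_set2.sam"))
```

Step 3 treats the set1 mates as unpaired reads, which is one option here because each "reference" sequence is only a single read long; whether that suits your data is a judgment call.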
One thing to keep in mind is that when going from fastq to fasta you lose the quality data, so you might want to filter out low-quality reads before converting. Otherwise you will be aligning to low-confidence reads, which could yield questionable results.
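One simple way to do such filtering is to drop reads whose mean base quality falls below a threshold. The sketch below assumes Phred+33 encoded qualities and a cutoff of 20; both are assumptions to adjust for your data, and the names `mean_phred` and `filter_fastq` are hypothetical. Dedicated trimming tools do this more thoroughly.

```python
def mean_phred(quality_string, offset=33):
    """Mean Phred score of a FASTQ quality string (Phred+33 assumed)."""
    if not quality_string:
        return 0.0
    return sum(ord(c) - offset for c in quality_string) / len(quality_string)


def filter_fastq(fastq_lines, min_mean_quality=20.0):
    """Yield the lines of FASTQ records whose mean quality passes the cutoff."""
    it = (line.rstrip("\r\n") for line in fastq_lines)
    # Assumes strict 4-line FASTQ records: header, sequence, '+', quality.
    for header, seq, plus, qual in zip(it, it, it, it):
        if mean_phred(qual) >= min_mean_quality:
            yield from (header, seq, plus, qual)
```

Filtering each mate file on its own can leave the two files out of sync. That does not matter for set2 if its reads only serve as reference sequences, but it would matter for any file you later align as properly paired reads.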
Why do you want to align one set of fastq files with another set of fastq files?
I don't know why you'd want to do that, but if the goal is to find out what the two sets share and what they don't, I'd suggest mapping both sets individually to the same reference genome and comparing the bam coverages afterwards.
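As a rough illustration of comparing coverage, the sketch below assumes each bam has been turned into per-position depth text with `samtools depth` (three tab-separated columns: reference name, 1-based position, depth). The function names and the `min_depth` parameter are hypothetical, and holding every position in a set is only practical for small references.

```python
def covered_positions(depth_lines, min_depth=1):
    """Return the set of (reference, position) pairs with depth >= min_depth."""
    covered = set()
    for line in depth_lines:
        line = line.strip()
        if not line:
            continue
        chrom, pos, depth = line.split("\t")[:3]
        if int(depth) >= min_depth:
            covered.add((chrom, int(pos)))
    return covered


def compare_coverage(depth_lines_1, depth_lines_2, min_depth=1):
    """Count positions covered by both sets, or by only one of them."""
    a = covered_positions(depth_lines_1, min_depth)
    b = covered_positions(depth_lines_2, min_depth)
    return {
        "shared": len(a & b),
        "only_set1": len(a - b),
        "only_set2": len(b - a),
    }
```

For example, `compare_coverage(open("set1.depth"), open("set2.depth"))` would summarise how much of the reference both sets cover, where the two file names are placeholders for your own `samtools depth` output.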