7.3 years ago
me.sr1510
Can we align two FASTQ files against each other without using a reference? What options are there apart from comparing the two VCF files generated with samtools? These are the results of Illumina whole-genome sequencing.
Kindly advise.
What is your end goal?
To align the FASTQ files and find the genes that differ between them.
The goal is fine, but the way you want to go about it seems very odd. You could remove the reads that occur in both files and then look at what is left, though that approach is not without its own problems.
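A minimal sketch of the "remove shared reads and look at what is left" idea, assuming plain, uncompressed 4-line FASTQ records and exact sequence matching. Each read is reduced to the smaller of itself and its reverse complement so that a fragment sequenced from either strand compares equal. The function names are hypothetical. It also illustrates the problems mentioned: even two runs of the same genome rarely produce identical reads, because fragments start at random positions and carry sequencing errors, so most reads will end up "unique", and every sequence is held in memory.

```python
from typing import Iterable, Iterator, Set, Tuple

# Map each base to its complement; N stays N.
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def canonical(seq: str) -> str:
    """Return the lexicographically smaller of a read and its reverse complement."""
    rev_comp = seq.translate(_COMPLEMENT)[::-1]
    return min(seq, rev_comp)


def fastq_sequences(lines: Iterable[str]) -> Iterator[str]:
    """Yield the canonical sequence of each record (assumes 4-line FASTQ records)."""
    for i, line in enumerate(lines):
        if i % 4 == 1:  # the 2nd line of every 4-line record is the sequence
            yield canonical(line.strip().upper())


def reads_unique_to_each(
    lines_a: Iterable[str], lines_b: Iterable[str]
) -> Tuple[Set[str], Set[str]]:
    """Drop sequences present in both inputs (exact match) and return what is left in each."""
    seqs_a = set(fastq_sequences(lines_a))
    seqs_b = set(fastq_sequences(lines_b))
    return seqs_a - seqs_b, seqs_b - seqs_a
```

Because the functions take any iterable of lines, an open file handle works directly, e.g. `with open("sample_a.fastq") as fa, open("sample_b.fastq") as fb: only_a, only_b = reads_unique_to_each(fa, fb)` (file names hypothetical).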
I would like to know whether there is any way to compare the two files with each other.
Comparing raw data is not going to be of much help. Your best bet is to follow @Brian's suggestion below.
Define "uncommon genes". Are these genes that have differences in coverage, difference in sequence, exist in one but not the other according to de novo assembly, something else completely...?
Difference in sequence
Not to mention that the average FASTQ file contains thousands, if not millions, of separate reads, so you would have an impossibly large dataset to analyse even if you could align them all against one another.
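To put a rough number on "impossibly large": aligning every read in one file against every read in the other needs one alignment per read pair, i.e. the product of the two read counts. The figures below (10 million reads per file, one million pairwise alignments per second) are hypothetical round numbers chosen only for illustration.

```python
def all_vs_all_comparisons(n_reads_a: int, n_reads_b: int) -> int:
    """Number of read-pair alignments needed to compare every read in A with every read in B."""
    return n_reads_a * n_reads_b


def runtime_days(n_comparisons: int, alignments_per_second: float) -> float:
    """Wall-clock days at a fixed (assumed) alignment throughput."""
    return n_comparisons / alignments_per_second / 86_400


# Hypothetical figures: 10 million reads per file, 1 million alignments per second.
pairs = all_vs_all_comparisons(10_000_000, 10_000_000)
print(f"{pairs:.1e} pairwise alignments")  # 1.0e+14 pairwise alignments
print(f"~{runtime_days(pairs, 1e6):,.0f} days")  # ~1,157 days
```

Even before looking at the results, that is roughly three years of compute under these assumptions.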
It seems like it makes more sense for your goal to assemble both datasets, call and annotate the genes, and then compare the annotated genes.
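Once both samples are assembled and annotated, the final comparison step could look roughly like this. It is a sketch under a strong assumption: each annotation has been exported as a FASTA file of gene sequences whose record IDs are gene names shared between the two samples. In practice, genes from two independent de novo annotations usually need to be matched by sequence similarity rather than by name, so treat the name-based matching and the function names as hypothetical.

```python
from typing import Dict, Iterable, List, Optional


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA lines into {record_id: sequence}; the ID is the first word after '>'."""
    records: Dict[str, str] = {}
    current_id: Optional[str] = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if current_id is not None:
                records[current_id] = "".join(chunks)
            current_id = line[1:].split()[0]
            chunks = []
        else:
            chunks.append(line.upper())  # multi-line sequences are concatenated
    if current_id is not None:
        records[current_id] = "".join(chunks)
    return records


def compare_gene_sets(genes_a: Dict[str, str], genes_b: Dict[str, str]) -> Dict[str, List[str]]:
    """Report genes found in only one sample and shared genes whose sequences differ."""
    shared = genes_a.keys() & genes_b.keys()
    return {
        "only_in_a": sorted(genes_a.keys() - genes_b.keys()),
        "only_in_b": sorted(genes_b.keys() - genes_a.keys()),
        "sequence_differs": sorted(g for g in shared if genes_a[g] != genes_b[g]),
    }
```

The "sequence_differs" list corresponds to the "difference in sequence" goal stated above; a per-gene alignment of those entries would then show where the differences are.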
Please update your question with more specifics about what kind of organisms and data you have, and your specific goal.
Since you do not have a reference (or do not want to use one), this would be de novo assembly. You would not get genes directly, as there is no reference; instead you would get scaffolds and/or contigs, and you could then look for the scaffolds and/or contigs that differ between the two samples. Aligning one raw file (especially reads) against another does not make sense to me (at least to my limited knowledge).