The data I have is sequencing of fragmented whole genomic DNA from several individuals of S. purpuratus (purple sea urchin), all in FASTQ format. I have an index of the genome to align against and produce .bam output files. I have tried to identify indels, but only got a few SNPs; I am looking for much larger insertions or deletions. I think that TopHat might only be detecting differences between each FASTQ file and the genome index. The differences I want to detect (large insertions and deletions) are between the FASTQ files themselves, i.e. between individuals, not between the FASTQ files and the reference genome. Does anyone know of a good workflow to identify indels by simply aligning multiple FASTQ files?
TopHat is likely not the right aligner here: it was designed for spliced alignment of RNA-seq data, and even for that purpose it is deprecated and should be replaced by, e.g., STAR.
For genomic DNA you should probably use bwa mem.
Yes, as per Wouter, you should re-align your data using, e.g., bwa mem (if reads are >70 bp), and then you could use my answer with your aligned BAMs. TopHat is a splice-aware aligner meant for RNA-seq reads.
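To make the re-alignment step concrete, here is a rough sketch in Python that only builds and prints the commands for each individual; it does not run anything. The file names (Spur_genome.fa, urchin1_R1.fastq, and so on), the sample names, and the thread count are placeholders I made up, and I am assuming samtools is used for sorting and indexing. Note that bwa needs its own index built with bwa index, so an index made for TopHat would not be reusable here.

```python
import shlex


def alignment_commands(reference, samples, threads=4):
    """Build shell command strings for aligning genomic DNA reads with bwa mem.

    `samples` maps a sample name to its FASTQ file(s): one file for
    single-end reads, two for paired-end. Nothing is executed here.
    """
    # bwa needs its own index of the reference FASTA.
    commands = [shlex.join(["bwa", "index", reference])]
    for name, fastqs in samples.items():
        # A read group per individual keeps the samples distinguishable
        # in downstream tools. bwa expects the literal two characters "\t".
        read_group = f"@RG\\tID:{name}\\tSM:{name}"
        bam = f"{name}.sorted.bam"
        align = shlex.join(
            ["bwa", "mem", "-t", str(threads), "-R", read_group, reference, *fastqs]
        )
        # samtools sort reads the alignments from stdin ("-").
        sort = shlex.join(["samtools", "sort", "-o", bam, "-"])
        commands.append(f"{align} | {sort}")
        commands.append(shlex.join(["samtools", "index", bam]))
    return commands


# Hypothetical sample sheet: replace with your own individuals and files.
samples = {
    "urchin1": ["urchin1_R1.fastq", "urchin1_R2.fastq"],
    "urchin2": ["urchin2_R1.fastq", "urchin2_R2.fastq"],
}

for command in alignment_commands("Spur_genome.fa", samples):
    print(command)
```

The printed lines can be pasted into a shell or saved as a script. You end up with one sorted, indexed BAM per individual, all aligned against the same reference, which is what the variant-calling step would take as input.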