Hello all!
I am trying to resequence a wild peanut species genome and align it to cultivated peanut.
These are the steps I followed:
- Generate a BWA index for the reference genome: "bwa index -a bwtsw tifrunnerA.fa"
- Generate a FASTA file index: "samtools faidx tifrunnerA.fa"
- Map the paired-end reads: "bwa mem tifrunnerA.fa correntinaR1.fq correntinaR2.fq > correntina_BWA.sam"
- Convert sam to bam: "samtools view -S -b correntina_BWA.sam > correntina_BWA.bam"
- Sort: "samtools sort correntina_BWA.bam -o correntina_sorted.bam"
- Index: "samtools index correntina_sorted.bam"
The resulting correntina_sorted.bam file was 35,057,080 KB and the correntina_sorted.bam.bai file was 3,249 KB.
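A quick sanity check that the BAM has an up-to-date index sitting next to it might look like this. It is only a sketch: "check_bam_index" is a made-up helper, and it assumes the default name that "samtools index" writes, i.e. the BAM file name plus ".bai".

```shell
# Hypothetical helper: succeed only if <file>.bam has a current <file>.bam.bai.
check_bam_index() {
  local bam=$1
  local bai="${bam}.bai"   # default output name of `samtools index`
  if [ ! -f "$bam" ]; then
    echo "missing BAM: $bam" >&2
    return 1
  fi
  if [ ! -f "$bai" ]; then
    echo "missing index: $bai" >&2
    return 1
  fi
  # An index older than its BAM was probably built before the BAM was rewritten.
  if [ "$bam" -nt "$bai" ]; then
    echo "index is older than BAM: $bai" >&2
    return 1
  fi
  echo "ok: $bai"
}
```

Usage would be "check_bam_index correntina_sorted.bam"; it prints "ok: correntina_sorted.bam.bai" when the index is present and not older than the BAM.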
The issue is that when I try to load the .bam file into IGV to view the alignment, IGV ignores the .bai file in the same folder as the .bam file and tries to create a .fai file. Why is it trying to create a .fai file? The IGV error reads: "Could not create index file: Z:\Chandler Levinson\2018 Experiments\Correntina sequence\correntina_sorted.bam.fai." I have not been able to find a thread that addresses this issue.
Thank you to anyone who tries to help me. I look forward to your response!
The FASTA file must be indexed with:
samtools faidx ref.fa
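For context, a .fai is an index of a FASTA file, not of a BAM. Each line has five tab-separated columns: sequence name, sequence length, byte offset of the first base, bases per line, and bytes per line. The awk sketch below only illustrates those columns; it assumes Unix line endings and a fixed line width within each sequence, does no validation, and "fai_sketch" is a made-up name. Use "samtools faidx" for real work.

```shell
# Illustration only: print .fai-style lines for a FASTA file.
fai_sketch() {
  awk '
    /^>/ {
      # Emit the previous record before starting a new one.
      if (name != "") printf "%s\t%d\t%d\t%d\t%d\n", name, len, off, lb, lw
      name = substr($1, 2)       # name = first word after ">"
      len = 0; lb = 0; lw = 0
      pos += length($0) + 1      # skip the header line and its newline
      off = pos                  # byte offset of the first base
      next
    }
    {
      if (lb == 0) { lb = length($0); lw = lb + 1 }   # bases / bytes per line
      len += length($0)
      pos += length($0) + 1
    }
    END { if (name != "") printf "%s\t%d\t%d\t%d\t%d\n", name, len, off, lb, lw }
  ' "$1"
}
```

For a toy file containing ">seqA desc", "ACGT", "AC", ">seqB", "GGGG", this prints "seqA 6 11 4 5" and "seqB 4 25 4 5" (tab-separated), which is what "samtools faidx" should write for the same input.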
Thanks for responding! I did that to prep the reference genome, but do I also need to do that for the BAM alignment instead of using "samtools index"?
Have you generated a custom genome with the reference file and its index?
Yes! The custom genome is correntina_sorted.bam. The reference file is tifrunnerA.fa. When I indexed the reference I got .fa.amb, .fa.ann, .fa.bwt, .fa.fai, .fa.pac, and .fa.sa files.
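To confirm that all of those reference files exist and are not empty, something like the following could be used. It is a sketch: "check_ref_index" is a hypothetical helper, and the suffixes are simply the ones listed above (five from "bwa index", plus .fai from "samtools faidx").

```shell
# Hypothetical helper: report reference index files that are missing or 0 bytes.
check_ref_index() {
  local ref=$1 missing=0 ext
  for ext in amb ann bwt pac sa fai; do
    # -s is true only for files that exist and have a size greater than zero
    if [ ! -s "${ref}.${ext}" ]; then
      echo "missing or empty: ${ref}.${ext}" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Running "check_ref_index tifrunnerA.fa" returns success only when every file is present and non-empty, so a 0-byte .fai would be flagged.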
You can combine steps 3, 4 and 5 and avoid intermediate files using:
TIL... I always thought you needed the SAM-to-BAM conversion ("samtools view -Sb"), but this looks like it works.
It does work with recent samtools versions, which pick the output format from the file extension given to the -o parameter, IIRC.
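One way to pipe the mapping straight into sorting, as a sketch for bash that assumes a samtools version recent enough to read SAM from standard input. "map_and_sort" is a hypothetical wrapper name; the subshell body keeps "set -o pipefail" from leaking into your interactive shell, so a failure in "bwa mem" is not hidden by a successful "samtools sort".

```shell
# Hypothetical wrapper: map paired-end reads and sort them in one pipe,
# without writing an intermediate SAM or unsorted BAM.
map_and_sort() (
  set -o pipefail                    # fail if any command in the pipe fails
  ref=$1 r1=$2 r2=$3 out=$4
  # "-" makes samtools sort read the SAM stream from standard input;
  # the .bam extension on -o selects BAM output.
  bwa mem "$ref" "$r1" "$r2" | samtools sort -o "$out" -
)
```

With the file names from the original post this would be "map_and_sort tifrunnerA.fa correntinaR1.fq correntinaR2.fq correntina_sorted.bam", followed by "samtools index correntina_sorted.bam" as before.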
Thank you for this tip! I will do this next time! :)
How exactly are you trying to load your bam?
So in IGV I go to "Genomes" and then "Load genome from file..." Then I click on the .bam file. Even though there is a .bai file in the same folder, it tries to generate a .fai file, which is 0 KB and does nothing.