Hi,
I am new to exome sequencing and have been trying to follow tutorials on the subject. I am stuck at the samtools index
stage because the output file is not in a human-readable format, and I believe I am making a misstep somewhere. Below are my code and the head
of the resulting .bam.bai file.
For simplicity we will assume all files are in the same folder.
#concatenate lanes
cat L001_R1_001.fastq.gz L002_R1_001.fastq.gz L003_R1_001.fastq.gz L004_R1_001.fastq.gz > subject_R1.fastq.gz
cat L001_R2_001.fastq.gz L002_R2_001.fastq.gz L003_R2_001.fastq.gz L004_R2_001.fastq.gz > subject_R2.fastq.gz
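Concatenating gzip files with `cat` is valid because a gzip stream may consist of several members back to back, and `gzip -dc` decompresses them in sequence. A minimal sanity check, assuming standard four-line FASTQ records (the helper name `count_reads` is hypothetical), is to confirm that R1 and R2 end up with the same number of reads:

```shell
# Hypothetical helper: number of reads in a (possibly multi-member) gzipped FASTQ,
# assuming every record spans exactly four lines.
count_reads() {
    lines=$(gzip -dc "$1" | wc -l | tr -d ' ')
    echo $(( lines / 4 ))
}

# Usage (file names as in the commands above):
# r1=$(count_reads subject_R1.fastq.gz)
# r2=$(count_reads subject_R2.fastq.gz)
# [ "$r1" -eq "$r2" ] || echo "R1/R2 read counts differ: $r1 vs $r2" >&2
```

Mismatched counts would usually mean a lane file is missing from one of the two `cat` commands or that the lanes were listed in a different order for R1 and R2.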
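#index genome (bwa index takes a single FASTA and has no -t option; the index prefix defaults to the FASTA file name, which is what bwa mem is given below)
bwa index GCA_000001405.15_GRCh38_no_alt_analysis_set.fna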
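`bwa index` writes its output files next to the reference, using the FASTA file name as the prefix unless `-p` is given, and `bwa mem` is then pointed at that same prefix. A small check, assuming the classic BWA 0.7.x index layout (`.amb`, `.ann`, `.bwt`, `.pac`, `.sa`; bwa-mem2 produces different files) and a hypothetical helper name `bwa_index_ok`:

```shell
# Hypothetical helper: succeed only if every BWA 0.7.x index file exists
# and is non-empty for the given prefix.
bwa_index_ok() {
    prefix=$1
    for ext in amb ann bwt pac sa; do
        if [ ! -s "$prefix.$ext" ]; then
            echo "missing or empty: $prefix.$ext" >&2
            return 1
        fi
    done
    return 0
}

# Usage:
# bwa_index_ok GCA_000001405.15_GRCh38_no_alt_analysis_set.fna || exit 1
```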
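#create the sorted bam file. The SAM output of bwa mem is piped straight into samtools sort, so no intermediate SAM file is written.
#Options go before the positional arguments, and the output is written to the current folder so that the commands below can find it.
bwa mem -t 8 -M \
-R "@RG\tID:FlowCell.subject1\tSM:subject1\tPL:illumina\tLB:mito.subject1" \
GCA_000001405.15_GRCh38_no_alt_analysis_set.fna subject_R1.fastq.gz subject_R2.fastq.gz | \
samtools sort -O bam -o subject1_bwa_output.bam -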
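The `-R` value is passed to `bwa mem` with literal `\t` sequences, which BWA converts into tab characters in the SAM header. If the same pipeline is reused for several samples, a hypothetical helper such as `make_rg` could assemble the string from the sample name; the `ID`/`LB` naming pattern simply mirrors the one in the command above and is not a requirement:

```shell
# Hypothetical helper: build the read-group string for bwa mem -R.
# The backslash-t sequences stay literal on purpose; bwa turns them into tabs.
make_rg() {
    flowcell=$1
    sample=$2
    library=$3
    printf '%s' "@RG\tID:${flowcell}.${sample}\tSM:${sample}\tPL:illumina\tLB:${library}.${sample}"
}

# Usage:
# rg=$(make_rg FlowCell subject1 mito)
# bwa mem -t 8 -M -R "$rg" GCA_000001405.15_GRCh38_no_alt_analysis_set.fna \
#     subject_R1.fastq.gz subject_R2.fastq.gz | samtools sort -O bam -o subject1_bwa_output.bam -
```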
#create bam.bai file
samtools index -b subject1_bwa_output.bam
#check the bam.bai file
samtools flagstat subject1_bwa_output.bam.bai > subject1_stat_bwa_output.txt
samtools idxstats subject1_bwa_output.bam.bai > subject1_idxstat_bwa_output.txt
During the checking phase I am getting the following errors.
[E::hts_hopen] Failed to open file subject1_bwa_output.bam.bai
[E::hts_open_format] Failed to open file "subject1_bwa_output.bam.bai" : Exec format error
samtools flagstat: Cannot open input file "subject1_bwa_output.bam.bai": Exec format error
[E::hts_hopen] Failed to open file subject1_bwa_output.bam.bai
[E::hts_open_format] Failed to open file "subject1_bwa_output.bam.bai" : Exec format error
samtools idxstats: failed to open "subject1_bwa_output.bam.bai": Exec format error
Here is a screenshot of the head of the .bam.bai file. I suspect it is not correct.
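A BAI file is binary by design, so unreadable output from `head` is expected. Per the SAM/BAM specification, a BAI index is stored uncompressed and starts with the four bytes `BAI\1`, while a BAM file is BGZF-compressed and therefore starts with the gzip magic bytes `1f 8b`. A minimal way to check which kind of file you have is to look at those leading bytes; the helper name `sniff_hts` is hypothetical:

```shell
# Hypothetical helper: classify a file by its first four bytes.
sniff_hts() {
    magic=$(head -c 4 "$1" | od -An -tx1 | tr -d ' \n')
    case "$magic" in
        42414901) echo "BAI index" ;;              # "BAI\1", uncompressed
        1f8b*)    echo "gzip/BGZF (e.g. BAM)" ;;   # BAM is BGZF-compressed
        *)        echo "unknown" ;;
    esac
}

# Usage:
# sniff_hts subject1_bwa_output.bam.bai   # a valid index should report "BAI index"
# sniff_hts subject1_bwa_output.bam
```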
Thanks, that worked.
FYI, the .bai file is an index; you never need to interact with it directly. Any tool that needs it will automatically look for it in the same folder as the main BAM file. It contains no human-readable content of interest.
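In practice this means `samtools flagstat` and `samtools idxstats` should be given the `.bam` file, and `idxstats` uses the `.bai` that sits next to it. A sketch of the checking step along those lines, where the helpers `bam_for` and `check_bam` are hypothetical and the output file names are only illustrative:

```shell
# Hypothetical helper: map an index path back to its BAM
# (x.bam.bai -> x.bam, x.bai -> x.bam, anything else unchanged).
bam_for() {
    case "$1" in
        *.bam.bai) printf '%s\n' "${1%.bai}" ;;
        *.bai)     printf '%s.bam\n' "${1%.bai}" ;;
        *)         printf '%s\n' "$1" ;;
    esac
}

# Run the checks on the BAM itself; the .bai only needs to be in the same folder.
check_bam() {
    bam=$(bam_for "$1")
    base=${bam%.bam}
    samtools flagstat "$bam" > "${base}_flagstat.txt"
    samtools idxstats "$bam" > "${base}_idxstats.txt"
}

# Usage:
# check_bam subject1_bwa_output.bam
```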
This makes sense. I had assumed the .bam.bai file was a summarized copy of the .bam file. Thank you!