Question

bam is in a format that cannot be usefully indexed

4 months ago

Kelly ▴ 10

Hi everyone, I am new to coding and am trying to create BAM files from my trimmed fastq files, but I get the error "bam is in a format that cannot be usefully indexed", as well as "samtools sort: failed to read header from "-"", and I do not understand what this refers to. The script is one I copied from my research partner; we used it for the same exercise on another set of reads, and it worked perfectly fine there. The code is:

module load star
module load samtools
cd $HOME/baboons/reads

mkdir -p mapped_STAR
for i in `ls trimmed_fastq2/*.trimmed.fastq | cut -f2 -d'/'`
do
     base=`echo ${i} | cut -f1 -d'.'`
    STAR --genomeDir genome_map/baboon_index --runThreadN 16 --readFilesIn trimmed_fastq2/*.trimmed.fastq --outFileNamePrefix mapped_STAR/${base} --outSAMtype BAM SortedByCoordinate --outSAMunmapped None --outFilterMismatchNmax 3 --outFilterMultimapNmax 1 --outSAMattributes All | samtools sort -@ 8 -o mapped_STAR/${base}Aligned.sortedByCoord.out.bam > ${i}_log
        samtools index mapped_STAR/${base}Aligned.sortedByCoord.out.bam
done

If anyone could give me advice on this, I would really appreciate it. One clue I am thinking about: when I ran the code to index the genome, I only got one SA file, whereas in our other practice run I got an SA file for each NCBI read. Could that be why my mapping isn't working? Thank you!

mapping bam index star • 701 views

I don't think you are making a bam with that STAR command. I don't think STAR tolerates * in file names, so spell them out.

4 months ago by swbarnes2 14k

score 0 · Answer 1 · 2024-07-18

STAR will tolerate the '*' in the file names, because the shell expands it before STAR ever sees the argument. However, this may cause issues depending on what list it expands to: in your loop, trimmed_fastq2/*.trimmed.fastq expands to every trimmed file on every iteration, not just the current one. This argument should only take one sample, which could be two files if paired: --readFilesIn /path/to/read1 [/path/to/read2]
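To see what a program actually receives when a glob is on its command line, here is a minimal sketch. The temporary directory and the file names A, B and C are made up for the demo and are not the poster's data; nothing here calls STAR.

```shell
#!/usr/bin/env bash
# Hypothetical demo files in a throwaway directory (names are assumptions).
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/trimmed_fastq2"
touch "$demo_dir/trimmed_fastq2/A.trimmed.fastq" \
      "$demo_dir/trimmed_fastq2/B.trimmed.fastq" \
      "$demo_dir/trimmed_fastq2/C.trimmed.fastq"

# The shell replaces the pattern with every matching path
# before any program is started; "set --" captures that list.
set -- "$demo_dir"/trimmed_fastq2/*.trimmed.fastq
n_args=$#

echo "the glob expanded to $n_args separate arguments:"
printf '  %s\n' "$@"

# Clean up the throwaway directory.
rm -rf "$demo_dir"
```

With three matching files the pattern becomes three arguments, which is what --readFilesIn would be handed on each pass through the original loop.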

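Also, to my knowledge, STAR does not write alignments to stdout by default like other aligners do (that requires the --outStd option). Instead it writes directly to mapped_STAR/${base}Aligned.sortedByCoord.out.bam. So you are piping irrelevant info into samtools sort, which cannot read a header from it (hence the "failed to read header" error) and which is told to overwrite the very file STAR just wrote.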

Also, with the --outSAMtype BAM SortedByCoordinate option, you don't need samtools sort.

Here's a possible correction. I formatted it according to my preference as well, but the important bits are to correct the input to --readFilesIn and --outFileNamePrefix and to not overwrite the bam output.

mkdir -p mapped_STAR
for i in trimmed_fastq2/*.trimmed.fastq
do
    # Sample name: strip the directory and the .trimmed.fastq suffix
    base=$(basename "${i}" .trimmed.fastq)
    STAR \
      --genomeDir genome_map/baboon_index \
      --runThreadN 16 \
      --readFilesIn "${i}" \
      --outFileNamePrefix "mapped_STAR/${base}" \
      --outSAMtype BAM SortedByCoordinate \
      --outSAMunmapped None \
      --outFilterMismatchNmax 3 \
      --outFilterMultimapNmax 1 \
      --outSAMattributes All

    # STAR already wrote a coordinate-sorted BAM, so it only needs indexing
    samtools index "mapped_STAR/${base}Aligned.sortedByCoord.out.bam"
done

If you have paired reads, these specific lines would change:

for i in trimmed_fastq2/*READ1.trimmed.fastq
base=$(basename "${i}" READ1.trimmed.fastq)
--readFilesIn "${i}" "${i/READ1/READ2}" \
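As a quick check of what those paired-read lines evaluate to, here is a small sketch with a made-up file name (sampleA_READ1.trimmed.fastq is an assumption, as is the READ1/READ2 naming itself). It only prints the derived strings and does not run STAR.

```shell
#!/usr/bin/env bash
# Hypothetical read-1 file following the READ1/READ2 naming pattern.
i="trimmed_fastq2/sampleA_READ1.trimmed.fastq"

# basename strips the directory and the given suffix.
base=$(basename "$i" READ1.trimmed.fastq)

# ${var/pattern/replacement} is a bash expansion that replaces
# the first match of the pattern anywhere in the string.
mate="${i/READ1/READ2}"

echo "output prefix: mapped_STAR/${base}"
echo "read 1:        ${i}"
echo "read 2:        ${mate}"
```

Because the substitution replaces the first READ1 anywhere in the path, a directory name containing READ1 would be changed instead of the file name, so it is worth echoing the result once before a long run.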

Final note: I don't think there is anything fundamentally wrong with using echo and cut to get the base name, but it's not my preference because, for example, a file name might have a "." within the base part without my realizing it, and cut -f1 -d'.' would then truncate the name at that first dot.
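A small illustration of that difference, using a made-up file name with a dot inside the sample name (sample.rep1 is an assumption for the demo):

```shell
# Hypothetical file name with a "." inside the sample name.
i="sample.rep1.trimmed.fastq"

# cut keeps only the text before the first "."
base_cut=$(echo "$i" | cut -f1 -d'.')

# basename removes only the exact suffix it is given
base_basename=$(basename "$i" .trimmed.fastq)

echo "cut:      $base_cut"
echo "basename: $base_basename"
```

Here cut yields sample while basename yields sample.rep1, so with cut two replicates named sample.rep1 and sample.rep2 would end up with the same output prefix.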