bam is in a format that cannot be usefully indexed
1
0
Entering edit mode
4 months ago
Kelly ▴ 10

Hi everyone, I am new to coding and am attempting to create .BAM files from my fastq trims, but it is giving me the error "bam is in a format that cannot be usefully indexed", as well as saying that "samtools sort: failed to read header from "-", but I do not understand what this is referencing to. The code I am using is a script I copied from my research partner when we did this same coding practice on another sample of reads, and the code worked perfectly fine for those reads. The code is:

module load star
module load samtools
cd $HOME/baboons/reads






mkdir -p mapped_STAR
for i in `ls trimmed_fastq2/*.trimmed.fastq | cut -f2 -d'/'`
do
     base=`echo ${i} | cut -f1 -d'.'`
    STAR --genomeDir genome_map/baboon_index --runThreadN 16 --readFilesIn trimmed_fastq2/*.trimmed.fastq --outFileNamePrefix mapped_STAR/${base} --outSAMtype BAM SortedByCoordinate --outSAMunmapped None --outFilterMismatchNmax 3 --outFilterMultimapNmax 1 --outSAMattributes All | samtools sort -@ 8 -o mapped_STAR/${base}Aligned.sortedByCoord.out.bam > ${i}_log
        samtools index mapped_STAR/${base}Aligned.sortedByCoord.out.bam
done

If anyone could give me advice on this, I would really appreciate it. A clue that I am thinking about is that when I ran the code to index the genome, I only received one SA file, however on our other practice, I got an SA file for each NCBI read. Could that be why my mapping isn't working? Thank you!!!!

mapping bam index star • 704 views
ADD COMMENT
0
Entering edit mode

I don't think you are making a bam with that STAR command. I don't think STAR tolerates * in file names, so spell them out.

ADD REPLY
0
Entering edit mode
4 months ago
rfran010 ★ 1.3k

STAR will tolerate the '*' in the filenames command as this gets expanded by the command line. However, this may cause issues depending on what list it expands to. This argument should only take one sample which could be two files if paired. --readFilesIn /path/to/read1 [/path/to/read2]

Also, to my knowledge, STAR does not output to stdout like other aligners. Instead it writes directly to mapped_STAR/${base}Aligned.sortedByCoord.out.bam So you are piping irrelevant info into samtools sort which is overriding the output file.

Also, with the --outSAMtype BAM SortedByCoordinate option, you don't need samtools sort.

Here's a possible correction. I formatted according to my preference as well, but the important bit is to correct the input to outFileNamePrefix and to not override the bam output.

mkdir -p mapped_STAR
for i in trimmed_fastq2/*.trimmed.fastq
do
    base=$(basename ${i} .trimmed.fastq)
    STAR \
      --genomeDir genome_map/baboon_index \
      --runThreadN 16 \
      --readFilesIn ${i} \
      --outFileNamePrefix mapped_STAR/${base} \
      --outSAMtype BAM SortedByCoordinate \
      --outSAMunmapped None \
      --outFilterMismatchNmax 3 \
      --outFilterMultimapNmax 1 \
      --outSAMattributes All

        samtools index mapped_STAR/${base}Aligned.sortedByCoord.out.bam
done

if you have paired reads, these specific lines could change:

for i in trimmed_fastq2/*READ1.trimmed.fastq
base=$(basename ${i} READ1.trimmed.fastq)
--readFilesIn ${i} ${i/READ1/READ2} \

Final note, I don't think there is something fundamentally wrong with using echo and cut to get the basename, but it's not my preference because, for example, sometimes I might have a filename with the "." within the base part and not realize it...

ADD COMMENT
1
Entering edit mode

never mind, I just tried the code again and it worked! thank you x100000 you beautiful coding angel

ADD REPLY
0
Entering edit mode

so I have run the code, and it is only giving me some bam files, while others are coming out as 0 bytes. I got my original code to work for the most part, but one of the bam files was 0 bytes as well. What would cause this?

ADD REPLY
0
Entering edit mode

What would cause this?

More than likely improper parsing of file names as was noted in answer above.

one of the bam files was 0 bytes as well.

If only one sample was affected then simply run that by hand. including correct sample names.

ADD REPLY
0
Entering edit mode

Hard to say without more information. If there's no error messages, this can happen when you read/write to the same file at the same time.

You can check STAR logs though for error messages as well.

ADD REPLY

Login before adding your answer.

Traffic: 2093 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6