6.7 years ago
oars
I tried to build a ".fai" index file using the command below:
samtools faidx genome.fa
But I get an error:
[E::fai_build_core] Different line length in sequence '(null)'
Could not build fai index /media/dcpattie/backup/Week08/genome.fa.fai
I found this old thread: "Error while doing indexing of fasta file using SAMTOOL faidx"
Is the best course of action the following sequence of steps?
java -jar picard.jar NormalizeFasta \
I=input_genome.fa \
O=normalized_genome.fa
If the following command shows lines of different lengths, then you'll have to reformat your FASTA sequence with NormalizeFasta.
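One way to run such a line-length check is sketched below. It is an illustrative helper (the name `check_fasta_lines` is hypothetical, and it is not part of samtools or Picard), written on the assumption that faidx expects every line of a record to have the same length except the last one. It prints only the lines that break that layout, so it stays readable on a multi-gigabyte genome.

```shell
# Hypothetical helper (not part of samtools): list lines that break the
# "equal line length within a record" layout that faidx relies on.
# Output columns: sequence name, line number, line length, record line width.
check_fasta_lines() {
    awk '
        BEGIN { width = -1 }
        /^>/  { name = substr($0, 2); width = -1; short = 0; next }
        {
            len = length($0)
            if (width < 0) { width = len; next }    # first line of the record sets the width
            if (short || len > width)               # data after a short line, or an over-long line
                printf "%s\t%d\t%d\t%d\n", (name == "" ? "(no header)" : name), NR, len, width
            if (len < width) short = 1              # only the final line of a record may be shorter
        }
    ' "$1"
}

# Usage: check_fasta_lines genome.fa | head
```

No output means every record is laid out consistently under that assumption. Sequence lines that appear before any ">" header are reported as "(no header)".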
First, many thanks for your reply!
Now, the results...
It seems your >chr1 header is corrupted; here is what I get:
So the genome is formatted with 50 bases per line, with variations in the last line for each chromosome. The differences between my results and yours for these last lines are probably due to different versions of the genome.
What is the output of the following command?
head -n2 genome.fa
So why is the output of
awk '/^>/ {print;next;} {print length($0);}' genome.fa | uniq
the following:
It should be:
oars: As an aside, if your genome.fa is corrupt then your bwa indexes are no good either.
I don't know what went wrong; here are my commands:
Are there any blank lines in your genome.fa file?
I'm not sure; it's a 3 GB file. I pulled the file from the UCSC website, then after making my genome.fa output file I used...
...to create five additional files, all named with the prefix genome.fa (.sa; .ann; .amb; .pac; .bwt).
Now I'm trying to make a genome.fa.fai file but this is much harder than advertised.
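On the blank-line question raised above: a file of this size can still be scanned in a single pass without opening it in an editor. The sketch below uses two hypothetical helper names, `count_blank_lines` and `first_blank_lines`, and treats a line as blank if it is empty or contains only whitespace (which also covers a line holding just a carriage return).

```shell
# Hypothetical helpers for locating blank lines in a large FASTA file.

# Print the number of empty or whitespace-only lines.
count_blank_lines() {
    awk '/^[[:space:]]*$/ { n++ } END { print n + 0 }' "$1"
}

# Print the line numbers of the first few blank lines (default: 5).
first_blank_lines() {
    awk -v max="${2:-5}" '/^[[:space:]]*$/ { print NR; if (++n >= max) exit }' "$1"
}

# Usage:
#   count_blank_lines genome.fa
#   first_blank_lines genome.fa 10
```

A count of 0 rules blank lines out as the cause. Otherwise the reported line numbers show where to look.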