I have aligned DNA-seq data to the following reference genome: https://www.ncbi.nlm.nih.gov/nuccore/U00096. I keep getting the following output in my error file when I run the bcftools mpileup command for variant calling:
[E::faidx_adjust_position] The sequence "lcl|" was not found
[E::faidx_adjust_position] The sequence "lcl|" was not found
[E::faidx_adjust_position] The sequence "lcl|" was not found
[E::faidx_adjust_position] The sequence "lcl|" was not found
[E::faidx_adjust_position] The sequence "lcl|" was not found
[E::faidx_adjust_position] The sequence "lcl|" was not found
[E::faidx_adjust_position] The sequence "lcl|" was not found
[E::faidx_adjust_position] The sequence "lcl|" was not found
[E::faidx_adjust_position] The sequence "lcl|" was not found
The same error line repeats many more times. The first line of the .fa file I downloaded is:
>NC_000913.3 Escherichia coli str. K-12 substr. MG1655, complete genome
Is a string missing at the beginning of this header? I did not modify the file after downloading it, so I assumed the alignment would work. This is the command I am using to align:
bwa mem -t 180 ecoli R1_001.fastq R2_001.fastq > sample.sam
This is the command I am using to find the variants:
bcftools mpileup -d 500 -Ou -f GCF_000005845.2_ASM584v2_genomic.fa sample.bam | bcftools call -mv -Ob -o sample_calls.bcf
which I took directly from the GitHub page.
"ecoli" is the index I built using the .fa and .gtf files from the same website. Any help appreciated here! Important note: I have not included my file paths here for privacy reasons.
Just to be sure, you downloaded the FASTA format of the genome sequence, correct? Do you see only one line when you run grep "^>" your_fasta_file, or are there additional lines?
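To make that header check concrete, here is a rough sketch run on a toy file rather than your real reference. The file contents, the second record, and the variable names are made up for illustration. The only thing it relies on is that FASTA indexers take the sequence name to be the text after ">" up to the first whitespace.

```shell
# Toy FASTA standing in for the real reference (hypothetical contents).
fasta=$(mktemp)
printf '%s\n' \
  '>NC_000913.3 Escherichia coli str. K-12 substr. MG1655, complete genome' \
  'AGCTTTTCATTCTGACTGCA' \
  '>lcl|U00096.3_cds_1 hypothetical second record' \
  'ATGAAACGCATTAGCACC' > "$fasta"

# Count header lines; a single-chromosome genome FASTA should have exactly one.
n_headers=$(grep -c '^>' "$fasta")

# Sequence names as an index sees them: text after '>' up to the first space.
seq_names=$(grep '^>' "$fasta" | sed 's/^>//' | cut -d' ' -f1)

echo "headers: $n_headers"
echo "$seq_names"

# Clean up the temporary toy file.
rm -f "$fasta"
```

On your own data, point the two grep commands at the .fa file you used to build the bwa index and at the one you pass to -f. If either reports more than one header, or names starting with lcl|, that would suggest one of the files is not the plain genome FASTA, which is only a guess until you look at the actual output.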