Bowtie2 aligner
3.3 years ago
priya.bmg ▴ 60

Hello

I am trying to learn Bowtie and I couldn't figure out the error.

First, I downloaded the GRCh38 build from NCBI Assembly (https://www.ncbi.nlm.nih.gov/assembly/GCF_000001405.39) and built indexes of the reference sequence with the bowtie2-build command. This gave me four .bt2 index files. Then I downloaded the FASTQ reads from NCBI SRA (http://ftp.sra.ebi.ac.uk/vol1/run/ERR235/ERR2356727/24_1.fastq.gz) (http://ftp.sra.ebi.ac.uk/vol1/run/ERR235/ERR2356727/24_2.fastq.gz).

Then I aligned the reads to the reference sequence using the command:

bowtie2 -x GCF_000001405.26_GRCh38_genomic.fna.fna -U 24_1.fastq,24_2.fastq -S eg1.sam

I get the following error: "GCF_000001405.26_GRCh38_genomic.fna.fna" does not exist or is not a Bowtie 2 index

I used the bowtie2 commands to align the sequences but I am still getting an error. Can someone help with this error? Is there a way to make sure that I have Bowtie 2 index files?

Thanks

Priya


Can you provide the command you used to build your bowtie2 indexes and also provide the output of ls -lh GCF*? Are you sure the indexes were properly made, without any errors at the end of that process?


bowtie2-build GCF_000001405.26_GRCh38_genomic.fna bt2_index ## Build indexes

ls -lh GCF*

>  rw-r--r-- 1 thirun0000 cichon 3.1G Jul 27 10:06
> -rw-r--r-- 1 thirun0000 cichon  16K Jul 27 10:09 GCF_000001405.26_GRCh38_genomic.fna.index.1.bt2
> -rw-r--r-- 1 thirun0000 cichon    0 Jul 27 10:08 GCF_000001405.26_GRCh38_genomic.fna.index.2.bt2
> -rw-r--r-- 1 thirun0000 cichon  15K Jul 27 10:08 GCF_000001405.26_GRCh38_genomic.fna.index.3.bt2
> -rw-r--r-- 1 thirun0000 cichon 728M Jul 27 10:08 GCF_000001405.26_GRCh38_genomic.fna.index.4.bt2

There may be multiple issues. Do you have a file with no readable name that is 3.1G? It looks like your bowtie2-build command may have failed part-way. You should redirect stdout/stderr to files so that you can read those messages. You are also missing the rev files, as you have already discovered.

bowtie2-build GCF_000001405.26_GRCh38_genomic.fna index_base_name > log.out 2> log.error

(Replace index_base_name with a name you like.)

Then look at the log files produced.
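
The capture-and-inspect step above can be sketched as a small shell wrapper. This is only an illustration: run_logged is a hypothetical helper name, not part of Bowtie 2, and the log file names are the ones used in the command above. It also reports the exit status, since a build that stops part-way may leave little in the logs themselves.

```shell
# Hypothetical helper: run a command, save stdout and stderr to separate
# log files, and report the command's exit status.
run_logged() {
    local out_log="$1" err_log="$2"
    shift 2
    "$@" > "$out_log" 2> "$err_log"
    local status=$?
    echo "exit status: $status (stdout: $out_log, stderr: $err_log)"
    return "$status"
}

# Example usage (index_base_name is a placeholder):
# run_logged log.out log.error bowtie2-build GCF_000001405.26_GRCh38_genomic.fna index_base_name
# tail -n 5 log.out log.error
```

A non-zero exit status, or a log that stops before the index files are written, would suggest the build did not finish.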


The 3.1 GB file is the reference file downloaded from NCBI Assembly.

log.error

Building a SMALL index

log.out

Settings:
  Output files: "index_base_name_alignment.*.bt2"
  Line rate: 6 (line is 64 bytes)
  Lines per side: 1 (side is 64 bytes)
  Offset rate: 4 (one in 16)
  FTable chars: 10
  Strings: unpacked
  Max bucket size: default
  Max bucket size, sqrt multiplier: default
  Max bucket size, len divisor: 4
  Difference-cover sample period: 1024
  Endianness: little
  Actual local endianness: little
  Sanity checking: disabled
  Assertions: disabled
  Random seed: 0
  Sizeofs: void*:8, int:4, long:8, size_t:8
Input files DNA, FASTA:
  GCF_000001405.26_GRCh38_genomic.fna
Reading reference sizes
  Time reading reference sizes: 00:00:24
Calculating joined length
Writing header
Reserving space for joined string
Joining reference sequences
  Time to join reference sequences: 00:00:16
bmax according to bmaxDivN setting: 762328945
Using parameters --bmax 571746709 --dcv 1024
  Doing ahead-of-time memory usage test
  Passed!  Constructing with these parameters: --bmax 571746709 --dcv 1024
Constructing suffix-array element generator
Building DifferenceCoverSample
  Building sPrime
  Building sPrimeOrder
  V-Sorting samples
  V-Sorting samples time: 00:01:29
  Allocating rank array
  Ranking v-sort output
  Ranking v-sort output time: 00:00:22
  Invoking Larsson-Sadakane on ranks
  Invoking Larsson-Sadakane on ranks time: 00:00:46
  Sanity-checking and returning
Building samples
Reserving space for 12 sample suffixes
Generating random suffixes
QSorting 12 sample offsets, eliminating duplicates
QSorting sample offsets, eliminating duplicates time: 00:00:00
Multikey QSorting 12 samples
  (Using difference cover)
  Multikey QSorting samples time: 00:00:00
Calculating bucket sizes
Splitting and merging
  Splitting and merging time: 00:00:00
Avg bucket size: 3.04932e+09 (target: 571746708)
Converting suffix-array elements to index image
Allocating ftab, absorbFtab
Entering Ebwt loop
Getting block 1 of 1
  No samples; assembling all-inclusive block

Is that all you get? At least in this part there is no error. Perhaps the process is not yet complete? Assuming you re-ran the command just now, the indexing should take more than 5 minutes to complete.


Yes. Is there a way to get pre-built indexes for bowtie2?


You can get pre-built indexes from the Bowtie 2 SourceForge page. Look in the right column as you scroll down.


I had downloaded fastq files of the reference sequence, which caused the problem in building the indexes. FASTA files have to be used for building indexes (http://seqanswers.com/forums/showthread.php?t=14673). I downloaded prebuilt indexes for GRCh38 from ftp://ftp.ccb.jhu.edu/pub/data/bowtie2_indexes/grch38_1kgmaj_bt2.zip and had no issues with the alignment. Thank you everyone for your suggestions, they were very helpful.
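
For anyone repeating this with the prebuilt index, here is a minimal sketch of the alignment call. Two things in it are assumptions and not confirmed in this thread: that the unzipped archive contains files named grch38_1kgmaj.*.bt2 (so the prefix is grch38_1kgmaj), and that 24_1 and 24_2 are the two mates of a paired-end run. If they are, -1/-2 is the usual way to pass them; the -U option used earlier treats both files as unpaired reads.

```shell
# Assumption: the unzipped archive provides files named grch38_1kgmaj.*.bt2,
# so the index prefix is "grch38_1kgmaj"; adjust if the names differ.
index_prefix="grch38_1kgmaj"

# Assumption: 24_1.fastq and 24_2.fastq are mates of a paired-end run.
# -1/-2 pass the mate files; -U would align both files as unpaired reads.
align_cmd="bowtie2 -x $index_prefix -1 24_1.fastq -2 24_2.fastq -S eg1.sam"

# Print the command first; run it only if bowtie2 and the index are present.
echo "$align_cmd"
if command -v bowtie2 >/dev/null 2>&1 && [ -s "$index_prefix.1.bt2" ]; then
    $align_cmd
fi
```

Note that -x takes the index prefix, not the name of any single .bt2 file.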

3.3 years ago
Jianyu ▴ 580

The index prefix should be "GCF_000001405.26_GRCh38_genomic.fna.index" instead of "GCF_000001405.26_GRCh38_genomic.fna.fna".


Now I get a different error:

 > Could not open index file GCF_000001405.26_GRCh38_genomic.fna.index.rev.1.bt2
 > Could not open index file GCF_000001405.26_GRCh38_genomic.fna.index.rev.2.bt2 
    (ERR): bowtie2-align died with signal 11 (SEGV)

I have no reverse index files.


Could you show us the log output of your "bowtie2-build" command? It seems "GCF_000001405.26_GRCh38_genomic.fna.index.2.bt2" is empty, which suggests bowtie2-build might have failed.
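
One way to check this directly is to test for the files themselves. A complete small Bowtie 2 index has six files for a given prefix: .1.bt2, .2.bt2, .3.bt2, .4.bt2, .rev.1.bt2 and .rev.2.bt2 (large indexes use the .bt2l extension instead). The sketch below uses a hypothetical helper name, check_bt2_index, and only checks that each file exists and is non-empty; it does not validate the contents.

```shell
# Hypothetical helper: check that all six files of a Bowtie 2 index exist
# and are non-empty for a given prefix. The optional second argument is the
# extension (bt2 by default, bt2l for large indexes).
check_bt2_index() {
    local prefix="$1" ext="${2:-bt2}" missing=0 suffix file
    for suffix in 1 2 3 4 rev.1 rev.2; do
        file="$prefix.$suffix.$ext"
        if [ ! -s "$file" ]; then
            echo "missing or empty: $file"
            missing=1
        fi
    done
    if [ "$missing" -eq 0 ]; then
        echo "index looks complete: $prefix"
    fi
    return "$missing"
}

# Example usage with the prefix from this thread:
# check_bt2_index GCF_000001405.26_GRCh38_genomic.fna.index
```

For the listing posted above, this would flag the empty .2.bt2 file and the two absent rev files. Running bowtie2-inspect -s on the prefix is another option, since it has to read the index to print its summary.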


Please find the log file posted above. Thanks
