Bowtie2 alignment problems
2.3 years ago
Chris K ▴ 10

Dear all,

I am new to bowtie2, and my goal is to align a FASTA file against the Greengenes reference alignment database (gg_13_8_99.refalign). I indexed the database with bowtie2-build gg_13_8_99.refalign greengenes_reference, but when I go through the alignment step (bowtie2 -f -x greengenes_reference -U 035.good.fasta -S test3.sam) there is no alignment report, just this message:

" readU: Success (ERR): bowtie2-align exited with value 1 "

Also, the SAM file doesn't look right, as you can see here: [screenshot of the SAM file]

Below is a screenshot of what my FASTA file looks like: [screenshot of the FASTA file]

Sorry if this is a trivial question, but I just started and I am trying to figure things out. Thanks in advance!


1) Why a screenshot when you can copy and paste the text? Save the planet. 2) I don't see any problem in the SAM. You're showing us a SAM header. 3) As far as I can see, bowtie2 wants FASTQ (http://bowtie-bio.sourceforge.net/bowtie2/manual.shtml); your input is a FASTA file.


1) I am sorry, I am just in a panic state. 2) There are only the headers, nothing more, and when I go from SAM -> BAM -> FASTA, the FASTA is empty. 3) You can also use a FASTA as input, which is why I used the -f option; also, I was provided with a FASTA, not a FASTQ.

Edit: I tried converting the FASTA I was provided to a FASTQ with "#" as the quality scores, but it gave me the same result.
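For reference, a FASTA-to-FASTQ conversion with a constant placeholder quality, like the one described above, could look roughly like this in Python. This is only a minimal sketch, not the script that was actually used; the function name fasta_to_fastq is hypothetical. It does not change the outcome here, since bowtie2 already reads FASTA with -f.

```python
def fasta_to_fastq(fasta_text, quality_char="#"):
    """Convert FASTA text to FASTQ text, using one constant quality character per base."""
    records = []
    name, chunks = None, []
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:], []
        elif name is not None:
            chunks.append(line)  # sequence may be wrapped over several lines
    if name is not None:
        records.append((name, "".join(chunks)))

    out = []
    for rec_name, seq in records:
        # FASTQ record: @name, sequence, '+', quality string of equal length.
        out.append("@" + rec_name + "\n" + seq + "\n+\n" + quality_char * len(seq) + "\n")
    return "".join(out)
```

Each sequence is written on a single line so that the quality string can simply repeat the placeholder character once per base.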


3) You can also use a FASTA as input, which is why I used the -f option; also, I was provided with a FASTA, not a FASTQ.

I see. ok.


Is there a special reason that you want to use bowtie instead of blast?


Yes, it is for training purposes, for a small project: the first part was to use mothur for the alignment and classification, and the second part is to use bowtie2 for the alignment.


Your index may be corrupt. Have you checked to make sure it is ok? Check the log for the indexing job.


I just downloaded it from the https://mothur.org/wiki/greengenes-formatted_databases/ site (Greengenes reference alignment), but I could not find how to access the log files to check it.


I was asking about the bowtie2-build step, where you built the indexes from the FASTA reference. Did that step complete without errors?


Oh, I am sorry. As far as I know, yes. This was the output of the bowtie2-build command:

Settings:
Output files: "reference.*.bt2"
  Line rate: 6 (line is 64 bytes)
  Lines per side: 1 (side is 64 bytes)
  Offset rate: 4 (one in 16)
  FTable chars: 10
  Strings: unpacked
  Max bucket size: default
  Max bucket size, sqrt multiplier: default
  Max bucket size, len divisor: 4
  Difference-cover sample period: 1024
  Endianness: little
  Actual local endianness: little
  Sanity checking: disabled
  Assertions: disabled
  Random seed: 0
  Sizeofs: void*:8, int:4, long:8, size_t:8
Input files DNA, FASTA:
  gg_13_8_99.refalign
Building a SMALL index
Reading reference sizes
  Time reading reference sizes: 00:00:23
Calculating joined length
Writing header
Reserving space for joined string
Joining reference sequences
  Time to join reference sequences: 00:00:31
bmax according to bmaxDivN setting: 65001412
Using parameters --bmax 48751059 --dcv 1024
  Doing ahead-of-time memory usage test
  Passed!  Constructing with these parameters: --bmax 48751059 --dcv 1024
Constructing suffix-array element generator
Building DifferenceCoverSample
  Building sPrime
  Building sPrimeOrder
  V-Sorting samples
  V-Sorting samples time: 00:00:16
  Allocating rank array
  Ranking v-sort output
  Ranking v-sort output time: 00:00:03
  Invoking Larsson-Sadakane on ranks
  Invoking Larsson-Sadakane on ranks time: 00:00:03
  Sanity-checking and returning
Building samples
Reserving space for 12 sample suffixes
Generating random suffixes
QSorting 12 sample offsets, eliminating duplicates
QSorting sample offsets, eliminating duplicates time: 00:00:00
Multikey QSorting 12 samples
  (Using difference cover)
  Multikey QSorting samples time: 00:00:00
Calculating bucket sizes
Splitting and merging
  Splitting and merging time: 00:00:00
Avg bucket size: 3.71437e+07 (target: 48751058)
Converting suffix-array elements to index image
Allocating ftab, absorbFtab
Entering Ebwt loop
Getting block 1 of 7
  Reserving size (48751059) for bucket 1
  Calculating Z arrays for bucket 1
  Entering block accumulator loop for bucket 1:
  bucket 1: 10%
  bucket 1: 20%
  bucket 1: 30%
  bucket 1: 40%
  bucket 1: 50%
  bucket 1: 60%
  bucket 1: 70%
  bucket 1: 80%
  bucket 1: 90%
  bucket 1: 100%
  Sorting block of length 45938391 for bucket 1
  (Using difference cover)
  Sorting block time: 00:01:27
Returning block of 45938392 for bucket 1
Getting block 2 of 7
  Reserving size (48751059) for bucket 2
  Calculating Z arrays for bucket 2
  Entering block accumulator loop for bucket 2:
  bucket 2: 10%
  bucket 2: 20%
  bucket 2: 30%
  bucket 2: 40%
  bucket 2: 50%
  bucket 2: 60%
  bucket 2: 70%
  bucket 2: 80%
  bucket 2: 90%
  bucket 2: 100%
  Sorting block of length 48538380 for bucket 2
  (Using difference cover)
  Sorting block time: 00:01:26
Returning block of 48538381 for bucket 2
Getting block 3 of 7
  Reserving size (48751059) for bucket 3
  Calculating Z arrays for bucket 3
  Entering block accumulator loop for bucket 3:
  bucket 3: 10%
  bucket 3: 20%
  bucket 3: 30%
  bucket 3: 40%
  bucket 3: 50%
  bucket 3: 60%
  bucket 3: 70%
  bucket 3: 80%
  bucket 3: 90%
  bucket 3: 100%
  Sorting block of length 38136778 for bucket 3
  (Using difference cover)
  Sorting block time: 00:01:04
Returning block of 38136779 for bucket 3
Getting block 4 of 7
  Reserving size (48751059) for bucket 4
  Calculating Z arrays for bucket 4
  Entering block accumulator loop for bucket 4:
  bucket 4: 10%
  bucket 4: 20%
  bucket 4: 30%
  bucket 4: 40%
  bucket 4: 50%
  bucket 4: 60%
  bucket 4: 70%
  bucket 4: 80%
  bucket 4: 90%
  bucket 4: 100%
  Sorting block of length 44141929 for bucket 4
  (Using difference cover)
  Sorting block time: 00:01:15
Returning block of 44141930 for bucket 4
Getting block 5 of 7
  Reserving size (48751059) for bucket 5
  Calculating Z arrays for bucket 5
  Entering block accumulator loop for bucket 5:
  bucket 5: 10%
  bucket 5: 20%
  bucket 5: 30%
  bucket 5: 40%
  bucket 5: 50%
  bucket 5: 60%
  bucket 5: 70%
  bucket 5: 80%
  bucket 5: 90%
  bucket 5: 100%
  Sorting block of length 21556873 for bucket 5
  (Using difference cover)
  Sorting block time: 00:00:35
Returning block of 21556874 for bucket 5
Getting block 6 of 7
  Reserving size (48751059) for bucket 6
  Calculating Z arrays for bucket 6
  Entering block accumulator loop for bucket 6:
  bucket 6: 10%
  bucket 6: 20%
  bucket 6: 30%
  bucket 6: 40%
  bucket 6: 50%
  bucket 6: 60%
  bucket 6: 70%
  bucket 6: 80%
  bucket 6: 90%
  bucket 6: 100%
  Sorting block of length 32970562 for bucket 6
  (Using difference cover)
  Sorting block time: 00:00:58
Returning block of 32970563 for bucket 6
Getting block 7 of 7
  Reserving size (48751059) for bucket 7
  Calculating Z arrays for bucket 7
  Entering block accumulator loop for bucket 7:
  bucket 7: 10%
  bucket 7: 20%
  bucket 7: 30%
  bucket 7: 40%
  bucket 7: 50%
  bucket 7: 60%
  bucket 7: 70%
  bucket 7: 80%
  bucket 7: 90%
  bucket 7: 100%
  Sorting block of length 28722732 for bucket 7
  (Using difference cover)
  Sorting block time: 00:00:50
Returning block of 28722733 for bucket 7
Exited Ebwt loop
fchr[A]: 0
fchr[C]: 65359109
fchr[G]: 126098847
fchr[T]: 209085341
fchr[$]: 260005651
Exiting Ebwt::buildToDisk()
Returning from initFromVector
Wrote 1846708712 bytes to primary EBWT file: reference.1.bt2
Wrote 65001420 bytes to secondary EBWT file: reference.2.bt2
Re-opening _in1 and _in2 as input streams
Returning from Ebwt constructor
Headers:
    len: 260005651
    bwtLen: 260005652
    sz: 65001413
    bwtSz: 65001413
    lineRate: 6
    offRate: 4
    offMask: 0xfffffff0
    ftabChars: 10
    eftabLen: 20
    eftabSz: 80
    ftabLen: 1048577
    ftabSz: 4194308
    offsLen: 16250354
    offsSz: 65001416
    lineSz: 64
    sideSz: 64
    sideBwtSz: 48
    sideBwtLen: 192
    numSides: 1354197
    numLines: 1354197
    ebwtTotLen: 86668608
    ebwtTotSz: 86668608
    color: 0
    reverse: 0
Total time for call to driver() for forward index: 00:10:01
Reading reference sizes
  Time reading reference sizes: 00:00:41
Calculating joined length
Writing header
Reserving space for joined string
Joining reference sequences
  Time to join reference sequences: 00:00:16

That looks ok. Can you put a couple of test sequences in a FASTA file and see if you are able to align those? If that works, then the issue may be with your input query FASTA. Did you move that file from a Windows/macOS machine to Linux?
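One way to get such a small test file is to copy the first few records out of the query FASTA. The Python below is a rough sketch of that idea; the function name first_fasta_records and the default of two records are made up for illustration.

```python
def first_fasta_records(fasta_text, n=2):
    """Return the first n FASTA records (header plus sequence lines) as text."""
    kept = []
    seen = 0  # number of headers seen so far
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            seen += 1
            if seen > n:
                break  # stop at the header of record n + 1
        # Keep non-blank lines once the first header has been seen.
        if seen >= 1 and line.strip():
            kept.append(line)
    return "\n".join(kept) + ("\n" if kept else "")
```

The result can be written to a new file and passed to bowtie2 with -f -U in place of the full query file.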


I just ran the lambda virus example provided in the bowtie2 package, and it works and gives the right results. I was sent that file via email, I guess from a Windows machine, and I have tried both the original FASTA file and one processed with mothur (I just screened it for maxlength etc.), with the same results.


Also, using the indexes from the bowtie2 package, the alignment seems to run. So maybe it is my indexes that cause the problem?

bowtie2 -f -x lambda_virus -U 035.good.fasta -S test2.sam
5000 reads; of these:
  5000 (100.00%) were unpaired; of these:
    5000 (100.00%) aligned 0 times
    0 (0.00%) aligned exactly 1 time
    0 (0.00%) aligned >1 times
0.00% overall alignment rate

Can you run dos2unix your.fa and see if that fixes the line endings and makes the file work with your indexes? Otherwise, you may want to try recreating the index files.
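If you want to check for Windows line endings before converting, something like the following Python sketch would do. It only approximates the CRLF-to-LF conversion that dos2unix performs; the function names are hypothetical, and your.fa is the same placeholder file name as above.

```python
def count_crlf(data: bytes) -> int:
    """Count Windows-style line endings (CR followed by LF) in raw file bytes."""
    return data.count(b"\r\n")


def to_unix_newlines(data: bytes) -> bytes:
    """Replace every CRLF pair with a single LF; other bytes are left untouched."""
    return data.replace(b"\r\n", b"\n")


# Example use on a file (the path is a placeholder):
# with open("your.fa", "rb") as fh:
#     data = fh.read()
# print(count_crlf(data))
```

A count of zero means the file already has Unix line endings, so the problem is probably somewhere else.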


That did not seem to work after the format change. I have recreated the indexes many times, but it still gives the same result. Is there a chance that the original file is not compatible?

2.3 years ago
GenoMax 147k

Did you use the fasta format file gg_13_8_99.fasta from Greengenes that came from https://mothur.org/w/images/6/68/Gg_13_8_99.taxonomy.tgz ? That is the reference you should use.

Looking at your original post, you used the aligned FASTA file as input to bowtie2-build (a different format, in which the sequences contain gap characters). You can't use bowtie2 to align against this format.
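To see the difference between the two formats, the Python sketch below counts records whose sequence lines contain gap characters and shows what removing them would look like. It assumes that gaps are written as "-" or "." (the usual convention in mothur-style alignments), and the function names are hypothetical. It is for illustration only; the recommendation here is still to download the plain FASTA reference rather than degap the alignment yourself.

```python
GAP_CHARS = "-."  # assumption: gap symbols used in the aligned FASTA
_DELETE_GAPS = str.maketrans("", "", GAP_CHARS)


def count_gapped_records(fasta_text):
    """Count FASTA records whose sequence lines contain at least one gap character."""
    gapped = 0
    in_record = False
    has_gap = False
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            if in_record and has_gap:
                gapped += 1
            in_record, has_gap = True, False
        elif in_record and any(c in GAP_CHARS for c in line):
            has_gap = True
    if in_record and has_gap:
        gapped += 1  # account for the final record
    return gapped


def strip_gaps(fasta_text):
    """Remove gap characters from sequence lines; header lines are kept unchanged."""
    out = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            out.append(line)  # headers may legitimately contain '-' or '.'
        else:
            cleaned = line.translate(_DELETE_GAPS)
            if cleaned:
                out.append(cleaned)
    return "\n".join(out) + "\n"
```

A non-zero count from count_gapped_records on the reference suggests it is an alignment rather than a plain sequence file.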


No, as reference I used gg_13_8_99.refalign, which is an aligned FASTA. Should I use the unaligned one? I used the aligned one because in mothur I could not use the plain FASTA format for the alignment.


That is your problem. You should use the file I linked above to get the plain FASTA format reference. Build the bowtie2 indexes using the gg_13_8_99.fasta file and your alignments will work.
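If you prefer to script the two steps, a minimal Python sketch could look like this. It only uses the bowtie2-build arguments and the bowtie2 options already shown in this thread (-f, -x, -U, -S); the function names and the index prefix gg_plain in the usage line are hypothetical.

```python
import subprocess


def build_index_cmd(reference_fasta, index_prefix):
    """Command line for building a bowtie2 index from a plain (unaligned) FASTA."""
    return ["bowtie2-build", reference_fasta, index_prefix]


def align_cmd(index_prefix, reads_fasta, sam_out):
    """Command line for aligning unpaired FASTA reads (-f) and writing SAM output."""
    return ["bowtie2", "-f", "-x", index_prefix, "-U", reads_fasta, "-S", sam_out]


def run_pipeline(reference_fasta, index_prefix, reads_fasta, sam_out):
    """Run index building and alignment in order, stopping at the first failure."""
    for cmd in (build_index_cmd(reference_fasta, index_prefix),
                align_cmd(index_prefix, reads_fasta, sam_out)):
        # check=True raises CalledProcessError if the tool exits with a non-zero status.
        subprocess.run(cmd, check=True)


# Example (requires bowtie2 on PATH; the index prefix is a placeholder):
# run_pipeline("gg_13_8_99.fasta", "gg_plain", "035.good.fasta", "test3.sam")
```

With check=True, a non-zero exit status such as the "exited with value 1" above stops the script with an exception instead of passing unnoticed.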


Oooh I see, thank you very much for your time and the effort you put into helping me!


Once you confirm that this works I will move my comment above to an answer. You can then accept it to provide closure to this thread.


Yes, I am creating the new indexes as we speak.


The alignment just finished. Once again, thank you very much. You can move your answer so I can close the thread.
