Question

Reference genome, BWA and the right algorithm


Asked 15 months ago by juliobinf14

Hello

I'm using BWA to build an index for aligning some RNA-seq FASTQ files.

The first thing I did was download hg38.fa.align.gz from UCSC.

Then I ran:

```
gzip -d hg38.fa.align.gz
sudo apt-get install bwa
```

Here comes the problem. The BWA instructions recommend the bwtsw algorithm for the whole human genome, but when I use it:

```
bwa index -p ref_hum -a bwtsw hg38.fa.align
[bwa_index] Pack FASTA... 7.36 sec
[bwa_index] Construct BWT for the packed sequence...
Floating point exception (core dumped)
```

When I don't specify the algorithm:

```
bwa index -p ref_hum hg38.fa.align
[bwa_index] Pack FASTA... 7.42 sec
[bwa_index] Construct BWT for the packed sequence...
[bwa_index] 0.00 seconds elapse.
[bwa_index] Update BWT... 0.00 sec
[bwa_index] Pack forward-only FASTA... 7.33 sec
[bwa_index] Construct SA from BWT and Occ... 0.00 sec
[main] Version: 0.7.17-r1188
[main] CMD: bwa index -p ref_hum hg38.fa.align
[main] Real time: 14.780 sec; CPU: 14.747 sec
```

I'm worried I might be losing information, because the BWA instructions say:

```
bwa index [-p prefix] [-a algoType] <in.db.fasta>

-a STR  Algorithm for constructing BWT index. Available options are:
        is     IS linear-time algorithm for constructing suffix array. It
               requires 5.37N memory where N is the size of the database. IS is
               moderately fast, but does not work with database larger than
               2GB. IS is the default algorithm due to its simplicity. The
               current codes for IS algorithm are reimplemented by Yuta Mori.
        bwtsw  Algorithm implemented in BWT-SW. This method works with the
               whole human genome
```
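To see which side of the 2GB limit quoted above a reference falls on, one can simply count its bases. The sketch below uses hypothetical helpers (`fasta_bases`, `suggest_bwa_algo`), not part of BWA, and assumes that "2GB" means 2 GiB of sequence at one byte per base and that header lines are the only non-sequence lines. For hg38 the count should be roughly 3 billion, which is over the limit, so the docs point to bwtsw; a count far below that means the file is probably not the genome you expected.

```bash
#!/bin/sh
# Hypothetical helpers (not part of bwa) for checking a reference's size
# against the 2GB limit that the bwa docs quote for the "is" algorithm.

# Print the number of sequence characters in a FASTA file, gzipped or not.
# gzip -cdf decompresses .gz input and passes plain text through unchanged.
fasta_bases() {
    gzip -cdf -- "$1" |
        awk '!/^>/ { gsub(/[ \t\r]/, ""); n += length($0) }
             END   { printf "%.0f\n", n }'
}

# Echo "bwtsw" if the sequence exceeds 2 GiB (assumed reading of "2GB"), else "is".
suggest_bwa_algo() {
    local bases
    bases=$(fasta_bases "$1") || return 1
    if [ "$bases" -gt 2147483648 ]; then
        echo bwtsw
    else
        echo is
    fi
}
```

Usage: `suggest_bwa_algo hg38.fa.gz`. It streams the whole file, so expect it to take a little while on a full genome.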

Thanks for the help

Tags: rna-seq, bwa



Without -a, I might be using "is" instead of "bwtsw". I don't feel this is a solution.

(reply by juliobinf14, 15 months ago)


I switched the genome file from hg38.fa.align.gz to hg38.fa.gz, and it worked.

(reply by juliobinf14, 15 months ago)
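A cheap guard against this kind of mix-up is to check that the file actually looks like FASTA before indexing it. As far as I know, UCSC's hg38.fa.align.gz is RepeatMasker alignment output rather than the genome sequence, which would fit the near-instant BWT construction (0.00 s) in the logs above. The function below (`looks_like_fasta`) is a hypothetical helper that only inspects the first non-blank line, so it catches a wrong download, not every kind of corruption.

```bash
#!/bin/sh
# Hypothetical helper: succeed only if the first non-blank line of a
# (optionally gzipped) file starts with '>', as a FASTA header should.
looks_like_fasta() {
    local first
    # gzip -cdf decompresses .gz input and passes plain text through unchanged.
    first=$(gzip -cdf -- "$1" | awk 'NF { print; exit }')
    case $first in
        '>'*) return 0 ;;
        *)    return 1 ;;
    esac
}

# Example usage (bwa index also accepts gzip-compressed FASTA):
#   looks_like_fasta hg38.fa.gz && bwa index -p ref_hum hg38.fa.gz
```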


BWA is not meant for RNA-seq. Don't ignore that.

(reply by ATpoint, 15 months ago)

score 2 · Answer 1 · 2024-01-01

For RNA-seq analysis, it's recommended to use HISAT or STAR.

BWA: designed primarily for, and widely used in, genomic DNA alignment.

HISAT: designed specifically for spliced alignment, which makes it better suited to RNA-seq data, where reads can span exon-exon junctions and introns have to be skipped during alignment.

Note: not generated by ChatGPT; this is my personal opinion.
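A minimal sketch of the HISAT route, using HISAT2 (the successor of the original HISAT). It assumes `hisat2`, `hisat2-build` and `samtools` are installed and on PATH; the function names, index basename, file names and the thread count of 8 are hypothetical placeholders.

```bash
#!/bin/sh
# Sketch of a splice-aware HISAT2 workflow. Assumes hisat2, hisat2-build and
# samtools are on PATH; all file names below are hypothetical.

# Build a HISAT2 index: $1 = genome FASTA, $2 = index basename.
hisat2_index() {
    hisat2-build -p 8 "$1" "$2"
}

# Align paired-end reads and write a coordinate-sorted BAM:
# $1 = index basename, $2 = R1 FASTQ, $3 = R2 FASTQ, $4 = output BAM.
hisat2_align() {
    # With no -S option hisat2 writes SAM to stdout, which samtools sorts.
    hisat2 -p 8 -x "$1" -1 "$2" -2 "$3" | samtools sort -o "$4" -
}

# Example:
#   hisat2_index hg38.fa hg38_hisat2
#   hisat2_align hg38_hisat2 sample_R1.fastq.gz sample_R2.fastq.gz sample.bam
```

The index only needs to be built once per reference; `hisat2_align` is then run once per sample.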

score 0 · Answer 2 · 2024-01-01


Answered 15 months ago by ATpoint

It's fine to leave the algorithm unset; the default works. hg38 is arguably the most-indexed genome in the world, and almost everyone uses the default.

If this were ordinary DNA-seq, BWA would be fine, but RNA-seq needs a splice-aware aligner, which BWA is not.

Use STAR for the alignment.
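A minimal STAR sketch: build a genome index once, then align each sample. It assumes STAR is installed; the function names, hg38.fa, genes.gtf, the index directory and the FASTQ names are hypothetical placeholders, 8 threads is arbitrary, and `--sjdbOverhang 100` is a generic value (the STAR manual suggests read length minus 1). Be aware that a human STAR index needs a lot of RAM, on the order of 32 GB according to the STAR manual.

```bash
#!/bin/sh
# Sketch of a STAR workflow. Assumes STAR is on PATH; names are hypothetical.

# Build a genome index: $1 = genome FASTA, $2 = GTF annotation, $3 = index dir.
star_index() {
    mkdir -p "$3"  # create it up front in case the STAR version in use does not
    STAR --runMode genomeGenerate --runThreadN 8 \
         --genomeDir "$3" --genomeFastaFiles "$1" \
         --sjdbGTFfile "$2" --sjdbOverhang 100  # ideally read length - 1
}

# Align gzipped paired-end reads into a coordinate-sorted BAM:
# $1 = index dir, $2 = R1 FASTQ.gz, $3 = R2 FASTQ.gz, $4 = output file prefix.
star_align() {
    STAR --runThreadN 8 --genomeDir "$1" \
         --readFilesIn "$2" "$3" --readFilesCommand zcat \
         --outSAMtype BAM SortedByCoordinate \
         --outFileNamePrefix "$4"
}

# Example:
#   star_index hg38.fa genes.gtf star_hg38
#   star_align star_hg38 sample_R1.fastq.gz sample_R2.fastq.gz sample_
```

Passing the GTF at index time lets STAR use annotated splice junctions during alignment.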
