Reference genome, BWA and right algorithm
11 months ago

Hello

I'm using BWA to build an index for aligning some RNA-seq FASTQ files.

The first thing I did was download hg38.fa.align.gz from UCSC.

Then I ran:

gzip -d hg38.fa.align.gz

sudo apt-get install bwa

Here is the problem. The BWA instructions recommend the bwtsw algorithm, but when I use it:

bwa index -p ref_hum -a bwtsw hg38.fa.align
          [bwa_index] Pack FASTA... 7.36 sec
          [bwa_index] Construct BWT for the packed sequence... 
          Floating point exception (core dumped)

When I don't specify the algorithm:

bwa index -p ref_hum hg38.fa.align
          [bwa_index] Pack FASTA... 7.42 sec
          [bwa_index] Construct BWT for the packed sequence...
          [bwa_index] 0.00 seconds elapse.
          [bwa_index] Update BWT... 0.00 sec
          [bwa_index] Pack forward-only FASTA... 7.33 sec
          [bwa_index] Construct SA from BWT and Occ... 0.00 sec
          [main] Version: 0.7.17-r1188
          [main] CMD: bwa index -p ref_hum hg38.fa.align
          [main] Real time: 14.780 sec; CPU: 14.747 sec

I'm worried I might be losing information, since the bwa instructions say:

bwa index [-p prefix] [-a algoType] <in.db.fasta> 

-a STR  Algorithm for constructing BWT index. Available options are:
is  IS linear-time algorithm for constructing suffix array. It requires 5.37N memory where N is the size of the database. IS is moderately fast, but does not work with database larger than 2GB. IS is the default algorithm due to its simplicity. The current codes for IS algorithm are reimplemented by Yuta Mori.
bwtsw   Algorithm implemented in BWT-SW. This method works with the whole human genome

Thanks for the help

rna-seq bwa • 1.1k views

I could use "is" instead of "bwtsw", but I don't feel that is a solution.


I changed the genome file from hg38.fa.align.gz to hg38.fa.gz and it worked.
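A quick sanity check before indexing can catch this kind of mix-up. As far as I know, hg38.fa.align.gz on the UCSC download page is RepeatMasker alignment output rather than the genome sequence, which would explain both the crash and the 0.00 sec BWT construction. The sketch below is only an illustration: `fasta_summary` is a hypothetical helper name, and it simply counts `>` header lines and the sequence characters that follow them.

```shell
# fasta_summary FILE
# Hypothetical helper: report how many FASTA records and sequence
# characters an uncompressed file contains.
fasta_summary() {
    awk '
        /^>/        { records++; next }    # header line starts a new record
        records > 0 { bases += length($0) } # count only lines after the first header
        END {
            # %.0f avoids integer overflow in some awk builds for genome-sized counts
            printf "records=%d bases=%.0f\n", records, bases
        }
    ' "$1"
}
```

Running `fasta_summary hg38.fa` before `bwa index` should report a non-zero number of records and bases. A report of `records=0 bases=0` suggests the file is not FASTA at all.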


BWA is not for RNA-seq. Don't ignore that.

11 months ago
DareDevil ★ 4.3k

For RNA-seq analysis it's recommended to use HISAT or STAR.

BWA: primarily designed for, and widely used for, genomic DNA alignment.

HISAT: specifically designed for spliced alignment, so it is more suitable for RNA-seq data, where reads can span exon-exon junctions and introns have to be taken into account.
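For reference, a minimal sketch of what indexing and aligning with a spliced aligner could look like. It assumes HISAT2 (the successor to HISAT) is installed and that the reads are paired-end. The `run` dry-run wrapper, the function names, and all file names are hypothetical and only there for illustration.

```shell
# Print the command when DRY_RUN is set, otherwise execute it.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# hisat2_build_index GENOME_FASTA INDEX_PREFIX
hisat2_build_index() {
    run hisat2-build "$1" "$2"
}

# hisat2_align_pair INDEX_PREFIX READS_1 READS_2 OUT_SAM
hisat2_align_pair() {
    run hisat2 -x "$1" -1 "$2" -2 "$3" -S "$4"
}
```

Setting `DRY_RUN=1` and calling `hisat2_build_index hg38.fa hg38_hisat2` prints the command instead of running it, which is handy for checking paths first.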

Note: not generated by ChatGPT. This is my personal opinion.

11 months ago
ATpoint 85k

It's fine not to set an algorithm; the default works. hg38 is arguably the most indexed genome in the world, and almost everyone uses the default.

If this were normal DNA-seq, bwa would be fine, but RNA-seq needs a splice-aware aligner, which bwa is not.

Use STAR for the alignment.
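A rough sketch of the two STAR steps (genome generation, then alignment), assuming paired-end gzipped FASTQ and a GTF annotation that matches the genome build. The `run` dry-run wrapper, the function names, the thread count of 8, and all file names are assumptions for illustration; `--sjdbOverhang 100` is STAR's documented default and is usually set to read length minus 1.

```shell
# Print the command when DRY_RUN is set, otherwise execute it.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# star_build_index GENOME_FASTA ANNOTATION_GTF INDEX_DIR
star_build_index() {
    run STAR --runMode genomeGenerate \
        --genomeDir "$3" \
        --genomeFastaFiles "$1" \
        --sjdbGTFfile "$2" \
        --sjdbOverhang 100 \
        --runThreadN 8
}

# star_align_pair INDEX_DIR READS_1_GZ READS_2_GZ OUT_PREFIX
star_align_pair() {
    run STAR --genomeDir "$1" \
        --readFilesIn "$2" "$3" \
        --readFilesCommand zcat \
        --outSAMtype BAM SortedByCoordinate \
        --outFileNamePrefix "$4" \
        --runThreadN 8
}
```

With `DRY_RUN=1` set, both functions only print the command line, so the paths can be checked before committing to a long index build. Note that the genome FASTA passed to STAR has to be uncompressed.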
