Align Illumina short reads to a subset of the human reference genome
2.3 years ago
francisco • 0

Hi everyone.

I have paired-end sequencing data (Illumina), and there are specific regions of the genome I am interested in. I aligned the samples to the hg38 fasta using bwa, and it took 19 h to align and generate the SAM file. I wanted to speed up this process so I filtered the hg38 fasta file to only contain positions 5000 base pairs to the left or right of my regions of interest. Aligning to the new hg38_small.fasta took almost 30 h.

Does anyone have any tips or know of a better approach for doing this?

Thanks, Francisco

align human fasta • 1.9k views

I wanted to speed up this process so I filtered the hg38 fasta file to only contain positions 5000 base pairs to the left or right of my regions of interest.

Please don't. See the related thread: "Exome Sequencing: Masking The Non-Genic Sequences ?"
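One way to keep the focus on the regions of interest without masking the reference is to align against the full hg38 and restrict the output afterwards. The sketch below pads a BED file of regions by 5,000 bp on each side, mirroring the padding described in the question. The function name `pad_bed` and the file names in the usage comment are hypothetical, and the input is assumed to be a tab-separated BED file without header lines. This does not shorten the alignment itself; it only avoids the problems of a reduced reference.

```shell
# Pad every interval in a tab-separated BED file by `pad` bp on both sides.
# Starts are clamped at 0; ends are NOT clamped to the chromosome length
# (an assumption of this sketch; `bedtools slop` with a genome file does that).
pad_bed() {
    local bed="$1"
    local pad="${2:-5000}"
    awk -v pad="$pad" 'BEGIN { FS = OFS = "\t" }
        {
            start = $2 - pad
            if (start < 0) start = 0
            $2 = start
            $3 = $3 + pad
            print
        }' "$bed"
}

# Hypothetical usage, after aligning to the full reference and sorting:
#   pad_bed regions.bed 5000 > regions.padded.bed
#   samtools view -b -L regions.padded.bed sample.sorted.bam > sample.on_target.bam
```

`samtools view -L` keeps only alignments that overlap the intervals in the BED file, so the downstream files stay small while reads from the rest of the genome are still placed where they belong.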

2.3 years ago
Thomas ▴ 160

Generally, aligning to a heavily masked genome like this is not a good idea, as you risk generating spurious alignments. Reads that really originate elsewhere in the genome have nowhere else to go, so the aligner may force them onto the regions you kept.

How long are your reads? What exact bwa commands are you using? Depending on your read length, it may be a better idea to use bwa-mem.

Using the wrong bwa algorithm for your read length can lead to long run times.
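To answer the read-length question quickly, something like the following sketch can tabulate read lengths straight from a FASTQ file. The function name `read_length_hist` is hypothetical, and the input is assumed to use standard 4-line FASTQ records (no wrapped sequence lines). As far as I know, the bwa documentation recommends BWA-MEM for reads of roughly 70 bp and longer.

```shell
# Print a "length<TAB>count" table of read lengths for one FASTQ file.
# Works on plain or gzip-compressed input; assumes 4-line FASTQ records.
read_length_hist() {
    local fastq="$1"
    # gzip -cdf decompresses gzip input and passes plain text through unchanged.
    gzip -cdf "$fastq" \
        | awk 'NR % 4 == 2 { n[length($0)]++ }
               END { for (len in n) print len "\t" n[len] }' \
        | sort -n
}

# Hypothetical usage on the first 400,000 lines (100,000 reads) only:
#   gzip -cdf sample_1.fastq.gz | head -n 400000 > subset.fastq
#   read_length_hist subset.fastq
```

Sampling the first reads is usually enough to see the overall length range, and it avoids scanning a very large file end to end.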


Hi Thomas, thanks for the quick response.

The reads are between 70 and 150 bp long.

I was using the following command:

$bwa mem -t $threads $ref_fasta ${filename}_1.fastq.gz ${filename}_2.fastq.gz > ${filename}.sam


Hmm, I am not sure exactly what the issue is here, as a standard alignment like this shouldn't take 19 hours.

What is the size of your fastq files?


The size of the fastq is 70Gb


Split your fastq files into smaller parts, run bwa + sort on each part in parallel, then merge the sorted BAM files.
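The split / align / merge idea could look roughly like the sketch below. It assumes `bwa` and `samtools` are on the PATH, that the reference has already been indexed with `bwa index`, and that both mate files list reads in the same order (so equal-sized chunks stay in sync). The function names, the `THREADS` and `JOBS` variables, the chunk size, and the file naming are all hypothetical choices for illustration.

```shell
# Split a plain or gzipped FASTQ into chunks of `reads_per_chunk` reads.
# Chunks are written uncompressed as <prefix>aaaa, <prefix>aaab, ...
split_fastq() {
    local fastq="$1"
    local prefix="$2"
    local reads_per_chunk="$3"
    # 4 lines per read keeps every record whole within a chunk.
    gzip -cdf "$fastq" | split -a 4 -l "$((reads_per_chunk * 4))" - "$prefix"
}

# Align one pair of chunks and write a coordinate-sorted BAM,
# piping straight into samtools so no intermediate SAM hits the disk.
align_chunk() {
    local ref="$1"
    local r1="$2"
    local r2="$3"
    local out_bam="$4"
    bwa mem -t "${THREADS:-4}" "$ref" "$r1" "$r2" \
        | samtools sort -o "$out_bam" -
}

# Split both mate files, align chunks in batches of JOBS, then merge.
run_split_align_merge() {
    local ref="$1"
    local sample="$2"
    local workdir="$3"
    local jobs="${JOBS:-4}"
    local n=0
    local chunk suffix
    mkdir -p "$workdir"
    split_fastq "${sample}_1.fastq.gz" "$workdir/r1_" 5000000
    split_fastq "${sample}_2.fastq.gz" "$workdir/r2_" 5000000
    for chunk in "$workdir"/r1_*; do
        suffix="${chunk##*/r1_}"
        align_chunk "$ref" "$chunk" "$workdir/r2_$suffix" "$workdir/$suffix.bam" &
        n=$((n + 1))
        # Simple batching: wait once `jobs` alignments are running.
        if [ $((n % jobs)) -eq 0 ]; then wait; fi
    done
    wait
    samtools merge "${sample}.bam" "$workdir"/*.bam
}

# Hypothetical usage:
#   THREADS=8 JOBS=4 run_split_align_merge hg38.fa sample1 chunks_sample1
```

On a cluster, each `align_chunk` call would more naturally be submitted as its own job rather than run in the background on one machine. Keep `THREADS` times `JOBS` within the number of available cores, otherwise the chunks just compete with each other.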


Hi Pierre, thanks for the suggestion. Will have a go at it! Thanks, Francisco
