Question

Low Alignment rate

0

14 months ago

mavy ▴ 10

Hello all,

I am working on Homo sapiens ChIP-seq data. I built a Bowtie2 index from the top-level assembly on the Ensembl website: https://ftp.ensembl.org/pub/release-110/fasta/homo_sapiens/dna/Homo_sapiens.GRCh38.dna_rm.toplevel.fa.gz

When I align with Bowtie2, the overall alignment rate is very low (around 35% to 39%) for all samples. Why is the alignment rate so low? What am I doing wrong, and what can I do to get better results? I am new to bioinformatics, so please bear with me if this is a very basic question.

This is the command I used for indexing:

bowtie2-build input.fasta genome_index

And for alignment:

bowtie2 -p 12 -q -x genome_index -U my_inputfile -S my_output
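Since the rate is being compared across several samples, it can help to save each run's summary and pull the number out of it. Bowtie2 prints its alignment summary to stderr, so it can be redirected to a log file. The sketch below is only an illustration: `overall_rate` and the log file name `my_output.log` are hypothetical names, and it assumes the log contains Bowtie2's usual "overall alignment rate" line.

```shell
# Hypothetical helper: print the overall alignment rate (without the % sign)
# from a Bowtie2 summary log saved with `2> my_output.log`.
overall_rate() {
  awk '/overall alignment rate/ {
    sub(/%$/, "", $1)   # strip the trailing percent sign from the first field
    print $1
  }' "$1"
}
```

For example, run `bowtie2 -p 12 -q -x genome_index -U my_inputfile -S my_output 2> my_output.log` and then `overall_rate my_output.log`.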

Any help would be appreciated.

Thanks and regards,
Mehvi

alignment low_alignment_rate Bowtie2 • 1.3k views

ADD COMMENT • link updated 14 months ago by dsull ★ 6.9k • written 14 months ago by mavy ▴ 10

2

Did you run FastQC to check the quality of your sequencing samples?

Otherwise, try taking some of the reads that failed to align and running them through NCBI BLAST to see where those reads might be coming from.
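One way to collect the reads that failed to align is to filter the SAM output on the FLAG column, where bit 0x4 marks an unmapped read. This is a minimal sketch using plain awk rather than a dedicated tool. The function name `sam_unaligned_to_fasta` is made up for illustration, and it assumes an uncompressed, tab-separated SAM file such as `my_output` from the question.

```shell
# Hypothetical helper: write unaligned reads from a SAM file as FASTA,
# ready to paste into NCBI BLAST.
sam_unaligned_to_fasta() {
  awk -F'\t' '
    /^@/ { next }              # skip SAM header lines
    int($2 / 4) % 2 == 1 {     # FLAG bit 0x4 set: the read did not align
      print ">" $1             # read name as the FASTA header
      print $10                # read sequence
    }
  ' "$1"
}
```

For example, `sam_unaligned_to_fasta my_output | head -n 20 > unaligned_sample.fa` keeps the first ten unaligned reads.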

ADD REPLY • link 14 months ago by dsull ★ 6.9k

2

Yep, you can subsample your reads with seqtk like this:

seqtk sample -s100 read2.fq 100 > Sample.fq

Take a couple and run them through BLAST as dsull suggested. Once you have an idea of where they are coming from, you can use FastQ Screen to quantify what proportion of the reads comes from each genome.

ADD REPLY • link 14 months ago by biofalconch ★ 1.3k

0

Thanks for your reply, I will try this. Yes, I ran FastQC and the results were fine. Let me know if there is anything specific in the FastQC results that I should check again.

ADD REPLY • link 14 months ago by mavy ▴ 10

score 3 · Answer 1 · 2023-09-25

3

14 months ago

GenoMax 147k

Looks like you used the repeat-masked FASTA file (note the dna_rm in the file name) instead of the normal FASTA file. Please use: https://ftp.ensembl.org/pub/release-110/fasta/homo_sapiens/dna/Homo_sapiens.GRCh38.dna.toplevel.fa.gz
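As far as Ensembl's FASTA README describes it, the dna_rm files are hard-masked: repeats and low-complexity regions are replaced with N, so reads from those regions have nothing to align to. A rough way to see how much of a FASTA file is masked is to count the fraction of N bases. The sketch below reads FASTA from stdin; `fasta_n_fraction` is a hypothetical helper name, not part of any tool mentioned here.

```shell
# Hypothetical helper: print the fraction of N (or n) bases in a FASTA stream.
fasta_n_fraction() {
  awk '
    /^>/ { next }                    # skip sequence header lines
    {
      total += length($0)            # count all bases on this line
      masked += gsub(/[Nn]/, "")     # count the N bases on this line
    }
    END {
      if (total > 0) printf "%.4f\n", masked / total
      else print "NA"                # no sequence lines were seen
    }
  '
}
```

For example, `gzip -cd Homo_sapiens.GRCh38.dna_rm.toplevel.fa.gz | fasta_n_fraction`. The unmasked assembly also contains some N bases in assembly gaps, so the figure is most informative when the two files are compared.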

ADD COMMENT • link 14 months ago by GenoMax 147k

1

Thanks a lot! Using https://ftp.ensembl.org/pub/release-110/fasta/homo_sapiens/dna/Homo_sapiens.GRCh38.dna.toplevel.fa.gz worked for me. The alignment rate is now almost 99%.

ADD REPLY • link 14 months ago by mavy ▴ 10

0

Wow! Didn't realize the repeat-masked file could have such a large influence on alignment. Very good to know -- I guess I'll need to dive into that FASTA file myself to see why. Thanks GenoMax! :)

ADD REPLY • link 14 months ago by dsull ★ 6.9k