Low Alignment rate
14 months ago
mavy ▴ 10

Hello All

I am working on Homo sapiens ChIP-seq data. I built a bowtie2 index from the top-level assembly on the Ensembl website: https://ftp.ensembl.org/pub/release-110/fasta/homo_sapiens/dna/Homo_sapiens.GRCh38.dna_rm.toplevel.fa.gz

Now, when I align with bowtie2, I get a very low overall alignment rate, such as 35% or 39%, for all the samples. Why is the alignment rate so low? What am I doing wrong, or what is causing this, and what can I do to get better results? I am new to bioinformatics, so please bear with me if this is a very basic question; any advice or help is appreciated.

This is the command I used for indexing:

bowtie2-build input.fasta genome_index

and for alignment:

bowtie2 -p 12 -q -x genome_index -U my_inputfile -S my_output

Any help would be appreciated.

Thanks and regards, Mehvi

alignment low_alignment_rate Bowtie2 • 1.3k views

Did you run FASTQC to check the quality of your sequencing samples?

Otherwise, try taking some of the reads that failed to align and running them through NCBI BLAST to see where those reads might be coming from.
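
One way to pull out the reads that failed to align is to filter the SAM file that bowtie2 wrote with -S. This is only a minimal sketch, assuming plain-text SAM input: `unmapped_to_fasta` is a hypothetical helper name, and the small SAM file in the demo is made up. It keeps records whose FLAG has the 0x4 (unmapped) bit set and prints them as FASTA, which is the format BLAST accepts.

```shell
# Hypothetical helper: print unaligned reads from a SAM file as FASTA.
# A read is unaligned when bit 0x4 of the FLAG column (column 2) is set.
unmapped_to_fasta() {
  awk -F '\t' '
    /^@/ { next }                      # skip SAM header lines
    int($2 / 4) % 2 == 1 {             # FLAG bit 0x4 set: read is unmapped
      print ">" $1                     # column 1: read name
      print $10                        # column 10: read sequence
    }' "$1"
}

# Made-up demo input: one aligned read (flag 0) and one unaligned read (flag 4).
demo_sam=$(mktemp)
printf '@HD\tVN:1.6\n' > "$demo_sam"
printf 'r1\t0\t1\t100\t42\t4M\t*\t0\t0\tACGT\tIIII\n' >> "$demo_sam"
printf 'r2\t4\t*\t0\t0\t*\t*\t0\t0\tTTGA\tIIII\n' >> "$demo_sam"

unmapped_to_fasta "$demo_sam"   # prints ">r2" then "TTGA"
rm -f "$demo_sam"
```

On real data, something like `unmapped_to_fasta my_output | head -n 20` would give the first ten unaligned reads. bowtie2 also has a `--un` option that writes unaligned unpaired reads to a file during alignment, which may be simpler if you are rerunning anyway.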


Yep, you can sample your reads with seqtk like this:

seqtk sample -s100 read2.fq 100 > Sample.fq

Take a couple and run them through BLAST as dsull suggested. Once you have an idea of where they are coming from, you can use FastQ Screen to quantify what proportion of the reads comes from each genome.


Thanks for your reply, I will try this. Yes, I ran FastQC and the results were fine. Let me know if there is anything in the FastQC results I should check again.

14 months ago
GenoMax 147k

Looks like you used the repeat-masked fasta file instead of using the normal fasta file. Please use: https://ftp.ensembl.org/pub/release-110/fasta/homo_sapiens/dna/Homo_sapiens.GRCh38.dna.toplevel.fa.gz
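
A quick way to see the difference between the two files is to measure how much of the sequence is N. In the Ensembl `dna_rm` files, the repeats found by RepeatMasker are replaced with Ns, so reads that come from those regions have nothing to align to. The sketch below is only an illustration: `n_fraction` is a hypothetical helper name, and the tiny FASTA fed to it is made up.

```shell
# Hypothetical helper: read FASTA on stdin and print the fraction of
# sequence characters that are N (hard-masked or unknown bases).
n_fraction() {
  awk '
    /^>/ { next }                      # skip FASTA header lines
    {
      total += length($0)              # count all bases on this line
      n += gsub(/[Nn]/, "")            # gsub returns the number of N characters it replaced
    }
    END {
      if (total > 0) printf "%.4f\n", n / total
      else print "0.0000"
    }'
}

# Made-up demo: 20 bases, 10 of them N.
printf '>chrDemo\nACGTNNNNNN\nNNNNacgtAC\n' | n_fraction   # prints 0.5000
```

On the real files, something like `gunzip -c Homo_sapiens.GRCh38.dna_rm.toplevel.fa.gz | n_fraction` should report a much higher N fraction than the same command on the unmasked `dna` file.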


Thanks a lot! This worked for me: https://ftp.ensembl.org/pub/release-110/fasta/homo_sapiens/dna/Homo_sapiens.GRCh38.dna.toplevel.fa.gz. Now the alignment rate is almost 99%.


Wow! Didn't realize the repeat-masked file could have such a large influence on alignment. Very good to know -- I guess I'll need to dive into that FASTA file myself to see why. Thanks GenoMax! :)
