Barley, alignment rates
24 days ago
IrK ▴ 100

Hi everyone,

I'm working with single-end reads (150 bp) from barley (Hordeum vulgare) and aligning them against the Morex V3 reference genome. I have a total of 600,490,430 reads, and here are my alignment statistics:

16.00% of reads did not align at all.
17.79% of reads aligned exactly once.
66.21% of reads aligned more than once.
Overall alignment rate: 84.00%.
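
For scale, the percentages above can be turned into approximate read counts. This is a minimal sketch of that arithmetic, assuming the reported percentages are rounded to two decimals (so the counts are estimates); the variable names are illustrative only.

```python
# Convert the reported alignment percentages into approximate read counts.
# The percentages are rounded to two decimals, so the counts are estimates.
TOTAL_READS = 600_490_430

percentages = {
    "unaligned": 16.00,
    "aligned exactly once": 17.79,
    "aligned more than once": 66.21,
}

counts = {
    label: round(TOTAL_READS * pct / 100)
    for label, pct in percentages.items()
}

for label, n in counts.items():
    print(f"{label:>24}: ~{n:,} reads")

# The overall alignment rate is everything that aligned at least once.
overall = percentages["aligned exactly once"] + percentages["aligned more than once"]
print(f"overall alignment rate: {overall:.2f}%")
```

The two aligned categories sum to the reported overall rate, so the summary is internally consistent.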

Is this alignment rate considered normal for barley?

My interpretation is that the 66% of reads aligning more than once is kind of expected, given that barley has a complex genome with many repetitive sequences.

Any insights or suggestions would be greatly appreciated!

Thank you

barley alignment-rate • 319 views
24 days ago
dthorbur ★ 2.8k

Your overall alignment rate isn't bad. The multi-mapped rate is a little lower than I would expect for barley, but a lot of the detail needed to make a better judgement is missing. That said, I've not used single-end reads with a plant genome before, so I can only offer suggestions based on my experience mapping paired-end reads.

Some considerations:

  1. What was the overall quality of the reads after trimming and cleaning?
  2. Single-end reads perform more poorly than paired-end reads when mapping to complex genomes. A single-end read is more ambiguous when more than one mapping location is possible, because there is no mate read to help resolve the placement.
  3. If there is a big difference between the cultivars you sampled and the reference genome, you will see a higher error rate, and the reference may be missing entire regions or SVs present in your sampled cultivar's genome.
  4. The unmapped rate is a little high, but nothing to worry about IMO. It might still be worth investigating. Options include mapping the reads to another, more closely related reference and comparing; taking a handful of the unmapped reads and seeing what they map to with BLASTN; and checking the quality of these reads specifically.
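
The "take a handful of the unmapped reads" suggestion in point 4 could be sketched roughly as follows. This is only an illustration using the Python standard library: it assumes the alignments are available as SAM text (a BAM file would need converting first), and the function names `unmapped_reads` and `to_fasta`, the sample size, and the demo records are all hypothetical. It relies on the SAM flag bit 0x4, which marks a read as unmapped.

```python
import itertools
from typing import Iterable, Iterator, Tuple

UNMAPPED_FLAG = 0x4  # SAM flag bit: read is unmapped


def unmapped_reads(sam_lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (read_name, sequence) for every unmapped record in SAM text."""
    for line in sam_lines:
        if line.startswith("@"):  # header line, not an alignment record
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:  # skip malformed or empty lines
            continue
        name, flag, seq = fields[0], int(fields[1]), fields[9]
        if flag & UNMAPPED_FLAG:
            yield name, seq


def to_fasta(records: Iterable[Tuple[str, str]], limit: int = 20) -> str:
    """Format the first `limit` records as FASTA text for a BLASTN query."""
    sample = itertools.islice(records, limit)
    return "".join(f">{name}\n{seq}\n" for name, seq in sample)


# Hypothetical demo records: one mapped read, one unmapped read.
demo_sam = [
    "@HD\tVN:1.6\n",
    "read1\t0\tchr1H\t100\t42\t8M\t*\t0\t0\tACGTACGT\tIIIIIIII\n",
    "read2\t4\t*\t0\t0\t*\t*\t0\t0\tTTGACCAA\tIIIIIIII\n",
]
print(to_fasta(unmapped_reads(demo_sam)), end="")
```

The resulting FASTA text can then be used as a BLASTN query to see whether the unmapped reads look like contamination, organellar sequence, or barley sequence absent from the reference.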

Thank you for your response and suggestions.

"1. What was the overall quality of the reads after trimming and cleaning?"

  • According to the QC reports, there were no adapter sequences detected, so no trimming was necessary.
  • The "Per base sequence quality" is excellent with no red flags.
  • The GC content is 48%, which is acceptable for barley.
  • There were no sequences flagged as poor quality.

"2. Single-end reads perform more poorly ..." - I agree that single-end reads generally give a poorer alignment rate than paired-end reads. I am also preparing data for pseudo-alignment and would like to see whether it makes a difference.

"3. If there is a big difference between cultivars you sampled...." - This also makes sense.

"4. The unmapped rate is a little high" - That is a great idea. I'll retrieve the unmapped reads and see where they map.

Thank you so much.


Just a follow-up on this: given that the barley genome contains many repetitive sequences and 66.21% of reads aligned more than once, do these multi-mappers need to be handled in any specific way? Is there a standard procedure for removing them, or should they just be kept as is? I'm wondering whether there are any additional analysis steps for data with many repetitive elements and a high percentage of multi-mapped reads. Thanks.
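
One way to look at the question before deciding anything is to tally reads by mapping quality (MAPQ). The sketch below is not a standard procedure, only an illustration under stated assumptions: the thread does not name the aligner, the meaning of MAPQ values is aligner-specific, and the threshold of 10, the function name `tally_by_mapq`, and the demo records are all hypothetical. Whether low-MAPQ reads should be dropped or kept depends on the downstream analysis.

```python
from collections import Counter
from typing import Iterable

UNMAPPED = 0x4         # SAM flag bit: read is unmapped
SECONDARY = 0x100      # SAM flag bit: secondary alignment
SUPPLEMENTARY = 0x800  # SAM flag bit: supplementary alignment


def tally_by_mapq(sam_lines: Iterable[str], min_mapq: int = 10) -> Counter:
    """Count reads as unmapped, low-MAPQ, or confident, once per read.

    Only primary records are counted, so a multi-mapped read that is
    reported at several locations is not counted more than once.
    The min_mapq threshold is an assumption, not a recommendation.
    """
    tally = Counter()
    for line in sam_lines:
        if line.startswith("@"):  # header line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:  # skip malformed or empty lines
            continue
        flag, mapq = int(fields[1]), int(fields[4])
        if flag & (SECONDARY | SUPPLEMENTARY):
            continue
        if flag & UNMAPPED:
            tally["unmapped"] += 1
        elif mapq < min_mapq:
            tally["low_mapq"] += 1
        else:
            tally["confident"] += 1
    return tally


# Hypothetical demo: r2 is reported twice (primary plus secondary).
demo_alignments = [
    "@SQ\tSN:chr1H\tLN:1000\n",
    "r1\t0\tchr1H\t10\t42\t4M\t*\t0\t0\tACGT\tIIII\n",
    "r2\t0\tchr1H\t50\t1\t4M\t*\t0\t0\tACGT\tIIII\n",
    "r2\t256\tchr1H\t90\t1\t4M\t*\t0\t0\tACGT\tIIII\n",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII\n",
]
print(dict(tally_by_mapq(demo_alignments)))
```

Comparing the tallies at a few thresholds shows how much data a MAPQ filter would discard, which matters when around two thirds of the reads are multi-mapped.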
