Why is GC% per base important in quality control of reads?
6.2 years ago
c.clarido ▴ 110

Hello,

In quality control of reads, why do we look at the GC% per base position? I have the following result:

Mean length: 75
Max. length: 101
Min. length: 24
Overall GC: 32%
GC per base position: 
[32, 33, 33, 33, 33, 33, 33, 32, 33, 33, 33, 33, 33, 33, 33, 32, 32, 32, 33, 33, 32, 33, 33, 33, 33, 32, 32, 32, 32, 32, 31, 31, 32, 31, 31, 31, 31, 30, 30, 30, 30, 30, 30, 30, 30, 30, 29, 29, 28, 28, 28, 28, 28, 27, 27, 27, 27, 27, 27, 26, 26, 25, 25, 25, 25, 24, 24, 24, 23, 23, 22, 22, 21, 21, 21, 20, 19, 19, 18, 18, 17, 16, 15, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 1, 0, 0]

Looking at these values, I can see that the GC% decreases towards the end of the reads. What can I conclude from this? Thank you in advance!


It looks like this is just a bias caused by the varying read length. After 24 bp the number of reads covering each position decreases, and the GC% decreases with it.
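To illustrate the effect, here is a minimal Python sketch. It assumes (this is a guess about how the numbers in the question were produced, not something stated in the thread) that the GC count at each position was divided by the total number of reads rather than by the number of reads that actually reach that position. The function name `gc_per_position` and the toy reads are made up for the example.

```python
def gc_per_position(reads, normalize_by_coverage=True):
    """Return the rounded GC% at each position for reads of varying length.

    normalize_by_coverage=True  divides by the reads covering that position.
    normalize_by_coverage=False divides by the total number of reads, which
    makes GC% fall off wherever shorter reads have already ended.
    Assumes `reads` is a non-empty list of non-empty strings.
    """
    max_len = max(len(r) for r in reads)
    gc = [0] * max_len   # G/C count per position
    cov = [0] * max_len  # number of reads covering each position
    for read in reads:
        for i, base in enumerate(read.upper()):
            cov[i] += 1
            if base in "GC":
                gc[i] += 1
    total = len(reads)
    return [
        round(100 * gc[i] / (cov[i] if normalize_by_coverage else total))
        for i in range(max_len)
    ]


# Toy reads (hypothetical data) with lengths 4, 6, 2 and 2.
reads = ["GCAT", "GCATGC", "GC", "AT"]
by_coverage = gc_per_position(reads)
by_total = gc_per_position(reads, normalize_by_coverage=False)
print(by_coverage)  # [75, 75, 0, 0, 100, 100]
print(by_total)     # [75, 75, 0, 0, 25, 25]
```

With the total-read denominator, the last two positions drop to 25% even though the only read that reaches them is 100% GC there. If the tool normalizes per position instead, a declining curve would need another explanation.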


I also believe there is a rule that you should mention when a question is about a school assignment. Maybe a moderator can confirm that.

6.2 years ago
gb ★ 2.2k

In your case it could be a poly-A tail or something similar. Did you trim off the primers/adapters and everything after them? Does it still look like this after quality trimming? Are all the reads the same length?

I think GC content is mostly used as a quality check by comparing it with the GC content of a reference. You expect a certain species or chromosome to have a characteristic GC%. If you expect 40% on chromosome x and you observe 75%, something is off.
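As a rough sketch of that comparison in Python: the helper names `gc_percent` and `gc_deviates` are made up for illustration, and the 5-percentage-point tolerance is an arbitrary assumption, not an established threshold.

```python
def gc_percent(seqs):
    """Overall GC% across a list of sequences (any other character counts as non-GC)."""
    gc = sum(s.upper().count("G") + s.upper().count("C") for s in seqs)
    total = sum(len(s) for s in seqs)
    return 100 * gc / total


def gc_deviates(observed, expected, tolerance=5.0):
    """True if observed GC% differs from the expected GC% by more than `tolerance` points."""
    return abs(observed - expected) > tolerance


observed = gc_percent(["GCAT", "GGCC"])  # 6 G/C bases out of 8
print(observed)                          # 75.0
print(gc_deviates(observed, 40.0))       # True: 75% against an expected 40%
```

The expected value has to come from a reference you trust for the species or chromosome in question.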

EDIT:

I just noticed that there is a big difference between the shortest and the longest read, so that plays a role.


Yes, they look like this after quality trimming, so the lengths vary a lot. So can I assume they could be poly-A tails?


You could scan/trim for poly-A and see if the read lengths are reduced even further.
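A minimal sketch of such a scan in plain Python, assuming a poly-A tail is a run of at least 10 A's at the 3' end of the read. That threshold and the function names `trim_poly_a` and `poly_a_fraction` are choices made for this example; a dedicated trimming tool would also handle sequencing errors inside the tail.

```python
import re


def trim_poly_a(read, min_run=10):
    """Remove a trailing run of at least `min_run` A's; otherwise return the read unchanged."""
    match = re.search(r"A{%d,}$" % min_run, read.upper())
    return read[:match.start()] if match else read


def poly_a_fraction(reads, min_run=10):
    """Fraction of reads that get shorter after poly-A trimming."""
    shortened = sum(len(trim_poly_a(r, min_run)) < len(r) for r in reads)
    return shortened / len(reads)


# Hypothetical reads: the first ends in 12 A's, the second has no tail.
reads = ["GATTACG" + "A" * 12, "GCGCGC"]
print(trim_poly_a(reads[0]))   # GATTACG
print(poly_a_fraction(reads))  # 0.5
```

If only a small fraction of reads is shortened, poly-A tails are unlikely to explain the drop in GC% towards the read ends.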


No, you cannot assume they are poly-A tails. That is something you can check in the sequences. But if they are not poly-A tails, it does not mean your data are wrong. You just get this result because of the differences in read length.
