How to remove poly-G in NextSeq data
9.8 years ago
HG ★ 1.2k

I am analyzing a NextSeq run from a bacterial data set. As in a few earlier posts here, I also found a straight run of G's at the end of the reads. Can anyone suggest how to overcome this problem? Any script or tool?

assembly nextseq trimming • 7.2k views
9.8 years ago
Asaf 10k

I use cutadapt with a poly-G as the adapter. You should allow some errors, because the poly-G run sometimes contains an occasional A, C, or T base.
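To illustrate what "allow some errors" means for a 3' poly-G tail, here is a minimal Python sketch. It is not cutadapt's algorithm, just a simple scoring walk from the 3' end: each G adds one point, each non-G subtracts a penalty, and the read is cut where the score peaks. The function name and the `min_tail` and `mismatch_penalty` parameters are hypothetical choices for this example.

```python
def trim_poly_g_tail(seq, qual, min_tail=10, mismatch_penalty=3):
    """Trim a 3' poly-G tail, tolerating an occasional non-G base.

    Illustrative sketch only (not cutadapt's method). Walks from the 3' end,
    scoring +1 per G and -mismatch_penalty per other base, and cuts at the
    position with the highest score if the tail is at least min_tail long.
    """
    best_score = 0
    best_start = len(seq)  # default: nothing trimmed
    score = 0
    for i in range(len(seq) - 1, -1, -1):
        if seq[i] == "G":
            score += 1
        else:
            score -= mismatch_penalty
        if score > best_score:
            best_score = score
            best_start = i
    if len(seq) - best_start < min_tail:
        return seq, qual  # tail too short to be a poly-G artifact
    return seq[:best_start], qual[:best_start]


# A read with a 16-base tail that contains one stray A
read = "ACGT" + "G" * 10 + "A" + "G" * 5
trimmed_seq, trimmed_qual = trim_poly_g_tail(read, "I" * len(read))
print(trimmed_seq)
```

Short G runs at the end of a genuine read stay untouched because they are below `min_tail`; raise or lower that value depending on how aggressive you want to be.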

When I analyze paired-end data, the second mate is sometimes a poly-G, and then I remove it by testing whether the read has more than 80% G's. If that's the case, I disregard the entire read (or use it as single-end, depending on what I do with it later).
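The 80% test itself is only a couple of lines. A rough Python sketch, where the function names and the returned labels are made up for illustration:

```python
def g_fraction(seq):
    """Fraction of bases in the read that are G (0.0 for an empty read)."""
    if not seq:
        return 0.0
    return seq.upper().count("G") / len(seq)


def is_poly_g(seq, threshold=0.8):
    """True if more than `threshold` of the read is G."""
    return g_fraction(seq) > threshold


def classify_pair(r1_seq, r2_seq, threshold=0.8):
    """Decide what to do with a read pair (labels are hypothetical).

    'paired'    -> keep both mates
    'single_r1' -> R2 is poly-G, keep R1 as a single-end read
    'single_r2' -> R1 is poly-G, keep R2 as a single-end read
    'discard'   -> both mates are poly-G
    """
    bad1 = is_poly_g(r1_seq, threshold)
    bad2 = is_poly_g(r2_seq, threshold)
    if bad1 and bad2:
        return "discard"
    if bad2:
        return "single_r1"
    if bad1:
        return "single_r2"
    return "paired"


print(classify_pair("ACGTACGTAC", "GGGGGGGGGA"))
```

Whether you take the 'single' branches or just drop the whole pair depends on the downstream tool.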


Thanks for your reply. Regarding the second part of your comment: could you please suggest how I can do this, perhaps with a script? I have four sets of paired-end reads for each sample.


What I do is run FastQC and then check whether poly-G is one of the over-represented sequences (and to what extent).
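If you want to automate that check, something like the following could scan the text of a `fastqc_data.txt` report for G-rich entries. This is a sketch that assumes the usual tab-separated layout of the "Overrepresented sequences" module (sequence, count, percentage, possible source, ending with `>>END_MODULE`); the function name and the `min_g_fraction` parameter are hypothetical.

```python
def poly_g_overrepresented(fastqc_data_text, min_g_fraction=0.8):
    """Return (sequence, count, percentage) for G-rich over-represented sequences.

    Assumes the module starts with a '>>Overrepresented sequences' line,
    has a '#'-prefixed header row, and ends with '>>END_MODULE'.
    """
    hits = []
    in_module = False
    for line in fastqc_data_text.splitlines():
        if line.startswith(">>Overrepresented sequences"):
            in_module = True
            continue
        if in_module and line.startswith(">>END_MODULE"):
            break
        if not in_module or line.startswith("#") or not line.strip():
            continue
        fields = line.split("\t")
        if len(fields) < 3:
            continue  # not a data row in the assumed layout
        seq, count, percentage = fields[0], int(fields[1]), float(fields[2])
        if seq and seq.count("G") / len(seq) > min_g_fraction:
            hits.append((seq, count, percentage))
    return hits


# Small made-up report fragment, for illustration only
sample = (
    ">>Overrepresented sequences\twarn\n"
    "#Sequence\tCount\tPercentage\tPossible Source\n"
    + "G" * 50 + "\t1200\t1.5\tNo Hit\n"
    "ACGTACGTAC\t300\t0.4\tNo Hit\n"
    ">>END_MODULE\n"
)
for seq, count, pct in poly_g_overrepresented(sample):
    print(seq[:10] + "...", count, pct)
```

The percentage column gives the "to what extent" part: it is the share of reads FastQC attributes to that sequence.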

Then, actually, cutadapt with poly-G as the adapter will remove the read, but you should give it both mates as input (I think it will remove both of them, but I'm not sure).
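Independently of cutadapt, the idea of dropping both mates together so that R1 and R2 stay in sync can be sketched in plain Python. This assumes uncompressed, 4-line-per-record FASTQ with the mates in the same order in both files; `read_fastq`, `mostly_g`, and `filter_pairs` are names invented for this example.

```python
from itertools import islice


def read_fastq(handle):
    """Yield (header, seq, plus, qual) tuples from a 4-line-per-record FASTQ."""
    while True:
        record = [line.rstrip("\n") for line in islice(handle, 4)]
        if len(record) < 4:
            return  # end of file (or a truncated final record)
        yield tuple(record)


def mostly_g(seq, threshold=0.8):
    """True if more than `threshold` of the read is G."""
    return bool(seq) and seq.upper().count("G") / len(seq) > threshold


def filter_pairs(r1_in, r2_in, r1_out, r2_out, threshold=0.8):
    """Write only pairs where neither mate is poly-G; return (kept, dropped)."""
    kept = dropped = 0
    for rec1, rec2 in zip(read_fastq(r1_in), read_fastq(r2_in)):
        if mostly_g(rec1[1], threshold) or mostly_g(rec2[1], threshold):
            dropped += 1  # drop both mates so the files stay paired
            continue
        r1_out.write("\n".join(rec1) + "\n")
        r2_out.write("\n".join(rec2) + "\n")
        kept += 1
    return kept, dropped
```

For gzipped input you could pass handles from `gzip.open(path, "rt")` instead of `open(path)`. With several file pairs per sample, you would simply call `filter_pairs` once per R1/R2 pair.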


I checked with FastQC as you suggested. In my data set there is no over-represented sequence, and the mean quality score is 35. So I hope I can run the assembly directly, without any preprocessing of the data set. What do you think? I used SPAdes for assembly, which also has some error-correction steps in ion-hammer.


Sounds good, I can only dream of getting such numbers. Did you run both files (R1 and R2)?


Yes, I did. I also assembled my data set and got good output in terms of N50 value and number of contigs.
