9.8 years ago
HG
I am analyzing a NextSeq run from a bacterial data set. As in a few earlier posts here, I also found a straight run of G's at the end of the reads. Can anyone suggest how to overcome this problem? Any script or tool?
Thanks for your reply. For the second part of your comment: could you please suggest how I can do this, any script? I ask because I have 4 paired-end reads for each sample.
What I do is run FastQC and then check whether poly-G is one of the over-represented sequences (and to what extent).
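If you want a number alongside the FastQC report, the same check can be roughly sketched in Python. This is only an illustration, not what FastQC does internally: the function name `poly_g_tail_fraction` and the `min_run` threshold of 10 are my own assumptions, and it expects plain 4-line FASTQ records (optionally gzipped).

```python
import gzip
import re

# A run of G's anchored at the 3' end of the read.
POLY_G_TAIL = re.compile(r"G+$")


def open_fastq(path):
    """Open a FASTQ file as text, handling .gz transparently."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "rt")


def poly_g_tail_fraction(path, min_run=10):
    """Return the fraction of reads ending in at least `min_run` G's.

    Assumes 4 lines per record (no wrapped sequences).
    `min_run` is an arbitrary illustrative threshold.
    """
    total = 0
    hits = 0
    with open_fastq(path) as handle:
        for i, line in enumerate(handle):
            if i % 4 != 1:  # the sequence is the second line of each record
                continue
            total += 1
            match = POLY_G_TAIL.search(line.strip().upper())
            if match and len(match.group()) >= min_run:
                hits += 1
    return hits / total if total else 0.0
```

Run it on R1 and R2 separately; a fraction close to zero in both would agree with FastQC reporting no over-represented poly-G sequence.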
Then, actually, cutadapt with poly-G as the adapter will remove the read, but you should give it both mates as input (I think it will remove both of them, but I'm not sure).
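For several paired-end files per sample, a small Python wrapper around cutadapt could look like the sketch below. The file names, the `trimmed_` prefix, the run length of 20 and the minimum length of 30 are all placeholders I made up. As far as I understand cutadapt, `-a`/`-A` trim a 3' adapter from R1/R2 rather than removing the read, and with `-m` a pair is dropped when either mate ends up too short, so the two output files stay in sync. Recent cutadapt versions also document a `--nextseq-trim` option aimed at this poly-G issue, which may be worth checking.

```python
import subprocess
from pathlib import Path


def cutadapt_polyg_command(r1, r2, out1, out2, run_length=20, min_length=30):
    """Build a paired-end cutadapt call that treats poly-G as a 3' adapter.

    `run_length` and `min_length` are illustrative values, not recommendations.
    """
    poly_g = f"G{{{run_length}}}"  # cutadapt repeat notation, e.g. G{20}
    return [
        "cutadapt",
        "-a", poly_g,            # 3' adapter for R1
        "-A", poly_g,            # 3' adapter for R2
        "-m", str(min_length),   # discard pairs with a too-short mate
        "-o", str(out1),         # trimmed R1
        "-p", str(out2),         # trimmed R2
        str(r1), str(r2),
    ]


def trim_pairs(pairs, dry_run=True):
    """Print (or run) one cutadapt command per (R1, R2) pair."""
    for r1, r2 in pairs:
        r1, r2 = Path(r1), Path(r2)
        cmd = cutadapt_polyg_command(
            r1, r2,
            r1.with_name("trimmed_" + r1.name),  # hypothetical output naming
            r2.with_name("trimmed_" + r2.name),
        )
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)
```

Calling `trim_pairs([("s1_L001_R1.fastq.gz", "s1_L001_R2.fastq.gz")])` with your own file names only prints the command; pass `dry_run=False` to actually run it once the printed command looks right.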
I checked with FastQC as you suggested. In my data set there is no over-represented sequence and the mean quality score is 35, so I hope I can run the assembly directly, without any preprocessing of the data set. What do you think? I used SPAdes for assembly, which also has some error-correction steps in ion-hammer.
Sounds good, I can only dream of getting such numbers. Did you run both files (R1 and R2)?
Yes, I did. I also assembled my data set, and the output looks good in terms of N50 value and number of contigs.