Trimming out primer sequences in the middle of reads
7.8 years ago

Hi!

I have PacBio reads that need to be assembled. These reads have Illumina primers at both ends as well as in the middle. The problem is that the primer sequences vary, so standard trimming cannot remove all of the primers from the reads. My lab wants the assembled genome to be of the best possible quality, so I might have to write a script to detect the primers in the middle of reads. I am currently thinking of removing sequences that are 80-100% similar to the primer sequences, but I am worried that this would also remove informative genomic sequence.
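The approach described above (flagging any stretch that is at least 80% identical to a primer) can be sketched with semi-global edit-distance alignment, which tolerates the substitutions and indels typical of PacBio reads. This is a minimal Python sketch under stated assumptions: identity is approximated as 1 - edits / adapter length, the function name `find_adapter_hits` is hypothetical, the SMRTbell adapter from this thread is used only as example input, and the pure-Python scan costs O(read length × adapter length), so it illustrates the idea rather than replacing a dedicated tool.

```python
import math
from typing import List, Tuple


def find_adapter_hits(read: str, adapter: str,
                      min_identity: float = 0.8) -> List[Tuple[int, int, int]]:
    """Return non-overlapping (start, end, edits) hits of `adapter` in `read`.

    Semi-global edit distance: the whole adapter must align, but it may start
    and end anywhere in the read. Substitutions, insertions and deletions
    each cost 1; identity is approximated as 1 - edits / len(adapter).
    """
    m = len(adapter)
    # Largest edit count still meeting min_identity (epsilon guards float rounding).
    max_edits = math.floor((1.0 - min_identity) * m + 1e-9)
    # prev[i] = (edits, start) for adapter[:i] aligned to a read stretch ending
    # at the previous column; column 0 means "no read bases consumed yet".
    prev = [(i, 0) for i in range(m + 1)]
    candidates = []
    for j in range(1, len(read) + 1):
        cur = [(0, j)]  # empty adapter prefix: free to start at any read position
        for i in range(1, m + 1):
            cost = 0 if adapter[i - 1] == read[j - 1] else 1
            diag = (prev[i - 1][0] + cost, prev[i - 1][1])  # match or substitution
            up = (cur[i - 1][0] + 1, cur[i - 1][1])         # adapter base missing from read
            left = (prev[i][0] + 1, prev[i][1])             # extra base inserted in read
            cur.append(min(diag, up, left))
        edits, start = cur[m]
        if edits <= max_edits:
            candidates.append((start, j, edits))
        prev = cur
    # Neighbouring end positions report the same occurrence; keep the best
    # (fewest edits) and drop anything overlapping an already accepted hit.
    hits: List[Tuple[int, int, int]] = []
    for start, end, edits in sorted(candidates, key=lambda h: (h[2], h[0])):
        if all(end <= s or start >= e for s, e, _ in hits):
            hits.append((start, end, edits))
    return sorted(hits)


adapter = "ATCTCTCTCTTTTCCTCCTCCTCCGTTGTTGTTGTTGAGAGAGAT"
read = "ACGTACGTAC" + adapter + "GGGCCCAAAT"
print(find_adapter_hits(read, adapter))  # [(10, 55, 0)]
```

Lowering `min_identity` catches more divergent primer copies but also raises the chance of cutting genuine sequence, especially for short primers; for a 45-nt adapter, 80% identity already tolerates up to 9 edits.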

How do you guys deal with such situations?

Thank you in advance!

7.8 years ago

I wrote a tool for removing internal PacBio adapter sequences; it is part of the BBMap package:

removesmartbell in=reads.fq out=clean.fq split=t adapter=ATCTCTCTCTTTTCCTCCTCCTCCGTTGTTGTTGTTGAGAGAGAT

By default it uses the standard PacBio SMRTbell adapter, but in this case you can supply the Illumina adapter sequence instead. It uses indel-aware alignment designed to model PacBio's error profile (indels as well as substitutions) and has a very low false-positive rate. I don't remember the exact rate, but I think it was around 1 in 5 megabases of PacBio sequence, so it should not cause any problems downstream.
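Splitting a read at internal adapters (presumably what `split=t` in the command above refers to) amounts to cutting out each detected adapter interval and keeping the flanking pieces as separate reads. The following is a minimal Python sketch of that idea, not BBMap's implementation; the function name `split_at_adapters`, the half-open `(start, end)` interval format, and the `min_length` cutoff are all hypothetical choices for illustration.

```python
from typing import List, Tuple


def split_at_adapters(read: str, hits: List[Tuple[int, int]],
                      min_length: int = 50) -> List[str]:
    """Remove each [start, end) adapter interval and return the flanking pieces.

    Overlapping intervals are merged implicitly; pieces shorter than
    min_length (a hypothetical cutoff) are discarded.
    """
    pieces = []
    pos = 0
    for start, end in sorted(hits):
        if start > pos:
            pieces.append(read[pos:start])  # sequence between previous adapter and this one
        pos = max(pos, end)                 # skip past the adapter
    pieces.append(read[pos:])               # tail after the last adapter
    return [p for p in pieces if p and len(p) >= min_length]


read = "A" * 60 + "X" * 10 + "C" * 60 + "X" * 10 + "G" * 20
print(split_at_adapters(read, [(60, 70), (130, 140)]))  # the 20-nt tail is dropped
```

At the roughly 1-in-5-Mb false-positive rate recalled above, 1 Gb of PacBio sequence would be expected to contain on the order of 200 spurious cuts.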


Hello, does your tool also remove the reverse complement of the adapter? Is it included in the BBMap scripts?


You can include the reverse-complement sequence in the adapter file, or add it to the command line above as a comma-separated entry:

removesmartbell in=reads.fq out=clean.fq split=t adapter=ATCTCTCTCTTTTCCTCCTCCTCCGTTGTTGTTGTTGAGAGAGAT,RC_Sequence
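
If you would rather generate the reverse complement than write it out by hand, a small Python helper like the one below (not part of BBMap) produces it, ready to paste as the second entry of the comma-separated `adapter=` list shown in the command. It assumes plain A/C/G/T/N sequence; other IUPAC ambiguity codes are left unchanged rather than complemented.

```python
# Complement table for A/C/G/T/N in both cases; other characters pass through unchanged.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Complement every base, then reverse the string."""
    return seq.translate(_COMPLEMENT)[::-1]


smartbell = "ATCTCTCTCTTTTCCTCCTCCTCCGTTGTTGTTGTTGAGAGAGAT"
print("adapter=" + smartbell + "," + reverse_complement(smartbell))
# adapter=ATCTCTCTCTTTTCCTCCTCCTCCGTTGTTGTTGTTGAGAGAGAT,ATCTCTCTCAACAACAACAACGGAGGAGGAGGAAAAGAGAGAGAT
```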
7.8 years ago

I don't have direct experience with the situation you describe, but cutadapt is very flexible in how it detects, removes, or masks one or more adapters. See, for example, the section on multiple adapter occurrences within a single read: https://cutadapt.readthedocs.io/en/stable/guide.html#multiple-adapter-occurrences-within-a-single-read

If the adapter sequence you provide is long enough (say, more than 15 nt), it is unlikely that you will throw away informative sequence (roughly speaking, of course).
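The "long enough" rule of thumb can be made concrete with a back-of-the-envelope estimate: how many positions in a random genome would match a k-nt adapter by chance. This Python sketch assumes a uniform, independent random genome, counts substitutions only (indels are ignored, so error-tolerant aligners will see somewhat more hits), and uses a hypothetical 5 Mb genome size purely for illustration.

```python
from math import comb


def expected_random_hits(k: int, max_mismatches: int, genome_size: int) -> float:
    """Expected chance matches of a fixed k-nt adapter in a random genome.

    Counts k-mers on both strands that lie within `max_mismatches`
    substitutions of the adapter, assuming i.i.d. uniform bases.
    """
    # Number of distinct k-mers within Hamming distance d of the adapter.
    neighbourhood = sum(comb(k, i) * 3 ** i for i in range(max_mismatches + 1))
    p_hit = neighbourhood / 4 ** k          # chance that one random k-mer lands in it
    positions = 2 * (genome_size - k + 1)   # forward and reverse strands
    return positions * p_hit


for k in (10, 15, 20):
    for d in (0, 1, 2):
        print(f"k={k:2d} mismatches={d}: {expected_random_hits(k, d, 5_000_000):.3g}")
```

Under these assumptions a 15-nt adapter is expected to match a 5 Mb genome exactly by chance only about 0.01 times, but about 9 times once two mismatches are tolerated, so the error tolerance you allow matters as much as the adapter length.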
