Trim read-specific adapter in paired-end reads

7.0 years ago

Nicolas Rosewick 11k

Hi,

I have a specific enriched DNA-seq library to analyze (2x76 bp, sequenced on a NextSeq 500).

The library is laid out as follows:

R1                                                  R2
==============>-----------------<===========#####@@@@@

=== : DNA fragment (should correctly align to the genome)
### : barcode
@@@ : some random sequence we introduce to increase the library complexity

Important things to know:

The barcode and the random sequence always have the same lengths (12 and 14 nt, respectively).
Each read pair has a different barcode (only PCR duplicates should share the same barcode and read sequences).

My goal is to remove the barcode and the random sequence from R2, but also from R1, since R1 and R2 can overlap if the sequenced DNA fragment is short (less than 2x76 = 152 bp).

Example of R1 and R2 overlapping. In this case, R1 contains sequence from the barcode:

R1 =====================>
                    ||||
R2      <===========#####@@@@@

Is there a tool that handles such cases? My first idea would be to write an R script that extracts the barcode and random sequence from R2 and aligns them locally against R1.

adapter trim • 2.2k views

ADD COMMENT • link 7.0 years ago by Nicolas Rosewick 11k

Not what you are asking for, but chances are that you don't actually have to remove this and can just align it, and it will get soft-clipped.

ADD REPLY • link 7.0 years ago by WouterDeCoster 47k
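To check whether the aligner really soft-clipped the tag, the CIGAR strings of the aligned reads can be inspected. The snippet below is only a sketch: `soft_clipped_ends` is a hypothetical helper, not part of any library, and whether soft-clipping happens at all depends on the aligner running in a local mode. Note that SAM stores reverse-strand reads in reference orientation, so a tag at the start of R2 may show up as a trailing soft clip.

```python
import re

# One CIGAR operation: a length followed by an operation code from the SAM spec.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def soft_clipped_ends(cigar):
    """Return (left, right) soft-clipped lengths of a CIGAR string.

    Hypothetical helper for illustration. Lengths are given in the
    orientation in which the read is stored in the SAM record.
    """
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    # Hard clips (H) can sit outside soft clips, so ignore them here.
    core = [(n, op) for n, op in ops if op != "H"]
    left = core[0][0] if core and core[0][1] == "S" else 0
    right = core[-1][0] if len(core) > 1 and core[-1][1] == "S" else 0
    return left, right


if __name__ == "__main__":
    for cigar in ("26S50M", "50M26S", "76M"):
        print(cigar, soft_clipped_ends(cigar))
```

A left or right value close to 26 (14 + 12) on R2 would be consistent with the random sequence and barcode having been clipped rather than aligned.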

Yes, I know, but it would be nice to have clean reads for further analysis ;)

ADD REPLY • link 7.0 years ago by Nicolas Rosewick 11k

I think you can use cutadapt. If I'm not mistaken, it will remove the #### and the following nucleotides from R1.

ADD REPLY • link 7.0 years ago by Asaf 10k

Yes, but in this case each read will have a different adapter to trim.

ADD REPLY • link 7.0 years ago by Nicolas Rosewick 11k

You can give only the #### sequence as input to cutadapt, allow it to match anywhere along the read, and request only the following sequence.

ADD REPLY • link 7.0 years ago by Asaf 10k

Yes, but each read will have a different #### sequence.

ADD REPLY • link 7.0 years ago by Nicolas Rosewick 11k

Oh, I skipped that part when first reading :). Chances are good you'll end up coding it yourself.

ADD REPLY • link 7.0 years ago by Asaf 10k
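For anyone who does end up coding it, here is a minimal Python sketch of the idea from the question: take the read-specific tag from R2 and look for it in R1. It rests on several assumptions that are not confirmed in the thread: the first 26 nt of R2 (14 + 12) are the random sequence plus barcode, R1 reads into the reverse complement of that tag when the fragment is short, and an ungapped, mismatch-tolerant comparison is good enough in place of a true local alignment. The function names and the `min_overlap` and `max_mismatch_rate` thresholds are hypothetical choices for illustration.

```python
# Assumption: tag = random sequence (14 nt) + barcode (12 nt) at the 5' end of R2.
TAG_LEN = 14 + 12

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_tag_start(r1, tag_rc, min_overlap=5, max_mismatch_rate=0.1):
    """Return the leftmost position in r1 where tag_rc starts, or None.

    tag_rc may run off the 3' end of r1, so only the overlapping part is
    compared. The comparison is ungapped: indels are not handled.
    """
    for start in range(len(r1) - min_overlap + 1):
        overlap = min(len(tag_rc), len(r1) - start)
        window = r1[start:start + overlap]
        mismatches = sum(a != b for a, b in zip(window, tag_rc))
        if mismatches <= int(overlap * max_mismatch_rate):
            return start
    return None


def trim_pair(r1, r2, tag_len=TAG_LEN):
    """Trim the tag from R2 and, if present, its reverse complement from R1.

    Returns (r1_trimmed, r2_trimmed, tag). Quality strings would need to be
    sliced at the same positions; that is left out of this sketch.
    """
    tag = r2[:tag_len]
    r2_trimmed = r2[tag_len:]
    start = find_tag_start(r1, revcomp(tag))
    r1_trimmed = r1 if start is None else r1[:start]
    return r1_trimmed, r2_trimmed, tag


if __name__ == "__main__":
    # Toy pair: a 40 nt fragment, so R1 reads through into the tag.
    fragment = "AC" * 20
    tag = "ACCACAAACCCACA" + "CAACACCCAAAC"
    r1 = fragment + revcomp(tag) + "T" * 10
    r2 = tag + revcomp(fragment) + "G" * 10
    print(trim_pair(r1, r2))
```

With a short `min_overlap`, a few bases at the 3' end of R1 can match the tag by chance and be trimmed unnecessarily, so that threshold is a trade-off to tune on real data. The sketch also leaves anything that R2 reads into beyond the fragment untouched.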
