With cutadapt, is it possible to completely trim reads to empty reads instead of removing them?
16 months ago
j.gleixner ▴ 30

I am using cutadapt to extract multiple barcode sequence elements from reads into their own fastq files as required by downstream processing.

Reads look like this: N(Pattern_A)(Barcode_1)(Pattern_B)(Barcode_2)(Pattern_C)N

I can first keep only the reads that match the whole pattern (with --discard-untrimmed and --action=none) and then extract the two barcodes with two separate calls (without those two options).

This lets me merge the extracted barcodes downstream by read position instead of by read name.
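To make the position-based merge concrete, here is a minimal Python sketch. It is not part of cutadapt; the function names and the in-memory record contents are made up for illustration. It pairs record i of one barcode file with record i of the other, which only works if both files contain exactly the same reads in the same order, so a read without a second barcode has to appear as an empty record rather than be dropped.

```python
from itertools import zip_longest

def read_fastq(lines):
    """Yield (name, sequence, quality) from an iterable of FASTQ lines (4 lines per record)."""
    it = iter(lines)
    for header in it:
        seq = next(it).rstrip("\n")
        next(it)  # skip the '+' separator line
        qual = next(it).rstrip("\n")
        yield header.rstrip("\n")[1:], seq, qual

def merge_by_position(bc1_lines, bc2_lines):
    """Pair record i of the first file with record i of the second file."""
    merged = []
    for rec1, rec2 in zip_longest(read_fastq(bc1_lines), read_fastq(bc2_lines)):
        if rec1 is None or rec2 is None:
            raise ValueError("files have different numbers of records")
        merged.append((rec1[0], rec1[1], rec2[1]))
    return merged

# Hypothetical contents of the two barcode files; r2 has an empty second barcode.
bc1 = ["@r1\n", "GGGG\n", "+\n", "IIII\n", "@r2\n", "TTTT\n", "+\n", "IIII\n"]
bc2 = ["@r1\n", "CCCC\n", "+\n", "IIII\n", "@r2\n", "\n", "+\n", "\n"]
pairs = merge_by_position(bc1, bc2)
```

If the second file simply lacked the r2 record, every later read would be paired with the wrong barcode, which is why the empty record matters here.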

However, I also want to handle the case where (Pattern_A)(Barcode_1)(Pattern_B) is present but (Barcode_2)(Pattern_C) is not. If I made only the first part required in the first step, Barcode_2.fastq could contain untrimmed reads (those where Pattern_C is not present). If I could tell cutadapt something like --trim-untrimmed, so that such reads are written as empty reads, that would be much simpler than handling the different cases with different files. Is this possible with cutadapt?
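To pin down the behaviour I am after, here is a rough Python sketch using plain regular expressions rather than cutadapt. The pattern sequences, the barcode length and the function name are placeholders I made up for illustration. Reads without the required first part are dropped from both outputs, while reads that only lack (Barcode_2)(Pattern_C) get an empty second barcode, so both outputs stay aligned by position.

```python
import re

# Hypothetical constant patterns; the real sequences are not given in this post.
PATTERN_A, PATTERN_B, PATTERN_C = "ATGC", "CGAT", "GGCC"
BC_LEN = 4  # assumed barcode length

# Pattern_A, Barcode_1 and Pattern_B are required; Barcode_2 + Pattern_C are optional.
READ_RE = re.compile(
    f"{PATTERN_A}(?P<bc1>[ACGT]{{{BC_LEN}}}){PATTERN_B}"
    f"(?:(?P<bc2>[ACGT]{{{BC_LEN}}}){PATTERN_C})?"
)

def extract_barcodes(reads):
    """Return two equally long lists: Barcode_1 and Barcode_2 (possibly empty) per kept read."""
    bc1_out, bc2_out = [], []
    for seq in reads:
        m = READ_RE.search(seq)
        if m is None:
            continue  # required part missing: drop the read from both outputs
        bc1_out.append(m.group("bc1"))
        bc2_out.append(m.group("bc2") or "")  # empty placeholder keeps positions aligned
    return bc1_out, bc2_out

reads = [
    "TTATGCGGGGCGATCCCCGGCCAA",  # full structure
    "TTATGCTTTTCGATACGTACGTAA",  # Pattern_C missing
    "TTTTTTTTTTTTTTTTTTTTTTTT",  # nothing matches
]
bc1, bc2 = extract_barcodes(reads)
```

This uses exact matching only, so it ignores the error tolerance that cutadapt offers; it is meant to describe the desired output, not to replace the tool.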


Hey! I’m not sure, but if you ask on the cutadapt GitHub, they’re usually really helpful.


Could you give an example of the input and the expected output? cutadapt is tremendously flexible, so I bet a combination of its features could do it, but I'm not sure I understand what the output should look like. If it's supposed to be one file for each barcode combination, I think the demultiplexing feature might be able to handle it.

For example with this as example.fa:

>seq1
NNNNATGCGGGGCGATCCCCNNNN
>seq2
NNNNATGCTTTTCGATCCCCNNNN
>seq3
NNNNATGCGGGGCGATAAAANNNN
>seq4
NNNNATGCNNNNCGATAAAANNNN
>seq5
NNNNATGCGGGGCGATNNNNNNNN

And this as bc_both.fa (using cutadapt's linked adapter syntax):

>bc_GC
GGGG...CCCC
>bc_TC
TTTT...CCCC
>bc_GA
GGGG...AAAA
>bc_TA
TTTT...AAAA

You can do:

$ cutadapt --action none -g file:bc_both.fa -o out_{name}.fa example.fa

...and you get the different sequences written to separate files named after the matching adapter, with the last two sequences in out_unknown.fa because each of them has only one of the two barcodes.
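As a sanity check on that grouping, here is a small Python sketch that mimics it with exact substring matching on the same example records. It is a simplification, not cutadapt's actual matching (no error tolerance, no partial matches), and the `assign` helper is made up for illustration.

```python
# Records from example.fa and bc_both.fa as shown above.
reads = {
    "seq1": "NNNNATGCGGGGCGATCCCCNNNN",
    "seq2": "NNNNATGCTTTTCGATCCCCNNNN",
    "seq3": "NNNNATGCGGGGCGATAAAANNNN",
    "seq4": "NNNNATGCNNNNCGATAAAANNNN",
    "seq5": "NNNNATGCGGGGCGATNNNNNNNN",
}
adapters = {
    "bc_GC": ("GGGG", "CCCC"),
    "bc_TC": ("TTTT", "CCCC"),
    "bc_GA": ("GGGG", "AAAA"),
    "bc_TA": ("TTTT", "AAAA"),
}

def assign(seq):
    """Return the first adapter name whose 5' part is followed somewhere by its 3' part."""
    for name, (five, three) in adapters.items():
        start = seq.find(five)
        if start != -1 and seq.find(three, start + len(five)) != -1:
            return name
    return "unknown"  # both halves are required, so a single barcode is not enough

groups = {}
for read_name, seq in reads.items():
    groups.setdefault(assign(seq), []).append(read_name)
```

With these inputs, `groups` ends up with seq1 under bc_GC, seq2 under bc_TC, seq3 under bc_GA, and seq4 and seq5 under unknown, which is the same split as the output files described above.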

I tried to make it work with combinatorial demultiplexing in single-end mode so that it would create the necessary barcode combinations itself, but it doesn't seem to use the {name1} and {name2} placeholders unless you're running in paired-end mode. Amy's right, though, they're very helpful on GitHub if you need to ask there.
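In the meantime, one possible workaround is to generate the linked-adapter file with every combination spelled out, rather than typing it by hand. A minimal Python sketch, assuming the two barcode sets below; the one-letter labels and the `linked_adapter_fasta` helper are made up for illustration:

```python
from itertools import product

def linked_adapter_fasta(first, second):
    """Build FASTA text with one linked adapter (5'...3') per barcode combination."""
    lines = []
    for (n1, s1), (n2, s2) in product(first.items(), second.items()):
        lines.append(f">bc_{n1}{n2}")
        lines.append(f"{s1}...{s2}")
    return "\n".join(lines) + "\n"

# Barcode sets taken from the example above, keyed by a short label.
first = {"G": "GGGG", "T": "TTTT"}
second = {"C": "CCCC", "A": "AAAA"}
fasta_text = linked_adapter_fasta(first, second)

# To write the file used in the command above:
# with open("bc_both.fa", "w") as handle:
#     handle.write(fasta_text)
```

The records come out in a different order than in the hand-written bc_both.fa, but the set of names and sequences is the same.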
