Question

Complex adaptor trimming

3.2 years ago

linc1464 • 0

Hello everyone,

I'm a bioinformatics novice looking for help. I'm using Galaxy to remove adaptors from sequencing reads, but it's tricky and I would like some advice on how to approach it. Here's the experiment.

I have 50 bp paired-end (PE) reads. The 5' end of read 1 contains adaptor followed by 3x G. The 5' end of read 2 contains 15x T, derived from polyadenylation during the library prep. I would like to trim the G's off the 5' end of read 1 and the T's off the 5' end of read 2. In addition, whenever the insert is shorter than the 50 bp read length, the reads run through into the other end of the fragment: the 3' end of read 2 will then contain 3x C (the reverse complement of the 3x G) and the 3' end of read 1 will contain 15x A (the reverse complement of the T's). Is there an additional trick to remove these instances too?

Thanks for any help!

sequencing Trimming galore trim

Written 3.2 years ago by linc1464 • updated 3.2 years ago by gb ★ 2.2k

It always helps to post data instead of only describing the problem. Please post some example reads and the expected output. The trimming can be done on the command line, and similar tools (e.g. cutadapt, bbduk) are also available in Galaxy.

Reply by cpad0112 21k • 3.2 years ago

The reads have the following structure:

Read 1
5' ADAPTOR - GGG - then the mapped bit I want (size ~20-100 nts) - AAAAAAAAAAAAAAA - ADAPTOR 3'

Read 2
5' ADAPTOR - TTTTTTTTTTTTTTT - then the mapped bit I want (size ~20-100 nts) - CCC - ADAPTOR 3'

I have 50 bp paired-end reads and want to remove ADAPTOR - GGG from the start and AAAAAAAAAAAAAAA - ADAPTOR from the 3' end, leaving the bit in the middle. Unfortunately, I can only use Galaxy (I have very limited programming knowledge).

Written 3.2 years ago by linc1464 • updated 3.2 years ago by GenoMax 151k
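
To see where the poly-A and CCC at the 3' ends come from, the layout above can be simulated for an insert shorter than the read length. This is a minimal POSIX sh/awk sketch, assuming both reads come from one fragment laid out as 5' adapter - GGG - insert - 15x A - 3' adapter. The helper names (revcomp, simulate_pair) are hypothetical, and the adapter and insert sequences in the example are made-up placeholders, not real kit adapters.

```sh
#!/bin/sh
# Minimal sketch with hypothetical helpers (not from any tool).
# ASSUMPTION: one fragment laid out as AD5 - GGG - insert - 15x A - AD3,
# where AD5/AD3 are placeholders for the real adapter sequences.

# Reverse-complement a DNA string: complement each base, then reverse it.
revcomp() {
    printf '%s\n' "$1" | tr ACGTacgt TGCAtgca |
        awk '{ r = ""; for (i = length($0); i > 0; i--) r = r substr($0, i, 1); print r }'
}

# Print the first LEN bases read from each end of the fragment:
# read 1 from the top strand, read 2 from its reverse complement.
simulate_pair() {
    ad5=$1; ad3=$2; insert=$3; len=$4
    fragment="${ad5}GGG${insert}AAAAAAAAAAAAAAA${ad3}"   # 15x A before AD3
    r1=$(printf '%s\n' "$fragment" | cut -c1-"$len")
    r2=$(revcomp "$fragment" | cut -c1-"$len")
    printf 'R1 %s\nR2 %s\n' "$r1" "$r2"
}

# Example: a made-up 20 nt insert sequenced with 50 bp reads. Prints:
# R1 ACACACACACGGGTCAGTCAGTCAGTCAGTCAGAAAAAAAAAAAAAAATG
# R2 CACACACACATTTTTTTTTTTTTTTCTGACTGACTGACTGACTGACCCGT
simulate_pair ACACACACAC TGTGTGTGTG TCAGTCAGTCAGTCAGTCAG 50
```

Read 1 runs through the poly-A into the start of the 3' adapter, and read 2 runs through CCC into the start of the reverse-complemented 5' adapter. That is why the 3' ends can be handled as 3' adapters of their own: 15x A followed by the adapter for read 1, and CCC followed by the adapter for read 2.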

Is this paired-end data, i.e. do you have two FASTQ files (R1 and R2)? In that case you can upload both files to Galaxy and run cutadapt or fastp in paired-end mode. I think people here need that information to give a good answer.

Reply by gb ★ 2.2k • 3.2 years ago

Thank you. Yes, I have two files per sample (read 1 and read 2).

Reply by linc1464 • 3.2 years ago

Is there real sequence in your reads where you have written the word ADAPTOR above?

Reply by GenoMax 151k • 3.2 years ago

You can run bbduk.sh from the BBMap suite in two passes (two bbduk.sh runs piped together) on the command line. For example, with this test file:

$ more test.fq
@M12345:751:000000000-F345F:1:1101:18044:1642 1:N:0:GATCTATC+ATGAGGCT
CGGTTCATCTCAGAGATCTCATGCTTGGTGTTGCGGAGGTCATCGCCATG
+
ABBAABFFFFFFGGCGGGGGGGHHHHHGFFHGHHGGGGGEGFFHHGGGGE
@M12345:751:000000000-F345F:1:1101:17624:1642 1:N:0:GATCTATC+ATGAGGCT
ACTGACTGACTGGGGCTCCAATTATGCCACCAGCCACCAGGCCACGCAGGCCTACGTTTATCCTAAAAAAAAAAAAA
+
AAAAAAAAAAAAAAAABAABFFFFFFFGGGGGGGGGGHHHGHGHGGGGGGGGHHHGHHHHGHHHHHHHHHHHHHHHH
@M12345:751:000000000-F345F:1:1101:16214:1642 1:N:0:GATCTATC+ATGAGGCT
CCAGCTTTATTGAAACCTATTACAGAAGACAATCCAAATAAAACCACTGT
+
AAAAAFFFFFFFGGGGGGGGGGHHHGHHHHHHHHHHHHHHGHHHGHGHHH
@M12345:751:000000000-F345F:1:1101:15835:1659 1:N:0:GATCTATC+ATGAGGCT
ACTGACTGACTGGGCCTTGGGTGGTTCAGTCAAAGAGGTAAGACCTCCAGCTGGCTCACAAGAGAAAAAAAAAAAA
+
BBBBBBBBBBBBBBBBBBAFA3ADBAGGGGGGGGGGHGGFG4EGHHHGHCHHCHGHHHHHHHGHHHHHHHHHHHHH

With the command

bbduk.sh -Xmx2g in=test.fq out=stdout.fq literal=ACTGACTGACTGGG ktrim=l k=10 | bbduk.sh -Xmx2g in=stdin.fq out=stdout.fq literal=AAAAAAA ktrim=r k=6 int=f

This will produce

@M12345:751:000000000-F345F:1:1101:18044:1642 1:N:0:GATCTATC+ATGAGGCT
CGGTTCATCTCAGAGATCTCATGCTTGGTGTTGCGGAGGTCATCGCCATG
+
ABBAABFFFFFFGGCGGGGGGGHHHHHGFFHGHHGGGGGEGFFHHGGGGE
@M12345:751:000000000-F345F:1:1101:17624:1642 1:N:0:GATCTATC+ATGAGGCT
GCTCCAATTATGCCACCAGCCACCAGGCCACGCAGGCCTACGTTTATCCT
+
AABAABFFFFFFFGGGGGGGGGGHHHGHGHGGGGGGGGHHHGHHHHGHHH
@M12345:751:000000000-F345F:1:1101:16214:1642 1:N:0:GATCTATC+ATGAGGCT
CCAGCTTTATTGAAACCTATTACAGAAGACAATCC
+
AAAAAFFFFFFFGGGGGGGGGGHHHGHHHHHHHHH
@M12345:751:000000000-F345F:1:1101:15835:1659 1:N:0:GATCTATC+ATGAGGCT
CCTTGGGTGGTTCAGTCAAAGAGGTAAGACCTCCAGCTGGCTCACAAGAG
+
BBBBAFA3ADBAGGGGGGGGGGHGGFG4EGHHHGHCHHCHGHHHHHHHGH

Reply by GenoMax 151k • 3.2 years ago
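
For anyone who wants to see what the two piped bbduk passes amount to, here is a rough exact-match approximation in POSIX sh/awk. trim_fastq is a hypothetical helper, not part of BBTools: it removes everything up to and including the first exact occurrence of the left literal (roughly what ktrim=l does above) and everything from the first exact occurrence of the right literal onward (roughly ktrim=r). Unlike bbduk, it matches the whole literal rather than its k-mers, and it cuts the quality line at the same positions so the record stays valid.

```sh
#!/bin/sh
# Rough exact-match approximation of the two piped bbduk passes above.
# trim_fastq is a hypothetical helper, not part of BBTools. Usage:
#   trim_fastq LEFT_LITERAL RIGHT_LITERAL < in.fq > out.fq
trim_fastq() {
    awk -v left="$1" -v right="$2" '
    NR % 4 == 1 { header = $0 }
    NR % 4 == 2 { seq = $0 }
    NR % 4 == 3 { plus = $0 }
    NR % 4 == 0 {
        qual = $0
        # Left trim: drop the literal and everything before it.
        p = (left == "") ? 0 : index(seq, left)
        if (p > 0) {
            start = p + length(left)
            seq = substr(seq, start)
            qual = substr(qual, start)
        }
        # Right trim: drop the literal and everything after it.
        q = (right == "") ? 0 : index(seq, right)
        if (q > 0) {
            seq = substr(seq, 1, q - 1)
            qual = substr(qual, 1, q - 1)
        }
        print header; print seq; print plus; print qual
    }'
}
```

Run as trim_fastq ACTGACTGACTGGG AAAAAA < test.fq, it gives the same sequences as bbduk for the first, second and fourth test reads. It leaves the third read at its full 50 bases, whereas bbduk cut it to 35 even though that read contains no run of six A's (only AAATAAAA). So bbduk's k-mer matching is evidently more permissive than exact matching here; its maskmiddle setting is a likely reason, but that is an assumption worth checking against the BBDuk documentation before trusting the poly-A pass on real data.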

Answer 1 · 2022-04-13

3.2 years ago

gb ★ 2.2k

In short, in Galaxy you:

Upload both files
Open the cutadapt tool from the tool menu
Select "paired-end" as the first option
For "FASTQ/A file #1", select your read 1 file
For "FASTQ/A file #2", select your read 2 file
Fill in the read 1 and read 2 adapters
Execute the tool

Answer by gb ★ 2.2k • 3.2 years ago
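
For the "Fill in the read 1 and read 2 adapters" step, the four sequences can be derived from the read layout given earlier in the thread. The sketch below assumes both reads come from one fragment laid out as 5' adapter - GGG - insert - 15x A - 3' adapter; revcomp and cutadapt_adapters are hypothetical helpers, and the adapter sequences in the example are made-up placeholders. The labels map to cutadapt's options -g/-a (read 1, 5'/3' adapter) and -G/-A (read 2, 5'/3' adapter); the exact field names in the Galaxy form may differ.

```sh
#!/bin/sh
# Hypothetical helpers; AD5 and AD3 stand in for the real adapter sequences.

# Reverse-complement a DNA string: complement each base, then reverse it.
revcomp() {
    printf '%s\n' "$1" | tr ACGTacgt TGCAtgca |
        awk '{ r = ""; for (i = length($0); i > 0; i--) r = r substr($0, i, 1); print r }'
}

# ASSUMPTION: fragment = AD5 - GGG - insert - 15x A - AD3.
cutadapt_adapters() {
    ad5=$1
    ad3=$2
    polyA=AAAAAAAAAAAAAAA    # 15x A
    polyT=TTTTTTTTTTTTTTT    # 15x T
    printf "read 1, 5' adapter (-g): %s\n" "${ad5}GGG"
    printf "read 1, 3' adapter (-a): %s\n" "${polyA}${ad3}"
    printf "read 2, 5' adapter (-G): %s\n" "$(revcomp "$ad3")${polyT}"
    printf "read 2, 3' adapter (-A): %s\n" "CCC$(revcomp "$ad5")"
}

cutadapt_adapters ACACACACAC TGTGTGTGTG
```

With cutadapt's default settings, a 3' adapter is also removed when only its first few bases are present at the end of a read, which covers the read-through case on short inserts.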