How can I trim the primer and mask it with the original primer sequences?
2.9 years ago
Libin • 0

We are using multiple primers for PCR, and we want to remove any alterations introduced by the primer sequences while keeping the reads as intact as possible, since such changes would affect the structure of the downstream protein product. The only tools I have found mask the primer sequences with Ns, but I want to keep the sequence intact. Is there a way to trim off the matched primer sequence and replace it with the closest matching reference primer sequence?

The input FASTQ could look something like this:

    @M03739:62:000000000-JDHFC:1:1101:17064:1807 1:N:0:19 
CATTCG**CAGATGCAGCTGGTGCA**GTCTGGGTCTGAGTTGAAGAAGCCTGGGGCCTCAGTGAAGGTTTCCTGCAAGGCTTCTGGATACACCTTCACTAGCTATACTATGAATTGGATACGACAGGCCCCTGGACAAGGGCTTGAGTGGCTGGGATGGATCAACACCAACAGTGGGAACCCAACGTATACCCAGGGCTTCACAGGACGGTTTGTCTTCTCCTTGGACACCTCTGTCAGCACGGCATATCTGCAGATCAGCAGCCTAAAGGCTGAGGACACTGCCGTGTATTACTGTGCGAGGG 
    + 
    ...

    @M03739:62:000000000-JDHFC:1:1101:23479:1823 1:N:0:19 
TATTAG**GAGGTGCAGCTGGTGCA**GTGAGCTGCCTTGATGGAGCTAGTACACTTGCTCAACATGGCTGAGTGTTCCCTGTGTTGCACCAGGCACAACACATCCCCCAAGAGCTTCTCATGCTTGCACATGCACTCAGAGTCCACCTTCACACAGCCACAACGACGGCCCAGAGCCGGATCTCTCATCTCCAAGATAAACATAGTGCCCTGGGGAGGGACCACGGTCACCGTCCCCTCACATTTGAGATCGGAAGAGCACACGTCTGAACTCCAGTCACCAATTAACATCTCGTATGCCGTCT 
    + 
    ...

The primer fasta could be like this:

>Primer-1 
CAGATGCAGCTGGTGCA

>Primer-2 
CAGGTGCAGCTGGTGCA

The primer starts at the 7th base, so after trimming and primer replacement I would like the output FASTQ to look like this:

    @M03739:62:000000000-JDHFC:1:1101:17064:1807 1:N:0:19 
**CAGATGCAGCTGGTGCA**GTCTGGGTCTGAGTTGAAGAAGCCTGGGGCCTCAGTGAAGGTTTCCTGCAAGGCTTCTGGATACACCTTCACTAGCTATACTATGAATTGGATACGACAGGCCCCTGGACAAGGGCTTGAGTGGCTGGGATGGATCAACACCAACAGTGGGAACCCAACGTATACCCAGGGCTTCACAGGACGGTTTGTCTTCTCCTTGGACACCTCTGTCAGCACGGCATATCTGCAGATCAGCAGCCTAAAGGCTGAGGACACTGCCGTGTATTACTGTGCGAGGG 
    + 
    ... 

    @M03739:62:000000000-JDHFC:1:1101:23479:1823 1:N:0:19 
**CAGGTGCAGCTGGTGCA**GTGAGCTGCCTTGATGGAGCTAGTACACTTGCTCAACATGGCTGAGTGTTCCCTGTGTTGCACCAGGCACAACACATCCCCCAAGAGCTTCTCATGCTTGCACATGCACTCAGAGTCCACCTTCACACAGCCACAACGACGGCCCAGAGCCGGATCTCTCATCTCCAAGATAAACATAGTGCCCTGGGGAGGGACCACGGTCACCGTCCCCTCACATTTGAGATCGGAAGAGCACACGTCTGAACTCCAGTCACCAATTAACATCTCGTATGCCGTCT
    + 
    ...

The sequence preceding the primer should be trimmed, and the altered primer replaced by the reference primer sequence.

Mask Trimming Primer • 1.6k views

Please post example input data and the expected output rather than describing them, and don't post the data as images.


The input data are a primer FASTA file and target sequence FASTQ files; the output files are modified target sequence FASTQ files.


Example data has been added to the original post.


Since you provided incomplete FASTQ records, I created a FASTA file with the necessary sequences from the example data and am posting example code (the cutadapt step also works on FASTQ files; the sed step would then need to be limited to the sequence lines):

input:

$ cat test.fa
>M03739:62:000000000-JDHFC:1:1101:17064:1807 1:N:0:19
CATTCGCAGATGCAGCTGGTGCAGTCTGGGTCTGAGTTGAAGAAGCCTGGGGCCTCAGTGAAGGTT
>M03739:62:000000000-JDHFC:1:1101:23479:1823 1:N:0:19
TATTAGGAGGTGCAGCTGGTGCAGTGAGCTGCCTTGATGGAGCTAGTACACTTGCTCAACATGGCT

output:

$ cutadapt --quiet --action retain -e 0.2 -g CAGATGCAGCTGGTGCA test.fa  | sed -r '/>/! s/^.{17}/CAGATGCAGCTGGTGCA/'

>M03739:62:000000000-JDHFC:1:1101:17064:1807 1:N:0:19
CAGATGCAGCTGGTGCAGTCTGGGTCTGAGTTGAAGAAGCCTGGGGCCTCAGTGAAGGTT
>M03739:62:000000000-JDHFC:1:1101:23479:1823 1:N:0:19
CAGATGCAGCTGGTGCAGTGAGCTGCCTTGATGGAGCTAGTACACTTGCTCAACATGGCT

Either identify the fixed part of the primer sequence to remove, or give cutadapt (or any other trimming tool) some error tolerance (0.2 for the example data) so that it can still find the altered primer in the FASTQ records.

If the primer always starts at the 7th base, remove the first 6 bases, then replace the first 17 characters (the primer length) of what remains with the primer sequence.
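The one-liner above always writes the single primer given on the command line. With several primers, the fixed-offset idea can be sketched in Python instead: drop the leading bases, compare the next stretch against each reference primer, and overwrite it with the closest one. This is only a sketch under stated assumptions: the names `fix_read`, `read_fastq`, `PREFIX_LEN` and `MAX_MISMATCHES` are made up for illustration, the primer is assumed to start at the 7th base without indels, the mismatch cutoff is arbitrary, and the quality strings in the demo are placeholders because the example records did not include them.

```python
from typing import Dict, Iterable, Iterator, Optional, Tuple

# Reference primers from the question.
PRIMERS: Dict[str, str] = {
    "Primer-1": "CAGATGCAGCTGGTGCA",
    "Primer-2": "CAGGTGCAGCTGGTGCA",
}
PREFIX_LEN = 6       # assumption: the primer always starts at the 7th base
MAX_MISMATCHES = 3   # assumption: arbitrary cutoff, tune it for your data


def hamming(a: str, b: str) -> int:
    """Count mismatching positions between two strings of equal length."""
    return sum(x != y for x, y in zip(a, b))


def fix_read(
    seq: str,
    qual: str,
    primers: Dict[str, str],
    prefix_len: int = PREFIX_LEN,
    max_mismatches: int = MAX_MISMATCHES,
) -> Optional[Tuple[str, str]]:
    """Trim the prefix and overwrite the primer region with the closest primer.

    Returns (sequence, quality), or None if no primer is close enough.
    """
    body, body_qual = seq[prefix_len:], qual[prefix_len:]
    candidates = [
        (hamming(body[:len(p)], p), p)
        for p in primers.values()
        if len(body) >= len(p)
    ]
    if not candidates:
        return None
    dist, best = min(candidates)
    if dist > max_mismatches:
        return None
    # Only bases are substituted, so the trimmed quality string still lines up.
    return best + body[len(best):], body_qual


def read_fastq(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (header, sequence, quality) from four-line FASTQ records."""
    it = iter(lines)
    for header, seq, _plus, qual in zip(it, it, it, it):
        yield header.rstrip("\n"), seq.strip(), qual.strip()


if __name__ == "__main__":
    demo_seqs = [
        "CATTCGCAGATGCAGCTGGTGCAGTCTGGGTCTGAGTTGAAGAAGCCTGGGGCCTCAGTGAAGGTT",
        "TATTAGGAGGTGCAGCTGGTGCAGTGAGCTGCCTTGATGGAGCTAGTACACTTGCTCAACATGGCT",
    ]
    records = []
    for i, s in enumerate(demo_seqs, 1):
        # Placeholder qualities ("I"), not taken from the real reads.
        records += [f"@read{i}\n", s + "\n", "+\n", "I" * len(s) + "\n"]
    for header, seq, qual in read_fastq(records):
        fixed = fix_read(seq, qual, PRIMERS)
        if fixed is None:
            continue  # alternatively, write the read out unchanged
        print(header, fixed[0], "+", fixed[1], sep="\n")
```

The replaced bases keep the quality values of the bases they overwrite, which may or may not be what you want downstream. If the primer position can shift or contain indels, a fixed offset is not enough and an aligner-based search such as cutadapt's is the safer first step.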

2.9 years ago

There are lots of tools to just trim the first 7 bases. You can even have bcl2fastq do that. But I think the rest will have to be custom. I'm not sure why you'd want to do this; for demultiplexing based on the primers?

2.9 years ago
cfos4698 ★ 1.1k

You can use cutadapt. For example:

cutadapt -j0 [OPTIONS] **--action lowercase** -o cutadapt_R1.fq.gz -p cutadapt_R2.fq.gz raw_R1.fastq.gz raw_R2.fastq.gz

You'll need to decide on the correct options to detect your primer sequences. The documentation is very good, and the author replies on GitHub if you need help. The key flag is --action:

  --action {trim,retain,mask,lowercase,none}
                        What to do if a match was found. trim: trim adapter and up- or downstream
                        sequence; retain: trim, but retain adapter; mask: replace with 'N' characters;
                        lowercase: convert to lowercase; none: leave unchanged. Default: trim
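None of these actions writes the reference primer back into the read, so a small post-processing step is still needed. A minimal Python sketch of that step follows, assuming the reads were processed with `--action lowercase` and a 5' primer, so that everything up to and including the primer match is lowercase and the rest is uppercase. The function name `restore_primer` and the mismatch cutoff are hypothetical, and the sketch assumes the match contains no indels and covers the full primer.

```python
import re
from typing import Dict, Optional, Tuple

# Reference primers from the question.
PRIMERS: Dict[str, str] = {
    "Primer-1": "CAGATGCAGCTGGTGCA",
    "Primer-2": "CAGGTGCAGCTGGTGCA",
}

# Leading run of lowercase bases, i.e. the part a trimmer would have removed.
LEADING_LOWER = re.compile(r"^[acgtn]+")


def hamming(a: str, b: str) -> int:
    """Count mismatching positions between two strings of equal length."""
    return sum(x != y for x, y in zip(a, b))


def restore_primer(
    seq: str,
    primers: Dict[str, str],
    max_mismatches: int = 3,  # assumption: arbitrary cutoff
) -> Optional[Tuple[str, int]]:
    """Replace the lowercase prefix with the closest reference primer.

    Returns (new sequence, number of leading bases dropped) so the caller
    can cut the quality string by the same amount, or None if the read has
    no usable lowercase prefix.
    """
    match = LEADING_LOWER.match(seq)
    if match is None:
        return None
    marked = match.group(0).upper()
    # Assumption: the primer occupies the end of the lowercase run.
    candidates = [
        (hamming(marked[-len(p):], p), p)
        for p in primers.values()
        if len(marked) >= len(p)
    ]
    if not candidates:
        return None
    dist, best = min(candidates)
    if dist > max_mismatches:
        return None
    dropped = len(marked) - len(best)
    return best + seq[match.end():], dropped


if __name__ == "__main__":
    # Hand-made example of a lowercase-marked read, not real cutadapt output.
    print(restore_primer("tattaggaggtgcagctggtgcaGTGAGCTGCC", PRIMERS))
```

For FASTQ input, remove the same number of leading characters from the quality line that the function reports as dropped, so that sequence and quality stay the same length.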