Empty output file after trimming primers with Cutadapt
3.6 years ago
ALEJANDRO • 0

Dear all, I have a problem with Cutadapt. After trimming primers from the R1 and R2 FASTQ files, the output FASTQ file is effectively empty: only the read names are kept, while the lines holding the DNA sequence and the quality scores are removed. The primers are placed at the 3' end of both R1 and R2 reads, and I used the -a option followed by the primer sequences. The output FASTQ file is generated, but it is incomplete. I also tried merging both reads first and then trimming 3' and 5' adapters using the -a and -g options, but the problem remains with the merged FASTQ file. Could anyone help me fix this problem?

Thank you!

cutadapt • 2.2k views

Please post the cutadapt command you used and a few example reads from R1 and R2.


We used 3 forward and 2 reverse degenerate primers. All possible primer variants are listed below.

Forward1A 
AGGTGAAGTAAAAGGTTCATACTTAAA
Forward1T
AGGTGAAGTAAAAGGTTCTTACTTAAA
Forward2AC
AGGTGAAGTTAAAGGTTCATACTTAAA
Forward2AT
AGGTGAAGTTAAAGGTTCATATTTAAA
Forward2TC
AGGTGAAGTTAAAGGTTCTTACTTAAA
Forward2TT
AGGTGAAGTTAAAGGTTCTTATTTAAA
Forward3A
AGGTGAAACTAAAGGTTCATACTTAAA
Forward3T
AGGTGAAACTAAAGGTTCTTACTTAAA
Reverse1AA
CCTTCTAATTTACCAACAACTG
Reverse1AT
CCTTCTAATTTACCAACTACTG
Reverse1TA
CCTTCTAATTTACCTACAACTG
Reverse1TT
CCTTCTAATTTACCTACTACTG
Reverse2A
CCTTCTAATTTACCAACAACAG
Reverse2T
CCTTCTAATTTACCTACAACAG

Example of two first R1 reads (fastq file):

@M02263:441:000000000-G496G:1:1101:15332:1822 1:N:0:1
AGGTGAAGTTAAAGGTTCTTACTTAAACATTACTGCTGCTACAATGGAAGAAGTATACAAACGTGCTGAGTATGCTAAAGCTGTTGGTTCTGTAATTGTTATGATTGACTTAGTAATGGGTTACACAGCAATTCAATGTTCTGCTATTTGGGCTCGTGAAAACGATATGCTTTTACACTTACACCGTGCTGGTAACTCTACTTACGCTCGTCAAAAAAATCACGGTATTAACTTCCGTGTTATTTGTAAAT
+
AAAAABD1B3B@FGGGGG1GFBGHFDAB1BADDFDEFG1BDA11DAB11B000D2D2B211BEBAAB1B0D2ADGGAA1AA11DBBFEEFB1D2BDFFFF2FFFG@F11BF1BG22B2BBFFF@1B1BG0BBFGB>22BF2FFFG1FFHFF?GGGEEEB/BF0>??/CG2GGGH2F2<FG1@1?GCGFCFGFHD1>FH1<FFHFDFCFECCG..=D.@?.;C/=:EECC;0;FGGFEFEG.0=;FFGFF0C
@M02263:441:000000000-G496G:1:1101:14789:1842 1:N:0:1
AGGTGAAGTTAAAGGTTCTTACTTAAACATTACTGCTGGTACAATGGAAGAAGTGTACAAACGCGCTGAGTATGCTAAAGCTGTTGGTTCTGTAATTGTTATGATCGATTTAGTATTGGGTTATACTGCTATTCAATGTGCTGCTATTTGGTCTCGTGATAATGATATGTTTTTACATTTACACCGTGCTGGTAACTCTACTTACGCACGTCAAAAAAATCATGGTATCAACTTCCGTGTTATTTGTAAAT
+
AAAAAFF1B3B3FGGGGGFGBAGHDDBB1BDADEFEGG1FFE11DBB11A000D1D2B211AA/AAE//0D2AAFFDA11@B1DFF>?FFG1F2@BF@GFBF@EG@F/?BF1BF22B2B/EF/B2>2BG11BBFF22>B222BBG1BFGFFFFGHCFEGFF>BF>F2FGFH2@F/@2>FG>@1?G/GAA/FBHG1>FGD1F1>1<>A.<<CH..<0.-:;C0;<0CGF0;0;CFGFBB.FA0;CF;FBB0;

Example of two first R2 reads (fastq file):

@M02263:441:000000000-G496G:1:1101:15332:1822 2:N:0:1
CCTTCTAATTTACCTACTACTGTACCTGCGTGGATGTGGTCTACACCTGACATACGCATCCATTTACAAATTACACGGAAGTTAATACCGTGGTTTTTTTGACGTGCGTATGTTGAGTTACCTGCACGGTGTAAGTGTAATAGCATATCGTTTTCACGAGCCCAAATAGCTGTTCTTTGAATTGCTGTGTAACCCATTACTAAGTCAATCATAACAATTACTGAACCAACTGCTTTAGCATACTCAGCACG
+
AAAAAFFFFFFFGGFAF1AB1FFDDG3B00EA00B1BD1FG21A1BE000B1D1A0//BEEAADF2B211BAAA1B/A//?BD2@D2FF/FG/FFEFGGG?/BE/?>/F//BFFF2CGG2FF1<B0F?CFEEF22FHFG22@2<>@1@GCGHFG1??F//->F..0>11>1<00=DGHG0<DD<DG0DCGC0CG.;;G0<:<0/=G0;CHBG00/;0CFB00;00;C.999;0CFG00;0;B9CB0090;-
@M02263:441:000000000-G496G:1:1101:14789:1842 2:N:0:1
CCTTCTAATTTACCTACAACTGTACCTGCGTGAATATGATCTACACCTGACATACGCATCCATTTACAACTTACACGGAAGTTGATACCATGATTTTTTTGACGTGCGTATGTTGAGTTACCTGCACGGTGTAAATGTAATAACATATCATTTTCACGTGTCCAAATAGCTGCTCTTTGAATTGCTGTATAACCCATTACTAAATCGATCATAACAATTACTGACCCAACTGCTTTAGCATACTCAGCGCG
+
AAAAAFFFFFFFGG1BG11B1BFBAG3B00EE0FF2GGBFHAAB1BF0B0A1DAABA/EEFBBFG2F211B11B1BAEA/EFF@BD2GG1FF1FGFGHGG@/BE>EE/F??FF2B2GFG2FGDFF0B?CFHEHB2BGGG2BG2@FF1@G2GH2F2@DF0<0?FF1?F11?<<11?FHHG1<GF>GGDHDG>1GG<CCG0==<0<DGGCCH.G0::00<C000;0/;:..9/90CFG0;;0C;0CB00C.;-

Example of two first R1 reads after applying Cutadapt

@M02263:441:000000000-G496G:1:1101:15332:1822 1:N:0:1

+

@M02263:441:000000000-G496G:1:1101:14789:1842 1:N:0:1

+

Example of two first R2 reads after applying Cutadapt

@M02263:441:000000000-G496G:1:1101:15332:1822 2:N:0:1

+

@M02263:441:000000000-G496G:1:1101:14789:1842 2:N:0:1

+

Command used for trimming forward primers:

cutadapt -a AGGTGAAGTAAAAGGTTCATACTTAAA -a AGGTGAAGTAAAAGGTTCTTACTTAAA -a AGGTGAAGTTAAAGGTTCATACTTAAA -a AGGTGAAGTTAAAGGTTCATATTTAAA -a AGGTGAAGTTAAAGGTTCTTACTTAAA -a AGGTGAAGTTAAAGGTTCTTATTTAAA -a AGGTGAAACTAAAGGTTCATACTTAAA -a AGGTGAAACTAAAGGTTCTTACTTAAA -o /home/alejandro/Documents/S1R1.fastq '/home/alejandro/Desktop/Raw data descomprimido/S1-R1.fastq'

Command used for trimming reverse primers:

cutadapt -a CCTTCTAATTTACCAACAACTG -a CCTTCTAATTTACCAACTACTG -a CCTTCTAATTTACCTACAACTG -a CCTTCTAATTTACCTACTACTG -a CCTTCTAATTTACCAACAACAG -a CCTTCTAATTTACCTACAACAG -o /home/alejandro/Documents/S1R2.fastq '/home/alejandro/Desktop/Raw data descomprimido/S1-R2.fastq'

Thank you for your support.

Best regards


You are getting empty reads because you are removing 5' sequences as 3' sequences (the -a option in cutadapt). With -a, cutadapt removes the adapter and everything after it, so when the primer sits at the very start of the read, the entire read is trimmed away. The sequences below occur at the 5' end of each read, yet your commands remove them as 3' adapters.
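To illustrate the difference, here is a minimal Python sketch. It is only a toy model that uses exact string matching (cutadapt itself performs error-tolerant alignment), and the function names `trim_3prime` and `trim_5prime` are made up for this example. It mimics what a 3' adapter (-a) and a 5' adapter (-g) do to a read that starts with the primer:

```python
def trim_3prime(read: str, adapter: str) -> str:
    """Toy model of a 3' adapter (-a): drop the adapter and everything after it."""
    pos = read.find(adapter)
    return read if pos == -1 else read[:pos]


def trim_5prime(read: str, adapter: str) -> str:
    """Toy model of a 5' adapter (-g): drop the adapter and everything before it."""
    pos = read.find(adapter)
    return read if pos == -1 else read[pos + len(adapter):]


primer = "AGGTGAAGTTAAAGGTTCTTACTTAAA"
# Primer followed by the next 20 bases of the first R1 read posted in this thread.
read = primer + "CATTACTGCTGCTACAATGG"

print(repr(trim_3prime(read, primer)))  # nothing is left of the read
print(repr(trim_5prime(read, primer)))  # the sequence after the primer is kept
```

In this toy model the 3' variant leaves an empty string, which corresponds to the empty sequence and quality lines in your output files.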

For R1, the primer sequence AGGTGAAGTTAAAGGTTCTTACTTAAA is the one that empties the two example reads with your current cutadapt command. The rest of the R1 reads in your file are affected by the other primer sequences:

$ cutadapt -a AGGTGAAGTTAAAGGTTCTTACTTAAA --quiet -e 0 R1.fastq

@M02263:441:000000000-G496G:1:1101:15332:1822 1:N:0:1

+

@M02263:441:000000000-G496G:1:1101:14789:1842 1:N:0:1

+

For R2, the primer sequence CCTTCTAATTTACCTACAACTG causes the same problem with your current cutadapt command. The rest of the R2 reads in your file are affected by the other primer sequences:

$ cutadapt -a CCTTCTAATTTACCTACAACTG   --quiet  R2.fastq

@M02263:441:000000000-G496G:1:1101:15332:1822 2:N:0:1

+

@M02263:441:000000000-G496G:1:1101:14789:1842 2:N:0:1

+

[Image: primer_search.png]


Your primer sequences seem to vary at only a few positions, so I constructed a common sequence (one for forward and one for reverse) using IUPAC codes and used these to trim the reads. Please check the output thoroughly.

For R1

$ cutadapt -g AGGTGAAGTAAAAGGTTCWTAYTTAAA  --quiet  R1.fastq

@M02263:441:000000000-G496G:1:1101:15332:1822 1:N:0:1
CATTACTGCTGCTACAATGGAAGAAGTATACAAACGTGCTGAGTATGCTAAAGCTGTTGGTTCTGTAATTGTTATGATTGACTTAGTAATGGGTTACACAGCAATTCAATGTTCTGCTATTTGGGCTCGTGAAAACGATATGCTTTTACACTTACACCGTGCTGGTAACTCTACTTACGCTCGTCAAAAAAATCACGGTATTAACTTCCGTGTTATTTGTAAAT
+
B1BADDFDEFG1BDA11DAB11B000D2D2B211BEBAAB1B0D2ADGGAA1AA11DBBFEEFB1D2BDFFFF2FFFG@F11BF1BG22B2BBFFF@1B1BG0BBFGB>22BF2FFFG1FFHFF?GGGEEEB/BF0>??/CG2GGGH2F2<FG1@1?GCGFCFGFHD1>FH1<FFHFDFCFECCG..=D.@?.;C/=:EECC;0;FGGFEFEG.0=;FFGFF0C
@M02263:441:000000000-G496G:1:1101:14789:1842 1:N:0:1
CATTACTGCTGGTACAATGGAAGAAGTGTACAAACGCGCTGAGTATGCTAAAGCTGTTGGTTCTGTAATTGTTATGATCGATTTAGTATTGGGTTATACTGCTATTCAATGTGCTGCTATTTGGTCTCGTGATAATGATATGTTTTTACATTTACACCGTGCTGGTAACTCTACTTACGCACGTCAAAAAAATCATGGTATCAACTTCCGTGTTATTTGTAAAT
+
B1BDADEFEGG1FFE11DBB11A000D1D2B211AA/AAE//0D2AAFFDA11@B1DFF>?FFG1F2@BF@GFBF@EG@F/?BF1BF22B2B/EF/B2>2BG11BBFF22>B222BBG1BFGFFFFGHCFEGFF>BF>F2FGFH2@F/@2>FG>@1?G/GAA/FBHG1>FGD1F1>1<>A.<<CH..<0.-:;C0;<0CGF0;0;CFGFBB.FA0;CF;FBB0;

For R2

$ cutadapt -g CCTTCTAATTTACCWACWACWG --quiet   R2.fastq     

@M02263:441:000000000-G496G:1:1101:15332:1822 2:N:0:1
TACCTGCGTGGATGTGGTCTACACCTGACATACGCATCCATTTACAAATTACACGGAAGTTAATACCGTGGTTTTTTTGACGTGCGTATGTTGAGTTACCTGCACGGTGTAAGTGTAATAGCATATCGTTTTCACGAGCCCAAATAGCTGTTCTTTGAATTGCTGTGTAACCCATTACTAAGTCAATCATAACAATTACTGAACCAACTGCTTTAGCATACTCAGCACG
+
FDDG3B00EA00B1BD1FG21A1BE000B1D1A0//BEEAADF2B211BAAA1B/A//?BD2@D2FF/FG/FFEFGGG?/BE/?>/F//BFFF2CGG2FF1<B0F?CFEEF22FHFG22@2<>@1@GCGHFG1??F//->F..0>11>1<00=DGHG0<DD<DG0DCGC0CG.;;G0<:<0/=G0;CHBG00/;0CFB00;00;C.999;0CFG00;0;B9CB0090;-
@M02263:441:000000000-G496G:1:1101:14789:1842 2:N:0:1
TACCTGCGTGAATATGATCTACACCTGACATACGCATCCATTTACAACTTACACGGAAGTTGATACCATGATTTTTTTGACGTGCGTATGTTGAGTTACCTGCACGGTGTAAATGTAATAACATATCATTTTCACGTGTCCAAATAGCTGCTCTTTGAATTGCTGTATAACCCATTACTAAATCGATCATAACAATTACTGACCCAACTGCTTTAGCATACTCAGCGCG
+
FBAG3B00EE0FF2GGBFHAAB1BF0B0A1DAABA/EEFBBFG2F211B11B1BAEA/EFF@BD2GG1FF1FGFGHGG@/BE>EE/F??FF2B2GFG2FGDFF0B?CFHEHB2BGGG2BG2@FF1@G2GH2F2@DF0<0?FF1?F11?<<11?FHHG1<GF>GGDHDG>1GG<CCG0==<0<DGGCCH.G0::00<C000;0/;:..9/90CFG0;;0C;0CB00C.;-

Since the reads are paired-end, I would suggest this:

$ cutadapt -g AGGTGAAGTAAAAGGTTCWTAYTTAAA -G CCTTCTAATTTACCWACWACWG -o R1_trimmed.fastq -p R2_trimmed.fastq R1.fastq R2.fastq
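As a quick sanity check on the consensus sequences in the command above, the following Python sketch counts how many of the first bases of a read are not allowed by a degenerate primer. It is an illustration only: it ignores insertions and deletions, the helper name `prefix_mismatches` is made up, and it assumes cutadapt's default maximum error rate of 0.1 with the allowed error count rounded down.

```python
# Bases allowed by each IUPAC nucleotide code.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def prefix_mismatches(primer: str, read: str) -> int:
    """Count positions where the start of the read is not allowed by the primer."""
    return sum(base not in IUPAC[code] for code, base in zip(primer, read))


fwd_consensus = "AGGTGAAGTAAAAGGTTCWTAYTTAAA"
rev_consensus = "CCTTCTAATTTACCWACWACWG"

# First bases of the first R1 and R2 reads posted in this thread.
r1_start = "AGGTGAAGTTAAAGGTTCTTACTTAAACATTACTG"
r2_start = "CCTTCTAATTTACCTACTACTGTACCTGCG"

fwd_errors = prefix_mismatches(fwd_consensus, r1_start)
rev_errors = prefix_mismatches(rev_consensus, r2_start)

# Assumed default tolerance: 10% of the primer length, rounded down.
fwd_allowed = int(0.1 * len(fwd_consensus))

print(fwd_errors, rev_errors, fwd_allowed)
```

The forward consensus differs from this R1 read at one position, which stays within the assumed tolerance. That would explain why the read is still trimmed, but reads carrying other primer variants may mismatch at more positions, so the consensus should cover every variant.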

Thank you very much for taking the time to check the commands and reads. I appreciate your selfless work. The consensus sequence of the reverse primer is fine; however, after reviewing the consensus sequence of the forward primer, I believe that AGGTGAARYWAAAGGTTCWTAYTTAAA is more adequate. The primer sequences are detailed below:

Forward1 
TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGAGGTGAAGTAAAAGGTTCWTACTTAAA
Forward2
TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGAGGTGAAGTTAAAGGTTCWTAYTTAAA
Forward3
TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGAGGTGAAACTAAAGGTTCWTACTTAAA

Reverse1
GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGCCTTCTAATTTACCWACWACTG
Reverse2
GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGCCTTCTAATTTACCWACAACAG
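The consensus can be double-checked by collapsing the expanded primer variants, column by column, into IUPAC codes. This is a minimal Python sketch, assuming all variants have the same length and using the variant lists posted earlier in this thread; the function name `iupac_consensus` is only illustrative.

```python
# Map each set of observed bases to its IUPAC nucleotide code.
IUPAC_CODE = {
    frozenset("A"): "A", frozenset("C"): "C",
    frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D",
    frozenset("ACT"): "H", frozenset("ACG"): "V",
    frozenset("ACGT"): "N",
}


def iupac_consensus(variants):
    """Collapse equal-length primer variants into one IUPAC consensus string."""
    if len({len(v) for v in variants}) != 1:
        raise ValueError("all variants must have the same length")
    return "".join(IUPAC_CODE[frozenset(column)] for column in zip(*variants))


forward = [
    "AGGTGAAGTAAAAGGTTCATACTTAAA", "AGGTGAAGTAAAAGGTTCTTACTTAAA",
    "AGGTGAAGTTAAAGGTTCATACTTAAA", "AGGTGAAGTTAAAGGTTCATATTTAAA",
    "AGGTGAAGTTAAAGGTTCTTACTTAAA", "AGGTGAAGTTAAAGGTTCTTATTTAAA",
    "AGGTGAAACTAAAGGTTCATACTTAAA", "AGGTGAAACTAAAGGTTCTTACTTAAA",
]
reverse = [
    "CCTTCTAATTTACCAACAACTG", "CCTTCTAATTTACCAACTACTG",
    "CCTTCTAATTTACCTACAACTG", "CCTTCTAATTTACCTACTACTG",
    "CCTTCTAATTTACCAACAACAG", "CCTTCTAATTTACCTACAACAG",
]

print(iupac_consensus(forward))  # AGGTGAARYWAAAGGTTCWTAYTTAAA
print(iupac_consensus(reverse))  # CCTTCTAATTTACCWACWACWG
```

Note that a consensus built this way can also match combinations that are not among the ordered primers, which should be acceptable for trimming purposes.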

Thank you very much for your help! Best regards


Hopefully the issue is resolved and the consensus for R1 that you posted is correct. If your query has been addressed, please resolve the post by accepting the answer.
