Trim FWD and the reverse-complement of REV off of R1 (forward reads)

Question

cutadapt summary statistics after every run

0

Entering edit mode

5.3 years ago

bnayer26 • 0

Hi, I am new to analyzing RNA seq data and have just started running cutadapt to trim my adapter sequences from my paired end data. If I run one line of code at a time, I can see a very nice summary statistics as is given in this page of the cutadapt documentation (https://cutadapt.readthedocs.io/en/stable/guide.html#cutadapt-s-output). It looks something like this:

    This is cutadapt 2.7 with Python 3.5.2
Command line parameters: --cores=14 -q 10,10 -a AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC -A AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT --info-file=summary1.txt -o EBSP3_L3_1_trimmed.fq.gz -p EBSP3_L3_2_trimmed.fq.gz /home/bnay2/JeanProject/JeanRawData/EBSP3_L3_1.fq.gz /home/bnay2/JeanProject/JeanRawData/EBSP3_L3_2.fq.gz
Processing reads on 14 cores in paired-end mode ...
[    8=------] 00:34:05    31,431,443 reads  @     65.1 µs/read;   0.92 M reads/minute
Finished in 2045.05 s (65 us/read; 0.92 M reads/minute).

=== Summary ===

Total read pairs processed:         31,431,443
  Read 1 with adapter:                 866,710 (2.8%)
  Read 2 with adapter:                 839,966 (2.7%)
Pairs written (passing filters):    31,431,443 (100.0%)

Total basepairs processed: 9,429,432,900 bp
  Read 1: 4,714,716,450 bp
  Read 2: 4,714,716,450 bp
Quality-trimmed:               3,496,279 bp (0.0%)
  Read 1:     1,212,666 bp
  Read 2:     2,283,613 bp
Total written (filtered):  9,420,283,738 bp (99.9%)
  Read 1: 4,710,662,353 bp
  Read 2: 4,709,621,385 bp

=== First read: Adapter 1 ===

Sequence: AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC; Type: regular 3'; Length: 34; Trimmed: 866710 times.

No. of allowed errors:
0-9 bp: 0; 10-19 bp: 1; 20-29 bp: 2; 30-34 bp: 3

Bases preceding removed adapters:
  A: 29.7%
  C: 30.1%
  G: 27.9%
  T: 12.3%
  none/other: 0.0%

Overview of removed sequences
length  count   expect  max.err error counts
3   697484  491116.3    0   697484
4   129061  122779.1    0   129061
5   33700   30694.8 0   33700
6   2812    7673.7  0   2812
7   982 1918.4  0   982
8   144 479.6   0   144

But I wanted to run the code in a loop as part of a script, and I wanted to save these summary statistics information for each trimming into a text file where I can view it later.

I see that the documentation mentions this:

Format of the info file When the --info-file command-line parameter is given, detailed information about the found adapters is written to the given file. The output is a tab-separated text file. Each line corresponds to one read of the input file (unless –times is used, see below). A row is written for all reads, even those that are discarded from the final output FASTA/FASTQ due to filtering options (such as --minimum-length).

But when I tried --info-file=summary.txt, it generated a 12 GB file that has multiple lines of the following, which is very different from the kind of output I used to get shown above:

E00552:427:H7CVVCCX2:3:1101:5071:1379 1:N:0:NAGAGAGG+NACTCCTT   -1      ATCCTGTGAGTGTGTGTATACAGATATTATAGAAATGCTTTTAGGCATCTTTGAAACCAAGCCTATGTGTGAATAGTTTGTGAAAGAGATGGCAAACTCGGATGGGGGAATACCCAAGGGTCATGGGTTTTATGTGTCGCTTTGGTGTA   AAFFJJJJJJJJFJFJJJJFJJJJJJJJJJJJJJJJJFJJJJJJFJJJJJJJJJJJJJJJJJJJJFJ<JJJJJJJ<A-<<AJJJFFAJFJJJJJFFA-A-JF7A<<J-7<AFJ7-A<F<J7AJF7AJ7---<FA-A--<)7--<<-<AA
E00552:427:H7CVVCCX2:3:1101:5294:1379 1:N:0:NAGAGAGG+NACTCCTT   -1      AAGGGGATAGGTGAGTGGCACGCCAAAGCCATGTGGGGGTCAGGGTACCAGAACTAGGCCCTGGTGCAGGAATACATCAGGCAGGAAGGGGGTGACAAAGGGGTCTGTGGGCTGCATCCATCTGGTTCTTGACTGTCGCTTATACACAT   AAFFJJFJJJJAJJJFJJJ<AJJJJJJJJJJFJFAJJ<F<JJJJJFJJJFFJJJJFAJJFJJJJFJF-AJJJJJJJJ-JF<A<AFF<FAJ<-7<-A<-F<AA-7AAJ-7FAFJJJFAJ-7-<-F---7<FFJJ<AAA)A<FFAJ<AFJF
E00552:427:H7CVVCCX2:3:1101:5538:1379 1:N:0:NAGAGAGG+NACTCCTT   -1      TCTTTCTCTAAAGAGCTGCTTCTAATCTCCTGCTGGAGGCCTCCTTGAGCTCTCAGACATGGCTTCTGCCAACCCAACCTGCTGCTGAGTAAGCCAAAGACTCTCCTTCCTGTAGAAACCAGCAAGGAGCGCCCTGCCCCTGTCCTTTT   AAAFJJFJJJJJJJJJJJJ<FJJJJJJAJJJJJJFJJ-AJJJJJJJJJJJJJJJJJFJJJJFJJJJJJJJJJJJJFA-FJ-FJAFJJF7FFFJJFFAA-7F-A7AJF7F<7A<FFFFJJFJFJ<A7)-7F<<)7-7A)-77A7AFA7<-

Can someone please help me understand how to get the summary statistics as I see them in my first output, into a summary text file so I can view it later for all my samples?

Thank you so much.

RNA-Seq adapter trimming cutadapt • 6.9k views

ADD COMMENT • link updated 2.9 years ago by morteza.mahmoudisaber ▴ 80 • written 5.3 years ago by bnayer26 • 0

score 1 · Answer 1 · 2020-01-04

1

Entering edit mode

5.3 years ago

ATpoint 87k

This is printed to stderr so you can capture it basically like:

for i in files*
  do
  cutadapt (...) 2> summary_for_${i}
  done

The 2> does the trick. Once you have this you could summarize all reports into a nice html report using multiqc.

ADD COMMENT • link 5.3 years ago by ATpoint 87k

0

Entering edit mode

Hi thanks for your answer! I will give it a try...so you mean I should add this 2> at the very end of my cutadapt code right? And do I need to give any file extension? assuming my summary file name is summary1. So I just write 2> summary1 is that it? or should I give some extension to the filename?

Thanks again!

ADD REPLY • link 5.3 years ago by bnayer26 • 0

0

Entering edit mode

Yes, add it as you say. File names are optional, it will simply be a plain text document, I typically use txt but be aware that the name is unique in every iteration otherwise it will be overwritten. To append to the same file use 2>>.

ADD REPLY • link 5.3 years ago by ATpoint 87k

2

Entering edit mode

I have tried this method and it writes a blank file. -Found answer. I had specified the -o argument, so the summary was not going to standard error, it was going to standard output. Using 1> filename instead of 2>filename therefore worked.

ADD REPLY • link 5.2 years ago by samhairle ▴ 20

0

Entering edit mode

Please show code, anecdotal descriptions are difficult to debug.

ADD REPLY • link 5.2 years ago by ATpoint 87k

0

Entering edit mode

Okay thank you so much I will give it a try!

ADD REPLY • link 5.3 years ago by bnayer26 • 0

0

Entering edit mode

Hi I'm having more or less the same issue. I'm trying to get all the details from the summary once cutadapt had removed all the pair-end primers. My script looks like this:

cutadapt<-"/share/apps/anaconda/envs/R360/bin/cutadapt"

system2(cutadapt, args = "--version")

path.cut.Man <- file.path(pathMan, "cutadapt") if(!dir.exists(path.cut.Man)) dir.create(path.cut.Man) fnFs.cut.Man <- file.path(path.cut.Man, basename(fnFsman)) fnRs.cut.Man <- file.path(path.cut.Man, basename(fnRsman))

FWD.RC <- dada2:::rc(FWD1) REV.RC <- dada2:::rc(REV)

Trim FWD and the reverse-complement of REV off of R1 (forward reads)

R1.flags <- paste("-g", FWD1, "-a", REV.RC)

Trim REV and the reverse-complement of FWD off of R2 (reverse reads)

R2.flags <- paste("-G", REV, "-A", FWD.RC)

Run Cutadapt

for(i in seq_along(fnFsman)) { system2(cutadapt, args = c(R1.flags, R2.flags, "-n", 2,
"-o 1> report.txt", fnFs.cut.Man[i], "-p 1> report.txt", fnRs.cut.Man[i],
fnFs.filtN.Man[i], fnRs.filtN.Man[i])) }

[ 8<------] 00:00:13 174,110 reads @ 75.6 µs/read; 0.79 M reads/minute [ 8=---------] 00:00:04 63,102 reads @ 76.2 µs/read; 0.79 M reads/minute [ 8<-----] 00:00:15 205,940 reads @ 75.8 µs/read; 0.79 M reads/minute [ 8<--------] 00:00:06 85,910 reads @ 74.6 µs/read; 0.80 M reads/minute [ 8=---------] 00:00:05 66,637 reads @ 76.4 µs/read; 0.79 M reads/minute [ 8<--------] 00:00:06 83,522 reads @ 76.0 µs/read; 0.79 M reads/minute [ 8=--------] 00:00:07 100,576 reads @ 73.1 µs/read; 0.82 M reads/minute [ 8<--------] 00:00:06 84,447 reads @ 73.5 µs/read; 0.82 M reads/minute [ 8<--------] 00:00:06 81,623 reads @ 73.8 µs/read; 0.81 M reads/minute [ 8<---------] 00:00:04 58,366 reads @ 77.6 µs/read; 0.77 M reads/minute [ 8=---------] 00:00:05 79,614 reads @ 73.5 µs/read; 0.82 M reads/minute [ 8<-------] 00:00:10 139,749 reads @ 76.5 µs/read; 0.78 M reads/minute [8=----------] 00:00:01 25,220 reads @ 77.9 µs/read; 0.77 M reads/minute [ 8<--------] 00:00:06 87,436 reads @ 73.7 µs/read; 0.81 M reads/minute [ 8=---------] 00:00:05 78,237 reads @ 74.7 µs/read; 0.80 M reads/minute

It works but the arguments for the txt output files or "report.txt" only take in to account the last summary data ( the one for :[ 8=---------] 00:00:05 78,237 reads @ 74.7 µs/read; 0.80 M reads/minute) and I would like to have them all.

I would be very grateful if someone could make some observations/ corrections for this issue.

ADD REPLY • link 4.9 years ago by brenda.cla • 0

score 0 · Answer 2 · 2022-06-08

0

Entering edit mode

2.9 years ago

morteza.mahmoudisaber ▴ 80

summary statistics can be saved by

cutadapt command > summary_statistics.txt

stderr can be saved by

cutadapt command 2> stderr.txt

ADD COMMENT • link 2.9 years ago by morteza.mahmoudisaber ▴ 80