I have run cutadapt 1.3 and 1.11 on the same bacterial dataset, with the same parameters in both versions. The number of output reads differs markedly: with version 1.3 I obtain many more reads after trimming. Any idea why?
Perhaps you should post the two commands even if they are identical. Some skilled user might recognize something. Also, did you check out the default parameters in the two versions? Do they differ, maybe?
I don't know, actually, but I found the following on their web page. They also say that steps not requested on the command line are skipped, so I guess there aren't any defaults. However, you could try running both versions with all of these flags set explicitly and see whether you get the same results. If not, we have an issue here.
"The read modifications described above are applied in the following order to each read. Steps not requested on the command-line are skipped."
Unconditional base removal with --cut
Quality trimming (-q)
Adapter trimming (-a, -b, -g and uppercase versions)
Read shortening (--length)
N-end trimming (--trim-n)
Length tag modification (--length-tag)
Read name suffix removal (--strip-suffix)
Addition of prefix and suffix to read name (-x/--prefix and -y/--suffix)
Double-encode the sequence (only colorspace)
Replace negative quality values with zero (zero capping, only colorspace)
Trim primer base (only colorspace)
I tried as you suggested and I still get different results: version 1.3 gives an output with 12839813 reads, whereas version 1.11 gives 12661316 reads. Honestly, I have no clue why.
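One way to narrow this down is to find out which reads survive in one output but not in the other (the two counts above differ by 178497 reads) and then look at those reads in the input. Below is a minimal Python sketch of that comparison. It assumes plain four-line FASTQ records (no wrapped sequence lines) and that the first word of each header line identifies the read; the function names are made up for illustration and are not part of cutadapt.

```python
import gzip
from itertools import islice


def read_names(fastq_path):
    """Return the set of read names found in a FASTQ file.

    Assumes four lines per record, so the header is every fourth line.
    The read name is taken to be the first word after the leading '@'.
    """
    opener = gzip.open if str(fastq_path).endswith(".gz") else open
    names = set()
    with opener(fastq_path, "rt") as handle:
        for header in islice(handle, 0, None, 4):
            fields = header.strip()[1:].split()
            if fields:
                names.add(fields[0])
    return names


def compare_outputs(old_path, new_path):
    """Return (names only in the old output, names only in the new output)."""
    old_names = read_names(old_path)
    new_names = read_names(new_path)
    return old_names - new_names, new_names - old_names
```

Calling `compare_outputs("out_1.3.fastq", "out_1.11.fastq")` (placeholder file names) returns two sets of read names. With more than 12 million reads per file the sets can take a few gigabytes of memory, so it may be worth trying this on a subset of the input first. Pulling a handful of the differing reads out of the input also gives a small example file to attach to a bug report.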
I have tried to make cutadapt produce identical results since version 1.0. The default settings for the alignment algorithm have been the same all along. However, there may have been some changes in how command-line parameters are interpreted. These should all be documented, so have a look through the changelog.
One possibility is that your adapter sequences have special characters in them. Until version 1.6, "U" characters in an adapter sequence were not automatically converted to "T", and support for IUPAC wildcard characters was introduced only in version 1.7. If this is not the problem, then you may want to provide me with a small example file so that I can reproduce this myself. For that, please use the GitHub bug report you opened.
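Since the adapters in this case are read from a FASTA file (the `file:` syntax), a quick way to check for the special characters mentioned above is to scan that file for "U" and for IUPAC ambiguity codes. The following is a rough Python sketch, not anything shipped with cutadapt: the helper names are hypothetical, and the set of codes is simply the standard IUPAC nucleotide ambiguity letters.

```python
# Standard IUPAC nucleotide ambiguity codes (everything except A, C, G, T/U).
IUPAC_AMBIGUITY_CODES = set("RYSWKMBDHVN")


def parse_fasta(path):
    """Yield (name, sequence) pairs from a FASTA file."""
    name, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                if name is not None:
                    yield name, "".join(chunks)
                name, chunks = line[1:], []
            else:
                chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def flag_special_characters(records):
    """Return {adapter name: findings} for adapters with 'U' or IUPAC codes."""
    flagged = {}
    for name, sequence in records:
        upper = sequence.upper()
        found = {
            "U": "U" in upper,
            "iupac": sorted(set(upper) & IUPAC_AMBIGUITY_CODES),
        }
        if found["U"] or found["iupac"]:
            flagged[name] = found
    return flagged
```

Running `flag_special_characters(parse_fasta("adapters.fasta"))` (placeholder file name) returns an empty dictionary when every adapter consists only of A, C, G and T. In that case this explanation can be ruled out and a small example file is the next step.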
Thanks indeed for your answer. The command line I used in both cases is:
cutadapt -m45 -q10 -a file:[path_to_the_parameters_file] [path_to_the_input_file] > [path_to_the_output_file]
Where can I find information about the defaults for the old version?
Thank you for the hint. I'm running the test you suggested. Let's see what happens!
Don't forget to post the result here for future readers!
Write to the authors and link this post. They might be subscribed to Biostars and could jump into the conversation to help!