I have two questions about cd-hit-otu's parameters.
cd-hit-otu has six steps:
1. Raw read filtering and trimming
2. Clustering of duplicates
3. Chimeric reads detection
4. Read recruitment allowing one error
5. Removing noise
6. OTU clustering
These are the parameters for the first step:
-P 0
-i fastq-file-of-raw-reads
-o output-fasta-file
-t trim_cutoff, default 1.0 (means no trimming)
   if cutoff is a integer number > 1 (like 200), the program will trim sequences to this length
   if cutoff is a fraction (like 0.8), the program will keep fraction of this reads
-p prefix-length/primers_file
   (a) if a primers_file is provided,
       read primers from this file, remove the reads don't match the primers
   (b) if a prefix-length (a digit number) is provided, default 6
       get the consensus of prefix of the all reads
       remove the reads without this prefix
Specifically, I do not understand the -t and -p parameters.
Q1. What does "if cutoff is a fraction (like 0.8), the program will keep fraction of this reads" mean?
My understanding is that if the value is less than 1 (e.g. 0.8), the program trims each sequence to 0.8 times its length.
Does this mean that a read shorter than 0.8 times the sequence length is removed? Is that right?
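To make Q1 concrete, here is a small Python sketch of the two readings of a fractional -t value that I can think of. Neither is taken from the cd-hit-otu source code, the function names are hypothetical, and I do not know which one (if either) the program actually implements.

```python
def trim_each_read(seq, fraction):
    """Reading A (my guess): trim every read to fraction * its own length."""
    return seq[: int(len(seq) * fraction)]


def keep_fraction_of_reads(reads, fraction):
    """Reading B (also a guess): pick one common length L such that about
    `fraction` of the reads are at least L long, trim those reads to L,
    and drop every read shorter than L."""
    lengths = sorted((len(r) for r in reads), reverse=True)
    n_keep = max(1, int(len(reads) * fraction))  # number of reads to keep
    target = lengths[n_keep - 1]                 # length of the shortest kept read
    return [r[:target] for r in reads if len(r) >= target]


if __name__ == "__main__":
    reads = ["ACGTACGTAC", "ACGTACGT", "ACGTAC", "ACGTACGTACGT", "ACGT"]
    print([trim_each_read(r, 0.8) for r in reads])
    print(keep_fraction_of_reads(reads, 0.8))
```

Under reading A no read is ever removed, only shortened. Under reading B the shortest reads are removed and all remaining reads end up with the same length, which seems closer to the wording "keep fraction of this reads".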
Q2. What does prefix-length mean (option (b) of -p)?
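For Q2, this is how I currently read the help text for option (b), written as a Python sketch. It assumes that "consensus of prefix" means the most common base at each of the first N positions across all reads. That is only my assumption, not something confirmed by the cd-hit-otu documentation, and the function names are hypothetical.

```python
from collections import Counter


def consensus_prefix(reads, prefix_length=6):
    """Most common base at each of the first `prefix_length` positions
    (assumed meaning of 'consensus of prefix')."""
    usable = [r for r in reads if len(r) >= prefix_length]
    if not usable:
        raise ValueError("no read is long enough for the requested prefix")
    return "".join(
        Counter(r[i] for r in usable).most_common(1)[0][0]
        for i in range(prefix_length)
    )


def filter_by_prefix(reads, prefix_length=6):
    """Keep only the reads that start with the consensus prefix."""
    prefix = consensus_prefix(reads, prefix_length)
    return [r for r in reads if r.startswith(prefix)]


if __name__ == "__main__":
    reads = ["ACGTACGGA", "ACGTACTTT", "ACGTACAAA", "TCGTACGGG"]
    print(consensus_prefix(reads))
    print(filter_by_prefix(reads))
```

If this reading is correct, -p 6 would remove any read whose first 6 bases differ from the majority, for example reads with an error in the primer or barcode region.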