Hi, I have a question about a FLASH parameter.
FLASH has a maximum overlap parameter, -M (--max-overlap).
The help text recommends calculating this value from the read length, fragment length, and fragment length standard deviation:
-r, --read-len=LEN
-f, --fragment-len=LEN
-s, --fragment-len-stddev=LEN
Average read length, fragment length, and fragment
standard deviation. These are convenience parameters
only, as they are only used for calculating the
maximum overlap (--max-overlap) parameter.
***The maximum overlap is calculated as the overlap of
average-length reads from an average-size fragment
plus 2.5 times the fragment length standard
deviation.***
The default values are -r 100, -f 180,
and -s 18, so this works out to a maximum overlap of
65 bp. If --max-overlap is specified, then the
specified value overrides the calculated value.
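The rule in the help text can be written as a small Python sketch, with the -r, -f and -s values as arguments. Two reads of average length r from a fragment of average length f overlap by 2r - f bases, and FLASH adds 2.5 fragment-length standard deviations on top of that. The function name `flash_max_overlap` is hypothetical, and rounding to the nearest integer is my assumption; FLASH's own rounding may differ. With the defaults it reproduces the 65 bp quoted above.

```python
def flash_max_overlap(read_len: float = 100, fragment_len: float = 180,
                      fragment_stddev: float = 18) -> int:
    """Maximum overlap following the rule in the FLASH help text.

    Two average-length reads from an average-size fragment overlap by
    2 * read_len - fragment_len bases; 2.5 standard deviations of the
    fragment length are added to that. Rounding is an assumption.
    """
    average_overlap = 2 * read_len - fragment_len
    return round(average_overlap + 2.5 * fragment_stddev)


print(flash_max_overlap())  # 65
```

Note that all three inputs describe the library, not just the reads: -f and -s are properties of the fragments, so they cannot be replaced by statistics of the read lengths.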
So, I wrote Python code to calculate the read length, average read length, and read length standard deviation.
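A minimal sketch of such a read-length calculation, assuming plain 4-line FASTQ records (header, sequence, "+", quality) so that every 4th line starting at the second one is a sequence. The function names are hypothetical. This yields a value for -r only; it says nothing about the fragment length needed for -f and -s.

```python
import gzip
import statistics
from typing import Iterable, Tuple


def read_length_stats(lines: Iterable[str]) -> Tuple[float, float]:
    """Mean and population standard deviation of read lengths.

    Assumes 4-line FASTQ records: the sequence is line index 1, 5, 9, ...
    """
    lengths = [len(line.rstrip("\r\n"))
               for i, line in enumerate(lines) if i % 4 == 1]
    if not lengths:
        raise ValueError("no FASTQ records found")
    return statistics.mean(lengths), statistics.pstdev(lengths)


def read_length_stats_from_file(path: str) -> Tuple[float, float]:
    """Open a FASTQ file (gzip-compressed if the name ends in .gz)."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return read_length_stats(handle)
```

For example, `read_length_stats_from_file("reads_1.fastq.gz")` (a hypothetical file name) returns the mean and standard deviation of the forward read lengths.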
Then I suddenly realized that everything is wrong, because I can't get the fragment size. I only have forward and reverse FASTQ files.
------------------------------------- <fragment>
-----------------------> <forward read>
over_lap
<----------------------- <reverse read>
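In the diagram, the overlap is the part of the fragment covered by both reads. Writing L_fwd and L_rev for the two read lengths, F for the fragment length, and r, f for their averages (symbols introduced here for illustration), the overlap O is:

```latex
\begin{aligned}
O       &= L_{\mathrm{fwd}} + L_{\mathrm{rev}} - F \\
\bar{O} &= 2r - f
\end{aligned}
```

With the defaults r = 100 and f = 180 the average overlap is 20 bp. The overlap cannot be computed from the read lengths alone: it depends on F, and a pair with F >= L_fwd + L_rev does not overlap at all.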
I'm confused about all of this.
I set the -M parameter (max overlap) to Avg(forward read length) + 2.5 * stddev(forward read length).
But I think that is wrong, because FLASH recommends calculating the value like this:
The maximum overlap is calculated as the overlap of
average-length reads from an average-size fragment
plus 2.5 times the fragment length standard
deviation.
I think I need to know the fragment size, but I only have the forward/reverse FASTQ files.
What do you think I should do? I'd really appreciate your advice.
If you have a reference genome, you can align the reads to the genome to get a BAM file, and then run CollectInsertSizeMetrics from Picard.
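Picard reports the insert size mean and standard deviation directly. If you want to check the numbers yourself, a rough sketch is to read them from the TLEN field (column 9) of the aligned SAM records, for example from the output of `samtools view aln.bam` (the file name is hypothetical). This is a simplified stand-in, not what Picard does internally: it keeps only properly paired primary alignments (FLAG bit 0x2 set, bits 0x100/0x800 unset) and counts only the mate with positive TLEN so each pair contributes once. The function name is hypothetical. The resulting mean and standard deviation are candidates for -f and -s.

```python
import statistics
from typing import Iterable, Tuple


def fragment_stats_from_sam(lines: Iterable[str]) -> Tuple[float, float]:
    """Mean and population standard deviation of fragment (insert) length.

    Uses TLEN (SAM column 9). Only primary, properly paired records with
    TLEN > 0 (the leftmost mate) are counted, so each pair is used once.
    """
    sizes = []
    for line in lines:
        if line.startswith("@"):
            continue  # SAM header line
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        flag, tlen = int(fields[1]), int(fields[8])
        if flag & 0x900:
            continue  # secondary or supplementary alignment
        if flag & 0x2 and tlen > 0:
            sizes.append(tlen)
    if not sizes:
        raise ValueError("no properly paired records with TLEN > 0")
    return statistics.mean(sizes), statistics.pstdev(sizes)
```

A simple TLEN-based estimate like this can be skewed by outliers (e.g. chimeric pairs), so Picard's output is the better reference when the two disagree.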