How can I set the FLASH max overlap parameter? Is it possible to get the fragment size from forward/reverse paired FASTQ files?
7.3 years ago
qhsh9713 • 0

Hi, I have a question about a FLASH parameter.

FLASH has a maximum overlap parameter, -M (--max-overlap).

The help text recommends calculating this value from the read length, the fragment size, and the fragment length standard deviation:

   -r, --read-len=LEN
   -f, --fragment-len=LEN
   -s, --fragment-len-stddev=LEN
                      Average read length, fragment length, and fragment
                      standard deviation.  These are convenience parameters
                      only, as they are only used for calculating the
                      maximum overlap (--max-overlap) parameter.

                      ***The maximum overlap is calculated as the overlap of
                      average-length reads from an average-size fragment
                      plus 2.5 times the fragment length standard
                      deviation.*** 

                      The default values are -r 100, -f 180,
                      and -s 18, so this works out to a maximum overlap of
                      65 bp.  If --max-overlap is specified, then the
                      specified value overrides the calculated value.
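
The arithmetic in the help text can be checked with a short Python sketch. It assumes that "the overlap of average-length reads from an average-size fragment" means 2 * read length - fragment length, which reproduces the 65 bp default quoted above. The function name `flash_max_overlap` is made up for illustration and is not part of FLASH.

```python
def flash_max_overlap(read_len=100, fragment_len=180, fragment_len_stddev=18):
    """Max overlap as described in the FLASH help text (illustrative helper)."""
    # Assumption: two reads of length read_len taken from opposite ends of a
    # fragment of length fragment_len overlap by 2 * read_len - fragment_len bases.
    mean_overlap = 2 * read_len - fragment_len
    # Add 2.5 fragment-length standard deviations, as the help text states.
    return int(mean_overlap + 2.5 * fragment_len_stddev)


# Defaults from the help text: -r 100, -f 180, -s 18
print(flash_max_overlap())  # 65
```

The point of the sketch is that the standard deviation in the formula belongs to the fragment length, not to the read length.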

So I wrote Python code to calculate the read lengths, the average read length, and the standard deviation of the read length.

Then I realized that this is all wrong, because I can't get the fragment size. I only have the forward and reverse FASTQ files.

          -------------------------------------  <fragment>

          ----------------------->                 <forward read>
                          over_lap
                         <-----------------------  <reverse read>

I am confused about all of this.

I set the -M parameter (max overlap) to Avg(forward read length) + 2.5 * StdDev(forward read length).

But I think this is wrong, because FLASH recommends calculating the value like this:

                     The maximum overlap is calculated as the overlap of
                      average-length reads from an average-size fragment
                      plus 2.5 times the fragment length standard
                      deviation.

I think I need to know the fragment size, but I only have the forward/reverse FASTQ files.

What do you think I should do? I would really appreciate your advice.


If you have a reference genome, you can align the reads to it to get a BAM file, and then run CollectInsertSizeMetrics from Picard.
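
To illustrate what such a tool measures, here is a minimal Python sketch that reads alignments in SAM text format and summarizes the template length (TLEN, column 9) of properly paired reads. This is not how Picard is implemented, and the function name `fragment_stats_from_sam` and the example records are made up for illustration. Picard remains the better choice for real data.

```python
import statistics


def fragment_stats_from_sam(lines):
    """Return (mean, stddev) of fragment lengths from SAM text lines."""
    lengths = []
    for line in lines:
        if line.startswith("@"):
            continue  # skip SAM header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a complete alignment record
        flag, tlen = int(fields[1]), int(fields[8])
        # Keep properly paired reads (flag bit 0x2) and count each pair once
        # by using only the mate that reports a positive TLEN.
        if flag & 0x2 and tlen > 0:
            lengths.append(tlen)
    return statistics.mean(lengths), statistics.stdev(lengths)


# Made-up example records: two read pairs with fragments of 180 and 200 bp.
example = [
    "@HD\tVN:1.6",
    "r1\t99\tchr1\t100\t60\t100M\t=\t180\t180\t*\t*",
    "r1\t147\tchr1\t180\t60\t100M\t=\t100\t-180\t*\t*",
    "r2\t99\tchr1\t500\t60\t100M\t=\t600\t200\t*\t*",
    "r2\t147\tchr1\t600\t60\t100M\t=\t500\t-200\t*\t*",
]
mean_len, sd_len = fragment_stats_from_sam(example)
```

The resulting mean and standard deviation are the kind of values FLASH expects for -f and -s.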

7.3 years ago
h.mon 35k

If your samples were run through BioAnalyser prior to library preparation (could be some other magical machine, I am terrible at wet-lab stuff), it would spit out a fragment length distribution.

Bioinformatically, you could assemble the reads and map them against the assembly to estimate the insert (fragment) size. See the tadpole and bbmap documentation; both are fast programs for these tasks. Hint: use the ihist parameter with bbmap to get the insert size distribution.

Finally, I would use bbmerge instead of flash to perform pair merging. It also outputs an estimate of the fragment size and its standard deviation (again, use the ihist parameter), and it does not need this information a priori.
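
If you want to compute the mean and standard deviation yourself from an insert size histogram, a sketch along these lines may help. It assumes the histogram is plain text with comment lines starting with "#" and data rows of two whitespace-separated columns (insert size, count). Check that against the actual file before relying on it. The function name `stats_from_histogram` is made up for illustration.

```python
import math


def stats_from_histogram(lines):
    """Return (mean, population stddev) from 'size count' histogram rows."""
    total = 0
    weighted_sum = 0
    weighted_sq_sum = 0
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # assumed: header/summary lines start with '#'
        size_str, count_str = line.split()[:2]
        size, count = int(size_str), int(count_str)
        total += count
        weighted_sum += size * count
        weighted_sq_sum += size * size * count
    mean = weighted_sum / total
    # Population variance: E[x^2] - (E[x])^2, clamped against rounding error.
    variance = max(weighted_sq_sum / total - mean * mean, 0.0)
    return mean, math.sqrt(variance)


# Made-up histogram rows for illustration.
example_hist = ["#InsertSize\tCount", "170\t1", "180\t2", "190\t1"]
hist_mean, hist_sd = stats_from_histogram(example_hist)
```

For a real file, pass the open file object, for example `stats_from_histogram(open("ihist.txt"))`, where the file name is hypothetical.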
