Get Average Insert Size Of Fastq?
2
1
12.4 years ago
dan79 ▴ 90

Is there a way to do it? Sorry for the uninformative question. I downloaded an SRA file from NCBI and used the included sratoolkit to split it into two FASTQ files. I am trying to do a de novo assembly using these paired-end, strand-specific reads. However, a required parameter is the average insert size. Does anyone know how to obtain this from an SRA file or a FASTQ file?

fastq • 15k views
0

Please describe your question so people can help you. I think I understand what you're asking, but without more information it is difficult to answer.

0

Edited, thanks.

0

You will need to align the reads (both mates of each pair). Then you can find the insert lengths by parsing the SAM/BAM file: the observed template length is the TLEN field (column 9) of each alignment record.
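For what it's worth, here is a minimal sketch of that parsing step in Python. It assumes you already have SAM text (for example, the output of `samtools view`) and reads TLEN from properly paired primary alignments. The function names and the `max_len` cutoff are my own choices for illustration. Note that TLEN is the full fragment length including both reads, so check which definition of "insert size" your assembler expects.

```python
import statistics

def template_lengths(sam_lines, max_len=10000):
    """Yield positive TLEN values from properly paired primary alignments."""
    for line in sam_lines:
        if line.startswith("@"):
            continue  # skip header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        flag = int(fields[1])
        tlen = int(fields[8])  # column 9 in 1-based numbering
        # Require 0x2 (properly paired); drop 0x100 (secondary)
        # and 0x800 (supplementary) records.
        if not flag & 0x2 or flag & 0x900:
            continue
        # Each pair appears twice with opposite signs; keep the positive
        # copy. max_len is an arbitrary cutoff (an assumption) for outliers.
        if 0 < tlen <= max_len:
            yield tlen

def summarize(sam_lines):
    """Return (mean, median) of the observed template lengths."""
    lengths = list(template_lengths(sam_lines))
    if not lengths:
        raise ValueError("no properly paired alignments found")
    return statistics.mean(lengths), statistics.median(lengths)
```

One way to use it would be to pipe `samtools view` output into a small script that calls `summarize(sys.stdin)`. The median is usually the safer number to report, since a few mis-mapped pairs can drag the mean around.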

0

Align the reads to a reference genome? This seems counterintuitive considering the whole point of a de novo assembly is to not need a reference.

0

Good point. Sorry. I need to read more carefully. I don't know the answer. I look forward to seeing the best solution.

4
12.4 years ago
matted 7.8k

Guessing an insert size, assembling, mapping the reads back to the assembly, and then iterating with the improved insert size (from the mappings) is a reasonable choice, and probably about the best you can do. Hopefully you have some rough idea from the library preparation method (the size selection criteria, or whether it's a jumping library or not).
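The guess-assemble-map-iterate loop can be sketched roughly as below, in Python. This is only an outline under assumptions: `measure` is a hypothetical callback that you would write yourself to run your assembler with the current guess, map the reads back to the contigs, and return the insert size observed in those mappings. The tolerance and round limit are arbitrary defaults.

```python
def refine_insert_size(initial_guess, measure, tol=5.0, max_rounds=5):
    """Repeat guess -> assemble -> map -> re-estimate until the value settles.

    `measure` is a hypothetical user-supplied function: it takes the current
    insert size guess and returns the insert size observed after assembling
    with that guess and mapping the reads back to the assembly.
    Returns (final_estimate, history_of_estimates).
    """
    estimate = float(initial_guess)
    history = [estimate]
    for _ in range(max_rounds):
        new_estimate = float(measure(estimate))
        history.append(new_estimate)
        # Stop once successive estimates agree to within `tol` bases.
        if abs(new_estimate - estimate) <= tol:
            return new_estimate, history
        estimate = new_estimate
    # Did not settle within max_rounds; return the last value for inspection.
    return estimate, history
```

Keeping the full history makes it easy to see whether the estimate is actually converging or just bouncing around, in which case the assembly is probably too fragmented for the mappings to be informative.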

In fact, Velvet does this automatically (from the 1.1 manual): "If the insert length of a library is unspecified, Velvet will attempt to measure it for you, based on the read-pairs which happen to map onto a common node." As they point out, it's critical to check the reported estimate to make sure it's sane.

0

SOAPdenovo gives you an initial insert size estimate as well.

2
12.4 years ago
dfornika ★ 1.1k

I'm going to suggest a lazy, imperfect solution. If this is Illumina (Genome Analyzer, HiSeq, etc.), then the insert size is normally about 300 bp. If your assembler isn't too sensitive to that parameter, try 300 bp as a reasonable guess.

0

Haha, well, it's better than nothing. I read that somewhere too, and yes, the sequencer was an Illumina. I already started the job with an insert size of 300. +1
