Scaffolding with SSPACE returns scaffolds identical to the initial contig set
9.2 years ago
pbigbig ▴ 250

Hi all,

I used Minia to assemble a contig set from my paired-end reads. Since the Minia documentation says it does not use pairing information when constructing the assembly, I then tried SSPACE to exploit the pairing information from the SAME LIBRARY (the one I used to build the contigs) for scaffolding. But whichever parameters I change in SSPACE (k, a, or the settings in lib.txt), it ALWAYS returns a scaffold set that is exactly the SAME as the initial contig set. Did I miss something? Even with suboptimal parameters, I would expect the scaffolds to differ at least somewhat from the initial contigs, but here they are identical.
Any suggestions are very welcome, thanks a lot!


Here are my input and the scaffolding summary:

my lib.txt: k71 bowtie 1.fastq 2.fastq 440 0.75 FR


Required inputs:
      -l = lib.txt
            Number of paired files = 1
      -s = k71contigs.fasta
      -b = k71origin

Optional inputs:
      -x = 0
      -z = 0
      -k = 10
      -g = 0
      -a = 0.7
      -n = 10
      -T = 16
      -p = 1



READING READS k71:
------------------------------------------------------------
      Total inserted pairs = 46314881
------------------------------------------------------------

LIBRARY k71 STATS:
################################################################################

MAPPING READS TO CONTIGS:
------------------------------------------------------------
      Number of single reads found on contigs = 6827738
      Number of read-pairs used for pairing contigs / total pairs = 657371 / 657371
------------------------------------------------------------

READ PAIRS STATS:
      Assembled pairs: 657371 (1314742 sequences)
            Satisfied in distance/logic within contigs (i.e. -> <-, distance on target: 440 +/-330): 645344
            Unsatisfied in distance within contigs (i.e. distance out-of-bounds): 9033
            Unsatisfied pairing logic within contigs (i.e. illogical pairing ->->, <-<- or <-->): 0
            ---
            Satisfied in distance/logic within a given contig pair (pre-scaffold): 2175
            Unsatisfied in distance within a given contig pair (i.e. calculated distances out-of-bounds): 819
            ---
      Total satisfied: 647519 unsatisfied: 9852

      Estimated insert size statistics (based on 645344 pairs):
            Mean insert size = 296
            Median insert size = 248

REPEATS:
      Number of repeated edges = 0
------------------------------------------------------------
################################################################################

SUMMARY:
------------------------------------------------------------
      Inserted contig file;
            Total number of contigs = 882414
            Sum (bp) = 762086901
                  Total number of N's = 0
                  Sum (bp) no N's = 762086901
            GC Content = 38.49%
            Max contig size = 55233
            Min contig size = 143
            Average contig size = 863
            N25 = 4488
            N50 = 2157
            N75 = 837

      After scaffolding k71:
            Total number of scaffolds = 882414
            Sum (bp) = 762086901
                  Total number of N's = 0
                  Sum (bp) no N's = 762086901
            GC Content = 38.49%
            Max scaffold size = 55233
            Min scaffold size = 143
            Average scaffold size = 863
            N25 = 4488
            N50 = 2157
            N75 = 837

------------------------------------------------------------
SSPACE minia • 3.9k views
9.2 years ago

This line in particular in the scaffolding summary:

Satisfied in distance/logic within a given contig pair (pre-scaffold): 2175

It means that only 2175 read pairs were found to connect contig pairs. So out of the 657371 assembled read pairs, most mapped onto a single contig, and only 2175 mapped onto two different contigs while also satisfying the distance criterion. That's probably not enough for SSPACE to build any scaffolds, depending on your thresholds for establishing links.
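To put numbers on that, here is a rough back-of-the-envelope sketch using only the counts from the READ PAIRS STATS section of the log. It assumes that `-k` is the minimum number of read-pair links required to join two contigs, which is my understanding of the SSPACE option; the variable names are just for illustration.

```python
# Counts copied from the READ PAIRS STATS section of the SSPACE log.
within_ok = 645344    # satisfied, both reads on the same contig
within_bad = 9033     # unsatisfied distance, same contig
between_ok = 2175     # satisfied, reads on two different contigs
between_bad = 819     # unsatisfied distance, two different contigs
min_links = 10        # the -k value of this run (assumed: min links per contig pair)

total = within_ok + within_bad + between_ok + between_bad
spanning_fraction = (between_ok + between_bad) / total

# Best case: every usable link is concentrated on as few contig pairs as possible.
max_joinable_pairs = between_ok // min_links

print(f"total assembled pairs: {total}")
print(f"pairs spanning two contigs: {spanning_fraction:.2%}")
print(f"upper bound on contig pairs reaching -k={min_links}: {max_joinable_pairs}")
```

Even in that best case the usable links could support only a couple of hundred joins among 882414 contigs, and in practice the links are likely spread thinly across many contig pairs, so few or none would reach the threshold.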

Your Minia assembly was good in the sense that it was able to fill the gaps between most of your paired reads (645344/657371 pairs satisfied the distance within one contig). If you want longer scaffolds, you'll probably need mate pair libraries.

Edit:

According to the summary report, about 6.8 million single reads from your FASTQ files were found on contigs, yet only 657371 pairs were used. Did you rename your FASTQ headers? Maybe SSPACE is not recognizing your paired-end reads correctly because of the header names?
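The mismatch between mapped reads and usable pairs can be quantified from the log. This is only a sketch: it assumes that "Number of single reads found on contigs" counts individual reads (not pairs), which the log does not state explicitly.

```python
# Counts copied from the SSPACE log.
total_pairs = 46314881       # "Total inserted pairs"
reads_on_contigs = 6827738   # "Number of single reads found on contigs"
pairs_used = 657371          # "read-pairs used for pairing contigs"

total_reads = 2 * total_pairs                 # two reads per inserted pair
read_fraction = reads_on_contigs / total_reads
pair_fraction = pairs_used / total_pairs

print(f"reads found on contigs: {read_fraction:.2%} of {total_reads}")
print(f"pairs used for pairing: {pair_fraction:.2%} of {total_pairs}")
```

Under that assumption only a small share of the reads was placed on contigs at all, and an even smaller share of pairs had both mates placed, which is worth investigating independently of the header question.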


Thank you very much,

It is really bizarre that I still get the same result after trying different parameters. Even with only 2175 linking read pairs available for scaffolding, shouldn't SSPACE still be able to merge at least some contigs?

I didn't rename any FASTQ headers. I checked them with the head and tail commands and confirmed that the two files still contain corresponding paired-end reads. Thanks!
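Since head and tail only cover the ends of the files, a full pass over both FASTQ files is a more thorough check. The following is a minimal sketch, not part of SSPACE: it assumes plain 4-line FASTQ records (no wrapped sequences) and read names that match once an optional trailing `/1` or `/2` is removed. The function names `read_ids` and `first_mismatch` are hypothetical.

```python
from itertools import islice, zip_longest


def read_ids(lines):
    """Yield the read name from every 4th line (the header) of a FASTQ stream."""
    for header in islice(lines, 0, None, 4):
        fields = header.split()
        if not fields:            # tolerate a stray blank line
            continue
        name = fields[0]          # ignore any comment after whitespace
        if name.endswith(("/1", "/2")):
            name = name[:-2]      # drop the mate suffix
        yield name[1:] if name.startswith("@") else name


def first_mismatch(lines1, lines2):
    """Return (record_index, id1, id2) for the first unpaired record, else None."""
    pairs = zip_longest(read_ids(lines1), read_ids(lines2))
    for index, (id1, id2) in enumerate(pairs):
        if id1 != id2:            # also catches files of different length
            return index, id1, id2
    return None


# Usage with the files named in lib.txt:
#   with open("1.fastq") as f1, open("2.fastq") as f2:
#       print(first_mismatch(f1, f2))
```

A result of `None` means every record in the first file has a same-named mate at the same position in the second file; anything else gives the 0-based record index where the two files diverge.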
