Question

I got two different Trinity.fasta files using two versions of Trinity


10.2 years ago

Kurban ▴ 230

Hi guys,

We have RNA-seq data from an insect, sequenced in 2012, which we assembled at the time with one of the 2011 Trinity releases (producing trinity.fasta). I have now analyzed the sequence length distribution in this file and got the following result:

kurban@kurban-X550VC:~/Downloads/bbmap$ sh stats.sh in=~/Downloads/gene.fa
stats.sh: 52: stats.sh: Bad substitution
stats.sh: 59: stats.sh: [[: not found
stats.sh: 59: stats.sh: [[: not found
stats.sh: 65: stats.sh: source: not found
stats.sh: 66: stats.sh: parseXmx: not found
A C G T N IUPAC Other GC GC_stdev
0.2875 0.2118 0.2067 0.2940 0.0000 0.0000 0.0000 0.4186 0.0894

Main genome scaffold total:          144777
Main genome contig total:            144777
Main genome scaffold sequence total: 67.067 MB
Main genome contig sequence total:   67.067 MB   0.000% gap
Main genome scaffold N/L50:          15033/1.075 KB
Main genome contig N/L50:            15033/1.075 KB
Max scaffold length:                 24.081 KB
Max contig length:                   24.081 KB
Number of scaffolds > 50 KB:         0
% main genome in scaffolds > 50 KB:  0.00%

Minimum  Number         Number         Total          Total          Scaffold
Scaffold of             of             Scaffold       Contig         Contig 
Length   Scaffolds      Contigs        Length         Length         Coverage
-------- -------------- -------------- -------------- -------------- --------
    All         144,777        144,777     67,066,997     67,066,997  100.00%
    100         144,777        144,777     67,066,997     67,066,997  100.00%
    250          56,929         56,929     53,670,774     53,670,774  100.00%
    500          30,137         30,137     44,518,044     44,518,044  100.00%
   1 KB          16,207         16,207     34,757,505     34,757,505  100.00%
2.5 KB           4,183          4,183     15,894,549     15,894,549  100.00%
   5 KB             588            588      3,942,668      3,942,668  100.00%
  10 KB              28             28        353,549        353,549  100.00%

In this file the minimum sequence length is 101 and the longest is 22181.
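The headline numbers in the output above (contig count, minimum and maximum length, N50, and the cumulative length table) can also be computed directly from a FASTA file. Below is a minimal Python sketch, not part of BBMap; the function names are hypothetical, and N50 follows the common definition (the length L such that contigs of length L or longer hold at least half of all bases), so BBMap's exact figures may differ slightly. Note that the `N/L50` line above reports a contig count (15033) followed by a length (1.075 KB).

```python
from typing import Dict, Iterable, List, Optional, Sequence, Tuple


def read_fasta_lengths(lines: Iterable[str]) -> List[int]:
    """Return the length of every sequence in FASTA text.

    `lines` can be an open file or any iterable of lines; wrapped
    (multi-line) sequences are summed per record.
    """
    lengths: List[int] = []
    current: Optional[int] = None  # None until the first '>' header is seen
    for raw in lines:
        line = raw.strip()
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)  # close the previous record
            current = 0
        elif current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)  # close the last record
    return lengths


def n50_stats(lengths: Sequence[int]) -> Dict[str, int]:
    """Count, total, min, max, N50 (a length) and how many contigs reach it."""
    if not lengths:
        raise ValueError("no sequences")
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running = 0
    n50_length = n50_count = 0
    for i, length in enumerate(ordered, start=1):
        running += length
        if 2 * running >= total:  # first point where half of all bases are covered
            n50_length, n50_count = length, i
            break
    return {
        "count": len(ordered),
        "total": total,
        "min": ordered[-1],
        "max": ordered[0],
        "n50_length": n50_length,
        "n50_count": n50_count,
    }


def length_table(
    lengths: Sequence[int],
    thresholds: Sequence[int] = (100, 250, 500, 1000, 2500, 5000, 10000),
) -> List[Tuple[int, int, int]]:
    """Cumulative rows like stats.sh: (minimum length, number of contigs, total bp)."""
    return [
        (t, sum(1 for x in lengths if x >= t), sum(x for x in lengths if x >= t))
        for t in thresholds
    ]


# Usage (the file name is a placeholder):
# with open("Trinity.fasta") as fh:
#     lengths = read_fasta_lengths(fh)
# print(n50_stats(lengths))
# for row in length_table(lengths):
#     print(row)
```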

Over the past several days I used the latest Trinity version, trinityrnaseq-2.0.6, to assemble the same raw data again (after trimming low-quality reads, of course). This time the length distribution of the file is as follows:

kurban@kurban-X550VC:~/Downloads/bbmap$ sh stats.sh in=~/Desktop/data_from_server/2015_6_04_assembled_CD_and_CK/Trinity.fasta
stats.sh: 52: stats.sh: Bad substitution
stats.sh: 59: stats.sh: [[: not found
stats.sh: 59: stats.sh: [[: not found
stats.sh: 65: stats.sh: source: not found
stats.sh: 66: stats.sh: parseXmx: not found
A C G T N IUPAC Other GC GC_stdev
0.2932 0.2083 0.2114 0.2871 0.0000 0.0000 0.0000 0.4197 0.0823

Main genome scaffold total:          56130
Main genome contig total:            56130
Main genome scaffold sequence total: 57.963 MB
Main genome contig sequence total:   57.963 MB   0.000% gap
Main genome scaffold N/L50:          9036/1.861 KB
Main genome contig N/L50:            9036/1.861 KB
Max scaffold length:                 30.733 KB
Max contig length:                   30.733 KB
Number of scaffolds > 50 KB:         0
% main genome in scaffolds > 50 KB:  0.00%

Minimum  Number         Number         Total          Total          Scaffold
Scaffold of             of             Scaffold       Contig         Contig 
Length   Scaffolds      Contigs        Length         Length         Coverage
-------- -------------- -------------- -------------- -------------- --------
    All          56,130         56,130     57,962,594     57,962,594  100.00%
    100          56,130         56,130     57,962,594     57,962,594  100.00%
    250          50,921         50,921     56,731,956     56,731,956  100.00%
    500          29,025         29,025     49,248,962     49,248,962  100.00%
   1 KB          18,003         18,003     41,494,038     41,494,038  100.00%
2.5 KB           5,541          5,541     21,499,015     21,499,015  100.00%

In this second Trinity.fasta file the minimum sequence length is 224 and the longest is 30733.
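Because each stats.sh table is cumulative (each row counts contigs at or above a minimum length), subtracting adjacent rows gives the number of contigs in each length bin. Applied to the numbers posted above, this shows that the 2011 assembly contains 87,848 contigs of 100 to 249 bp (about 13.4 Mb), against 5,209 (about 1.2 Mb) in the 2.0.6 assembly, so most of the difference in total contig count (144,777 vs 56,130) comes from very short contigs. The sketch below only reuses those posted rows; the names are illustrative, and since the second table as posted stops at 2.5 KB, its last bin is open-ended.

```python
from typing import Dict, List, Optional, Tuple

# Rows copied from the two stats.sh tables above:
# minimum length -> (number of contigs >= that length, total bp in those contigs)
OLD_2011: Dict[int, Tuple[int, int]] = {
    100: (144_777, 67_066_997),
    250: (56_929, 53_670_774),
    500: (30_137, 44_518_044),
    1000: (16_207, 34_757_505),
    2500: (4_183, 15_894_549),
    5000: (588, 3_942_668),
    10000: (28, 353_549),
}
NEW_2_0_6: Dict[int, Tuple[int, int]] = {
    100: (56_130, 57_962_594),
    250: (50_921, 56_731_956),
    500: (29_025, 49_248_962),
    1000: (18_003, 41_494_038),
    2500: (5_541, 21_499_015),  # the posted table stops at this row
}


def per_bin(cumulative: Dict[int, Tuple[int, int]]) -> List[Tuple[int, Optional[int], int, int]]:
    """Turn cumulative '>= threshold' rows into (lower, upper, contigs, bp) per bin.

    The upper bound is exclusive; None marks the open-ended last bin.
    """
    thresholds = sorted(cumulative)
    uppers: List[Optional[int]] = [*thresholds[1:], None]
    bins = []
    for lower, upper in zip(thresholds, uppers):
        n_lower, bp_lower = cumulative[lower]
        n_upper, bp_upper = cumulative[upper] if upper is not None else (0, 0)
        bins.append((lower, upper, n_lower - n_upper, bp_lower - bp_upper))
    return bins


for name, table in (("Trinity 2011", OLD_2011), ("Trinity 2.0.6", NEW_2_0_6)):
    print(name)
    for lower, upper, n, bp in per_bin(table):
        label = f"{lower}-{upper - 1} bp" if upper is not None else f">= {lower} bp"
        print(f"  {label:<14} {n:>7,} contigs {bp:>12,} bp")
```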

My questions are:

Why are the two assembly results different? For example, the earlier version of Trinity assembled many sequences in the 101 to ~200 bp range, but the shortest sequence assembled by the latest version is 224 bp.
Which Trinity.fasta file should I use in the downstream analysis, and why?

Could you please give me a fairly detailed explanation?

Thanks


Written 10.2 years ago by Kurban; updated 2.7 years ago by Ram.

score 3 · Answer 1 · 2015-06-06


10.2 years ago

h.mon 35k

So much has changed between these versions that it would be surprising if you had gotten similar assembly results.

If you really want to know which assembly to use, you should evaluate both and decide; if you want an easy answer, use the latest Trinity assembly.
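As one small first step toward such an evaluation, it can help to put both assemblies on the same footing by applying a single minimum length to both, for example 224 bp (the shortest sequence in the 2.0.6 assembly), and then rerunning stats.sh on the filtered files. Below is a minimal Python sketch under that assumption; the function names, the cutoff and the output file name are illustrative choices, and length statistics alone do not tell you which assembly is more accurate.

```python
from typing import Iterable, Iterator, List, Optional, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs; wrapped sequence lines are joined."""
    header: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)  # emit the previous record
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)  # emit the last record


def format_record(header: str, seq: str, width: int = 60) -> str:
    """Render one FASTA record with the sequence wrapped at `width` characters."""
    body = "".join(seq[i:i + width] + "\n" for i in range(0, len(seq), width))
    return f">{header}\n{body}"


def filter_fasta(lines: Iterable[str], min_len: int, width: int = 60) -> Iterator[str]:
    """Yield formatted records whose sequence is at least `min_len` bases long."""
    for header, seq in read_fasta(lines):
        if len(seq) >= min_len:
            yield format_record(header, seq, width)


# Usage (the output file name is a placeholder):
# with open("gene.fa") as src, open("gene.min224.fa", "w") as dst:
#     dst.writelines(filter_fasta(src, min_len=224))
```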
