Trinity, longest contigs
8.8 years ago
loly.pearl86 ▴ 30

Hi All,

Can someone explain how I can get the longest and shortest contig lengths from this Trinity assembly summary:

################################
## Counts of transcripts, etc.
################################
Total trinity 'genes': 306290
Total trinity transcripts: 369174
Percent GC: 48.47

########################################
Stats based on ALL transcript contigs:
########################################

Contig N10: 5172
Contig N20: 3581
Contig N30: 2517
Contig N40: 1717
Contig N50: 1095

Median contig length: 333
Average contig: 659.77
Total assembled bases: 243568232

#####################################################
## Stats based on ONLY LONGEST ISOFORM per 'GENE':
#####################################################

Contig N10: 4276
Contig N20: 2665
Contig N30: 1655
Contig N40: 1006
Contig N50: 664

Median contig length: 313
Average contig: 540.34
Total assembled bases: 165499422

Thanks

Assembly trinity • 3.7k views
8.8 years ago
iraun 6.2k

There are many ways to do this. If you want a quick command-line solution, this one-liner, run on the assembly FASTA file, should do it:

awk 'BEGIN{RS=">"; FS="\n"} NR>1{n=0; for(i=2;i<=NF;i++) n+=length($i); if(NR==2||n<min) min=n; if(NR==2||n>max) max=n} END{print "Shortest: " min "\nLongest: " max}' file

Basically, you need to extract the length of each sequence and keep track of the highest and lowest values. As I said, there are plenty of ways to do it, in different languages or with different tools. In my opinion, awk is great for this kind of situation.
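
One thing to watch for, assuming Trinity-style headers such as ">TRINITY_DN1_c0_g1_i1 len=247 path=[...]": if a script measures a whitespace-separated field of each record, it can end up measuring the "len=247" token (7 characters) instead of the sequence. A sequence may also be wrapped over several lines. The sketch below avoids both by treating every line that does not start with ">" as sequence and adding up the line lengths per record. It also reports which contig is the shortest and which is the longest. The function name fasta_minmax is just a hypothetical helper name, not part of Trinity.

```shell
# fasta_minmax FILE
# Print the shortest and longest sequence lengths in a FASTA file,
# together with the IDs of those sequences.
# (Hypothetical helper name, not a Trinity tool.)
fasta_minmax() {
  awk '
    # Header line: close the previous record, then start a new one.
    # The ID is the first whitespace-separated token without the ">".
    /^>/ {
      if (started) update()
      started = 1
      id = substr($1, 2)
      len = 0
      next
    }
    # Any other line is sequence: wrapped sequences add up line by line.
    started { len += length($0) }
    END {
      if (started) update()
      if (n == 0) { print "No sequences found"; exit 1 }
      print "Shortest: " min " (" minid ")"
      print "Longest: " max " (" maxid ")"
    }
    # Compare the finished record against the running minimum and maximum.
    function update() {
      if (n == 0 || len < min) { min = len; minid = id }
      if (n == 0 || len > max) { max = len; maxid = id }
      n++
    }
  ' "$1"
}
```

To use it, paste the function into your shell and then run it on your assembly, for example fasta_minmax Trinity.fasta (the file name here is an assumption, use whatever your FASTA file is called).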


Thanks for your quick reply, but I am not sure where I can run this script. It looks like it is for RStudio. I am working on Linux over SSH (PuTTY). Can I run this script on the SSH command line? If not, can you please suggest another way that I can use on Linux?

Thanks


Yes, you can run it from the command line. Just move to the folder where your FASTA file is located, then copy and paste the command, replacing "file" with your file name.


Thanks, I ran it.

It gives me this result: shortest = 0, and that is all.


Sorry, I made a small mistake. Can you try again now?


Thanks, it works now, but the results are unusual.

I have 5 assembled samples, and the output of the script was:

Sample 1: shortest = 1, longest = 14357

Samples 2, 3, 4: shortest = 7, longest = 9

Sample 5: shortest = 6, longest = 89

Does this mean anything, for example that my samples are of poor or low quality?

Thanks
