Dear all,
I am trying to compare my results from different de novo transcriptome assemblers, but I am not sure how to do that. If I understand correctly, the output from Trinity does not include scaffolding, so I can only compare the contig lengths from my outputs.
But the default Trinity output (Trinity.fasta) contains transcript isoforms and gene isoforms together. Should I extract just the transcript isoforms from Trinity.fasta and compute statistics (average, min, max contig length) to compare with Velvet/Oases? Or can I run the awk script below on the whole Trinity.fasta (transcripts and gene isoforms together) to compare the assemblers?
The awk script is:
awk '
# Print one tab-separated row of base counts for the contig just finished.
function report() {
    tot = aCount + tCount + gCount + cCount + nCount + xCount
    print id "\t" tot "\t" aCount "\t" tCount "\t" gCount "\t" cCount "\t" nCount "\t" xCount
}
BEGIN {
    flag = 0
    print "Contig ID\tContig Length\tA\tT\tG\tC\tN\tOtherCharacters"
}
# Header line: report the previous contig (if any), then reset the counters.
/^>/ {
    if (flag == 1) report()
    id = $0; flag = 1
    aCount = tCount = gCount = cCount = nCount = xCount = 0
    next
}
# Sequence line: gsub() returns the number of substitutions, i.e. the base count.
{
    aCount += gsub(/[aA]/, "A", $0)
    tCount += gsub(/[tT]/, "T", $0)
    gCount += gsub(/[gG]/, "G", $0)
    cCount += gsub(/[cC]/, "C", $0)
    nCount += gsub(/[nN]/, "N", $0)
    xCount += gsub(/[^ATGCNatgcn]/, "X", $0)
}
# Report the last contig in the file.
END { if (flag == 1) report() }
' Trinity.fasta
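For the summary statistics themselves (number of contigs, min, max, average length, and N50), a minimal sketch of what I have in mind is below. The function name `contig_stats` is my own placeholder, not part of Trinity or Velvet/Oases, and it assumes plain nucleotide FASTA input where a sequence may span several lines. It treats every FASTA record as one contig, so on Trinity.fasta it counts every isoform separately.

```shell
# contig_stats: hypothetical helper, prints one tab-separated line per FASTA file:
# file, number of contigs, total bases, min, max, mean length, N50
contig_stats() {
  # Step 1: emit one length per record (sequences may be wrapped over lines).
  awk '/^>/ { if (seen) print len; seen = 1; len = 0; next }
       { sub(/\r$/, ""); len += length($0) }
       END { if (seen) print len }' "$1" |
  # Step 2: sort lengths, longest first, so max is first and min is last.
  sort -rn |
  # Step 3: summarize the sorted lengths.
  awk -v name="$1" '
    { n++; l[n] = $1; total += $1 }
    END {
      if (n == 0) { print name "\t0"; exit }
      # N50: contig length at which the running sum reaches half of all bases.
      for (i = 1; i <= n; i++) {
        run += l[i]
        if (run * 2 >= total) { n50 = l[i]; break }
      }
      printf "%s\t%d\t%d\t%d\t%d\t%.1f\t%d\n", name, n, total, l[n], l[1], total / n, n50
    }'
}
```

I would then call it once per assembly, for example `contig_stats Trinity.fasta` and the same on the Velvet/Oases transcripts file, and compare the two lines side by side.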
Thank you for any explanation of how to compare the outputs.
Please reformat the script to make it more readable.