Hey all,
Here's a rundown of what we've done so far.
We sequenced a transcriptome. The company returned 4 different sets of reads for that same transcriptome.
We ran FastQC and trimmed all 4 paired-end read sets individually. We had some contamination issues, so we ran DeconSeq on all 4 read sets.
We ran each read set through the Velvet+Oases pipeline, using KmerGenie to choose the k-mer size.
We concatenated the final Oases outputs into one FASTA file. This contained 4 redundant sets of our transcripts.
We ran the Velvet+Oases+KmerGenie pipeline again on that file, this time using the "merged" option.
We took the single output from Oases and ran it through CD-HIT to cluster the similar sequences together.
Now, we have a final fasta file containing around 37,000 transcripts.
How can I obtain the contig stats for this final file? I've read countless papers that report the following core information:
Contig Number
Maximum Contig Length
Minimum Contig Length
Average Contig Length
N50 Length
Number of Reads per contig
I looked at the "stats.txt" file from Oases, but nothing is given in this format.
How would I go about generating that info?
Thank you.
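For what it's worth, most of the stats listed above (contig number, max/min/average length, N50) depend only on the sequence lengths in the final FASTA file, so they can be computed without any assembler output. Below is a minimal sketch in plain Python, not a reference implementation. The function names and the file name in the usage line are made up for illustration, and N50 is taken here as the length of the contig at which the running sum of lengths (longest first) reaches half of the total assembly length. Number of reads per contig cannot be derived from the FASTA alone; it would need the reads mapped back to the transcripts, which this sketch does not attempt.

```python
import sys


def read_fasta_lengths(path):
    """Return a list of sequence lengths, one per FASTA record."""
    lengths = []
    current = None  # length of the record being read; None before the first header
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if current is not None:
                    lengths.append(current)
                current = 0
            elif current is not None:
                current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths):
    """Length of the contig at which the cumulative sum (longest first)
    reaches at least half of the total length."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0  # empty input


def contig_stats(lengths):
    """Summarise a list of contig lengths as a dict."""
    if not lengths:
        return {"contig_number": 0, "max_length": 0, "min_length": 0,
                "mean_length": 0.0, "n50": 0}
    return {
        "contig_number": len(lengths),
        "max_length": max(lengths),
        "min_length": min(lengths),
        "mean_length": sum(lengths) / len(lengths),
        "n50": n50(lengths),
    }


# Only runs when a FASTA path is given on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    for name, value in contig_stats(read_fasta_lengths(sys.argv[1])).items():
        print(f"{name}\t{value}")
```

Assuming the script is saved as contig_stats.py (a hypothetical name), it would be run as: python contig_stats.py final_transcripts.fa, where final_transcripts.fa stands in for whatever the CD-HIT output file is called.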
Dear all,
I ran
./oases
and generated transcripts.fa. Is it possible to get statistics for transcripts.fa like the ones Trinity reports? If so, please guide me on how they can be calculated.