Dear all,
I have a FASTA file in the following format:
>1-698675
TAATACTGCCTGGTAATGATGACT
>2-69532
TACTGCCTGGTAATGA
>3-6954
ATACTGCCTGGTAATGATGACT
The header >1-698675 indicates the read ID is 1 and the read count is 698675.
What I want is the length distribution of all the reads (e.g., how many reads have length 17, 18, 19, etc.). But as you can see, the reads are collapsed, so the scripts I found online cannot be applied to this FASTA file. I would appreciate it if someone could provide a script to calculate the length distribution for this kind of FASTA file.
Many thanks.
What do you mean by collapsed? Do you mean that the entire sequence is on one line? Also, do you want the distribution of read counts or of read lengths? I am confused about why you told us about the read ID and read count.
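If "collapsed" means that identical reads were merged into one record and the number after the hyphen in the header says how many copies there were, then each sequence length should be weighted by that count. Here is a minimal Python sketch under that assumption. The function name length_distribution and the file name reads.fa are made up for illustration, and the header is assumed to always look like >ID-COUNT.

```python
from collections import Counter
import io


def length_distribution(lines):
    """Return a Counter mapping read length -> total read count.

    Assumes every header looks like '>ID-COUNT' (e.g. '>1-698675'),
    where COUNT is the number of identical reads collapsed into the record.
    """
    dist = Counter()
    count = 0          # read count parsed from the current header
    length = 0         # sequence length accumulated for the current record
    in_record = False  # True once the first header has been seen

    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if in_record:
                dist[length] += count  # store the previous record
            # Take the text after the last '-' as the read count.
            count = int(line[1:].rsplit("-", 1)[1])
            length = 0
            in_record = True
        elif in_record:
            # Sequences may be wrapped over several lines, so add them up.
            length += len(line)

    if in_record:
        dist[length] += count  # store the final record
    return dist


# The three records from the question, used as test input.
example = """>1-698675
TAATACTGCCTGGTAATGATGACT
>2-69532
TACTGCCTGGTAATGA
>3-6954
ATACTGCCTGGTAATGATGACT
"""

dist = length_distribution(io.StringIO(example))
for read_length in sorted(dist):
    print(read_length, dist[read_length], sep="\t")
```

For a real file, pass an open file handle instead of the StringIO object, for example: with open("reads.fa") as fh: dist = length_distribution(fh). If you want the number of distinct sequences per length rather than the total read count, add 1 instead of count.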