Biostars, I'm here to request support for the following situation.
In general, to make it easier because I'm a beginner in programming, I have written this script based on Biopython
from Bio import SeqIO
#Get the lengths and ids, and sort on length
len_and_ids = sorted((len(rec), rec.id)
for rec in SeqIO.parse("mycobt.fasta","fasta"))
ids = reversed([id for (length, id) in len_and_ids])
del len_and_ids #free this memory
record_index = SeqIO.index("mycobt.fasta", "fasta")
records = (record_index[id] for id in ids)
SeqIO.write(records, "sorted.fasta", "fasta")
This code is working correctly and as a result it sorts the sequences from smallest to largest, including the nucleotides in the following way:
>Myco_seq3
ATCCAA
>Myco_seq4
GCCTCATCTC
>Myco_seq2
ATCGGTCATCCGATAA
>Myco_seq1
TCCAGCCTTCCTCATGGCTCCCA
Instead, what I really want is that the result only shows me the ID of the fasta with its respective length in number from smallest to largest, as in the following example:
> Myco_seq3 | 6
>Myco_seq4 | 10
>Myco_seq2 | 16
>Myco_seq1 | 23
I've looked through the forums, but only see ones where people filter by specific length. Could someone help me here? I would highly appreciate your contribution and support.