Help with seqkit sort increasing file size in multifasta
8 hours ago
FJCF • 0

Hi everyone, I need to sort a big multifasta file (around 160 GB) with seqkit sort, and I've noticed that the sorted output is larger than the input. Does anybody know why this could happen? I used seqkit sort -N.

Thanks in advance!

The sorted file has a bigger file size

fasta multifasta seqkit • 147 views

Does the input file contain long sequence lines while the sorted output uses fixed-width chunks?


I've checked, and both files have multi-line sequences with a fixed line width, with this structure:

sequence


Please paste the output of 'seqkit stats' and 'seqkit sum' for the two files.
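In case it helps to see what such a comparison checks, here is a minimal Python sketch that summarizes a FASTA file independently of record order and line wrapping. It is not seqkit's algorithm: the function names `read_fasta` and `summarize` and the additive MD5 digest are assumptions made for illustration only. If the two files give the same record count, residue count, and digest, they hold the same records and any size difference comes from formatting.

```python
import hashlib
import io


def read_fasta(handle):
    """Yield (header, sequence) pairs; wrapped sequence lines are joined."""
    header, chunks = None, []
    for line in handle:
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line.strip())
    if header is not None:
        yield header, "".join(chunks)


def summarize(handle):
    """Return (record count, total residues, order-independent digest)."""
    n_records = 0
    total_residues = 0
    combined = 0
    for header, seq in read_fasta(handle):
        n_records += 1
        total_residues += len(seq)
        digest = hashlib.md5(f"{header}\n{seq}".encode()).digest()
        # Adding per-record digests modulo 2**128 makes the result
        # independent of the order in which records appear.
        combined = (combined + int.from_bytes(digest, "big")) % (1 << 128)
    return n_records, total_residues, f"{combined:032x}"


# Hypothetical toy data: same records, different order and wrapping.
unsorted_text = ">s2 desc\nACGTAC\nGT\n>s1\nTTTT\n"
sorted_text = ">s1\nTT\nTT\n>s2 desc\nACGTACGT\n"
print(summarize(io.StringIO(unsorted_text)) == summarize(io.StringIO(sorted_text)))
```

For real files, pass an open text handle, e.g. `with open(path) as fh: summarize(fh)`. The code streams one record at a time, but pure Python will be slow on a 160 GB input, so the seqkit commands remain the practical choice.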


If the width is the same in both files, the number of lines should be the same, right?
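One way to test this is simple byte accounting. With identical records and an identical line width, each record needs ceil(length / width) sequence lines regardless of order, so the line count should not change. A size difference would then have to come from somewhere else, such as headers, line endings (LF vs CRLF), or the content itself. The sketch below is only an illustration; `byte_breakdown` is a hypothetical helper, not part of seqkit.

```python
import io
from collections import Counter


def byte_breakdown(stream):
    """Count where the bytes of a FASTA stream go (stream must be binary)."""
    counts = Counter()
    for raw in stream:
        body = raw.rstrip(b"\r\n")
        # Whatever was stripped is line-ending bytes: 1 for LF, 2 for CRLF.
        counts["line_ending_bytes"] += len(raw) - len(body)
        if body.startswith(b">"):
            counts["records"] += 1
            counts["header_bytes"] += len(body)
        else:
            # Any non-header line is counted as a sequence line.
            counts["sequence_lines"] += 1
            counts["sequence_bytes"] += len(body)
    return counts


# Hypothetical toy data: the same record with LF and with CRLF line endings.
lf_data = b">s1\nACGT\nACGT\n"
crlf_data = b">s1\r\nACGT\r\nACGT\r\n"
for data in (lf_data, crlf_data):
    print(len(data), dict(byte_breakdown(io.BytesIO(data))))
```

For real files, open them with `open(path, "rb")` and compare the two breakdowns field by field; the field that differs shows where the extra bytes are.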


"I've noticed that the sorted output is heavier than the input"

Never use file size as a criterion for QC or comparison, except in a qualitative way, e.g. is the file present? Is it zero bytes, or does it contain something?

Perhaps this also applies in your case: https://askubuntu.com/questions/796947/why-is-my-sorted-file-bigger


For questions about, or bugs in, a specific tool, asking the author is also a good option: https://github.com/shenwei356/seqkit/issues
