How to reverse the order of **records** in a fasta file
4
1
Entering edit mode
3.6 years ago
Apex92 ▴ 300

This might be easy but I have not been able to solve it - I have a normal fasta file as below:

>seq1
ATCTAC
>seq2
AATCGCATCG
>seq3
ATATACAGC
>seq1
ATCGCGGGC
>seq4
ATTAATTTTAT

what I want o do is to reverse the order of lines in the file. So the desired output should be:

>seq4
ATTAATTTTAT
>seq1
ATCGCGGGC
>seq3
ATATACAGC
>seq2
AATCGCATCG
>seq1
ATCTAC

I tried using tail -r file.fa but then it does not work because ids get placed after the sequences.

How can I do this?

Thank you

sequence pipeline programming fasta • 2.6k views
ADD COMMENT
2
Entering edit mode

This is not reverse the order of lines in a fasta file but rather reverse the order of **records** in a fasta file.

ADD REPLY
0
Entering edit mode

Can you tell us why you want to do this?

ADD REPLY
0
Entering edit mode

I loop on the sequences and save their kmers in a list and if any of the reverse-complement of kmers exist in any of the sequences then that sequence should be rejected. With that being said- there is a very low chance for the bottom sequences to get past the filter since the number of kmers increases exponentially. With reversing the order I want to apply the filter on the bottom sequences first.

ADD REPLY
3
Entering edit mode
3.6 years ago
awk '/^>/ {printf("%s%d\t%s\t",(N>0?"\n":""),N,$0);N++;next;} {printf("%s",$0);} END {printf("\n");}' < input.fasta  | sort -t $'\t' -k1,1nr | cut -f 2- | tr "\t" "\n"
ADD COMMENT
2
Entering edit mode
3.6 years ago

Another way this can be done is to read the FASTA file from its end to its start.

Reading the file backwards avoids having to use hash tables or sorting, which can be expensive on very large inputs. This approach will be fast and use very little memory:

#!/usr/bin/env python

import sys
import os

ifn = sys.argv[1] if len(sys.argv) == 2 else None

def reversed_lines(file):
    "Generate the lines of file in reverse order."
    part = ''
    for block in reversed_blocks(file):
        for c in reversed(block):
            if c == '\n' and part:
                yield part[::-1]
                part = ''
            part += c
    if part: yield part[::-1]

def reversed_blocks(file, blocksize=4096):
    "Generate blocks of file's contents in reverse order."
    file.seek(0, os.SEEK_END)
    here = file.tell()
    while 0 < here:
        delta = min(blocksize, here)
        here -= delta
        file.seek(here, os.SEEK_SET)
        yield file.read(delta)

def main():
    ifh = sys.stdin if not ifn else open(ifn, "r")
    lines = reversed_lines(ifh)
    sequence = ""
    header = ""
    for line in lines:
        if line.startswith('>'):
            header = line.rstrip()
            sys.stdout.write('{}\n{}\n'.format(header, sequence))
            sequence = ""
        else:
            sequence = line.rstrip() + sequence
    ifh.close()

if __name__ == "__main__":
    main()

Usage:

$ ./reverse_fasta_records.py < in.fa > out.fa

Or:

$ ./reverse_fasta_records.py in.fa > out.fa

Edit: Modified this script to support the input file either coming from standard input or a regular file.

ADD COMMENT
4
Entering edit mode
3.6 years ago
5heikki 11k

Reverse the order of records of a fasta file:

tac -b -r -s "^>" in.fna > out.fna

Reverse the order of records of a fastq file:

tac -b -r -s "^@" in.fq > out.fq

These work even if there are linebreaks in the sequences..

ADD COMMENT
0
Entering edit mode

elegant solution.

ADD REPLY
0
Entering edit mode

Nice those tac options are new to me ! it will work in most/all cases. But you should never trust '>' as a separator :-)

https://nsaunders.wordpress.com/2014/08/14/looking-for-in-all-the-wrong-places/

ADD REPLY
0
Entering edit mode

Those examples show uses of > within the header line, but record separators appear to either use foo.startswith('>') or the regex ^> to look for the delimiter at the start of a line, no?

ADD REPLY
2
Entering edit mode
3.6 years ago
$ paste - - < test.fa | tac | tr "\t" "\n"                                                                                                                                           

>seq4
ATTAATTTTAT
>seq1
ATCGCGGGC
>seq3
ATATACAGC
>seq2
AATCGCATCG
>seq1
ATCTAC
ADD COMMENT

Login before adding your answer.

Traffic: 1920 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6