Hi all,
First-time poster and relatively new to bioinformatics, but hoping to find a solution to my problem.
I am trying to write an awk script that takes a fasta file containing a set of very similar sequences as input. Some of the sequences are from the negative strand while others are from the positive strand, and I want to output them all in the same direction. I can tell which strand a sequence is from by its 9th position: if it is "G", the sequence should be replaced with its reverse complement.
I don't have much yet. I thought I could pipe the output of awk to revseq, but I was unsure how to keep the headers:
awk -F '' '$9 =="G"' | revseq
As a basic example (note that the header of each sequence actually begins with a ">"):
seq1
ACT
seq2
ATG
seq3
ATT
Rule: if the 3rd position is "T", replace the sequence with its reverse complement.
Output:
seq1
AGT
seq2
ATG
seq3
AAT
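For reference, here is a minimal sketch of that behaviour in plain awk, wrapped in a shell function so that no external tool is needed. The function name `orient_fasta` and its two arguments (position and base) are made up for illustration, and the sketch assumes every record is one ">" header line followed by a single, unwrapped sequence line in upper case:

```shell
# Hypothetical helper: orient_fasta POS BASE < in.fasta
# Prints headers unchanged; reverse-complements a sequence line only
# when its POS-th character equals BASE.
# Assumption: sequences are not wrapped over several lines.
orient_fasta() {
  awk -v pos="$1" -v base="$2" '
    # Walk the string backwards and append the complement of each base.
    function revcomp(s,    i, c, out) {
      out = ""
      for (i = length(s); i >= 1; i--) {
        c = substr(s, i, 1)
        # Characters without a known complement are copied as-is.
        out = out ((c in comp) ? comp[c] : c)
      }
      return out
    }
    BEGIN {
      comp["A"] = "T"; comp["T"] = "A"
      comp["C"] = "G"; comp["G"] = "C"
    }
    /^>/ { print; next }   # header line: pass through untouched
    { print (substr($0, pos, 1) == base ? revcomp($0) : $0) }
  '
}
```

With the toy input above (headers written with their leading ">"), `orient_fasta 3 T` prints AGT, ATG and AAT under the three unchanged headers. For the real data the call would presumably be `orient_fasta 9 G` with the fasta file on standard input.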
Just a side note: It's complement, not compliment.