Question

grep sequence

0

3.5 years ago

saadleeshehreen ▴ 140

Hi,

I have a FASTA file with sequences like the following. The sequences come in pairs with similar headers: one original and one whose header ends in "-shuffled". I want to generate a file containing only the sequences whose header does not contain "shuffled". How can I do that in bash?

>AABR03119176.1/72910-72785
UCCCCCAGAGUCUGGGCUUGGUGCUUUGCAGUGCUGGCGACCUAUUCCCUUUGACGAUCCCUAGGUGGAGAUGGGGCAUGAGGAUCCUCCAGGGGAAUAGCUCACCGCCACUGGGCAACAGGCCUA
>AABR03119176.1/72910-72785-shuffled
CCGCUAGCGUGAUUGGGGACGGGAUCGACCGGUGGCCCGCCGACGCCUCACCUCAUACUCGUAUGUGAUGCCGAGGGCUAGGUAAGAUGGUUGAACGCUCUAGAGUGCCCUCUGAACUUAGCCUCU
>AANN01820944.1/1549-1423
UUUCCCUCAGAAUAGGCUUGUUGCUUUACAGUACUGGUGAUCCAUUCUCUUUGAUGAUCCCcUAGGUGGAGAUGGGGCAUGAGGAUCCUCCAAGGGAAAGACUCAUCAUCACUGGGCAACAGCCUUA
>AANN01820944.1/1549-1423-shuffled
AGGCUCUGACAUAGACUCUUCUUUAGUGGGCGCGCCGACACAUACCUGUcUGAGGAGAUCGAAAUGUGUAGUCCGACAGAACUAAACAAGACUCGUCGGUGCUUAGACUUCUUUCCUGUUUGCGAUU

grep • 1.0k views

ADD COMMENT • link updated 3.5 years ago by Ram 44k • written 3.5 years ago by saadleeshehreen ▴ 140

0

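Try these (each one strips the "-shuffled" suffix from the headers; none of them removes the shuffled sequences):

$ sed '/^>/ s/-shuffled$//' test.fa

or

$ awk -F "-shuffled" '{print $1}' test.fa

or

$ awk -v RS=">" -v OFS="\n" 'NR>1 {sub("-shuffled$","",$1); print ">"$1,$2}' test.fa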

But you will then have sequences with identical headers, which could be a problem in downstream steps.
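To see whether stripping the suffix has produced identical headers, a quick check along these lines may help. This is only a sketch: the function name `duplicate_headers` and the demo file names are made up for illustration, and the demo records are not real data.

```shell
# Hypothetical helper: list header lines that occur more than once in a FASTA file.
duplicate_headers() {
    grep '^>' "$1" | sort | uniq -d
}

# Made-up demo input: one original/shuffled pair plus an unpaired record.
printf '>a\nAC\n>a-shuffled\nCA\n>b\nGG\n' > dup_demo.fa

# Strip the suffix as in the sed command above, then look for repeated headers.
sed '/^>/ s/-shuffled$//' dup_demo.fa > dup_demo.renamed.fa
duplicate_headers dup_demo.renamed.fa
```

If the helper prints nothing, every header in the file is unique.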

ADD REPLY • link 3.5 years ago by cpad0112 21k

Ram · Accepted Answer · 2021-05-17

1

3.5 years ago

lieven.sterck 15k

cat <yourFile> | paste - - | grep -v 'shuffled' | sed 's/\t/\n/g' > new_file

Step by step: cat your file, put each header and its sequence on one tab-separated line (paste), keep only the lines that do not match 'shuffled' (grep -v), then split each line back into two lines, header plus sequence (sed).
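The paste-based pipeline above relies on every record being exactly two lines (one header, one sequence). For files where a sequence may be wrapped over several lines, an awk filter is one possible alternative. This is a minimal sketch, not part of the original answer: the function name `drop_shuffled` and the demo file names are made up for illustration.

```shell
# Hypothetical helper: drop every FASTA record whose header contains "shuffled".
# On each header line, decide whether to keep the record; print lines while keep is true.
drop_shuffled() {
    awk '/^>/ { keep = ($0 !~ /shuffled/) } keep' "$1"
}

# Made-up demo input; the shuffled record is wrapped over two lines on purpose.
printf '>seq1\nACGU\n>seq1-shuffled\nGU\nCA\n>seq2\nUUGG\n' > demo.fa

drop_shuffled demo.fa > demo.filtered.fa
cat demo.filtered.fa
```

Because the decision is made once per header and applied to all following lines, the filter does not depend on how many lines a sequence spans.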

ADD COMMENT • link 3.5 years ago by lieven.sterck 15k

0

As an additional note: I have provided a working solution here, but you could have found this yourself with some searching, as this question has been asked and answered a number of times before.

ADD REPLY • link 3.5 years ago by lieven.sterck 15k

0

Thanks, but I get an error. How can I solve it?

sed: -e expression #1, char 6: unterminated `s' command

ADD REPLY • link updated 3.5 years ago by Ram 44k • written 3.5 years ago by saadleeshehreen ▴ 140

0

Apologies for that: the sed expression was missing a trailing /. I have fixed it in the command line above.

ADD REPLY • link updated 3.5 years ago by Ram 44k • written 3.5 years ago by lieven.sterck 15k