problem with filtering "Sequence unavailable"
8.0 years ago
ashkan ▴ 160

I have a file like this small example:

>ENSG00000004142|ENST00000003607|POLDIP2|||2118
Sequence unavailable
>ENSG00000003056|ENST00000000412|M6PR|9099001;9102084|9099001;9102551|2756
CCAGGTTGTTTGCCTCTGGTCGGAAAGGGAAACTACCCCTGCTTCCACTCTGACAGCAGA

However, the file contains many "Sequence unavailable" entries. I want to get rid of those transcripts, so that the result looks like this:

>ENSG00000003056|ENST00000000412|M6PR|9099001;9102084|9099001;9102551|2756
CCAGGTTGTTTGCCTCTGGTCGGAAAGGGAAACTACCCCTGCTTCCACTCTGACAGCAGA

I tried to filter out those records in bash with:

grep -v "$(grep -B 1 "Sequence unavailable" file.txt)" file.txt

but it gave this error:

Argument list too long

How can I filter them out in bash or Python?


How about this (it should work as long as the first record is "Sequence unavailable"; you can be creative otherwise):

grep -A 2 "Sequence" your.fa | grep -v "\-\-" | sed -n '/Sequence/!p' > new.fa
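Since the question also asks about Python, here is a minimal sketch that reads the file record by record, so nothing large is passed on the command line (the nested `grep` expands to one huge argument, which is what triggers "Argument list too long"). It assumes every record starts with a header line beginning with `>` and that the placeholder is exactly the line `Sequence unavailable`; the function name `drop_unavailable` and the script name in the usage comment are made up for illustration.

```python
import sys


def drop_unavailable(lines, marker="Sequence unavailable"):
    """Yield the lines of every FASTA record that does not contain the marker line."""
    record = []  # header followed by the sequence lines of the current record
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">") and record:
            # A new header closes the previous record: keep it only if it has a real sequence.
            if marker not in record[1:]:
                yield from record
            record = []
        record.append(line)
    # Do not forget the last record in the file.
    if record and marker not in record[1:]:
        yield from record


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python filter_fasta.py file.txt > filtered.fa
    with open(sys.argv[1]) as handle:
        for out_line in drop_unavailable(handle):
            print(out_line)
```

Unlike the `grep -A 2` pipeline, this does not depend on the order of the records or on each sequence fitting on a single line. Note that a header with no sequence lines at all is kept as-is.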


It would be nice to provide feedback to the proposed solution of genomax2. In addition, you have more questions which you left "open/unsolved" after people tried to help you. That's not respectful.

I pledged to help you on your previous thread, but my questions remain unanswered, although it's clear that you have been active multiple times on biostars since my comment. You shouldn't take our help for granted.


Dear ashkan, please respond to questions/give follow up comments on your past posts. Abandoning a question after you ask it borders on troll-like behavior. Unless you follow up on your past questions, your future questions may not be taken seriously or your posts may be treated even more sternly.
