How to remove a subset of FASTA sequences from the original FASTA file
3.7 years ago
CHINMAYA ▴ 10

I ran orfipy on my FASTA file in the terminal like this:

orfipy normal.fasta --dna orfs.fa --min 300 --max 10000 --table 1 --outdir orfs_out

$cat orfs.fa 

>MSTRG.942.2_j_ORF.1 [85-439](+) type:complete length:354 frame:2 start:CTG stop:TAA
CTGTTGAGTATAGATTCCTTTTTCACTCAgtaagcaaaaaaaaaagtagatctGAAACCCAT
CTTTCTATCAAGAACCCCCAGCTCCATTTCCACGCCCCATCTCCGGCCTCCGCGACACATAT
ATCCATTTTCGTGCTCTTCATCTCCTAAGCTTTCATTTGAACCGAATAAATCAACTTTTGAA
GCAACTTCGTGGTCAACCCATTTTCTTCCCTTCCCGGTAATACTCTTTTTCCGGTCACCTTT
CCTCTTTTCTTCCTTCTCTCTTACCtatttattccttttttttgtctttcaaaACTGTAGTT
TTTgtcctttttattttcttcttctagaTGCATTTTTTATTCCT

orfs.fa is a subset of the original FASTA file, so after the analysis I want to remove these sequences from my original file.

Any suggestions on how to do it?

orfipy • 1.8k views

I checked manually and it is not working, possibly because the headers differ between orfs.fa and normal.fasta.

$cat normal.fasta

>MSTRG.942.2_j
GAGATTAAGTgcttcttaaatataaaatgagttaATACATTGAATTGTGTCGGGGTGATTCTTTATATAGAGTAGTACTGTCCTACTGTTGAGTATAGATTCCTTTTTCACTCAgtaagcaaaaaaaaaagtagatctGAAACCCATCTTTCTATCAAGAACCCCCAGCTCCATTTCCACGCCCCATCTCCGGCCTCCGCGACACATATATCCATTTTCGTGCTCTTCATCTCCTAAGCTTTCATTTGAACCGAATAAATCAACTTTTGAAGCAACTTCGTGGTCAACCCATTTTCTTCCCTTCCCGGTAATACTCTTTTTCCGGTCACCTTTCCTCTTTTCTTCCTTCTCTCTTACCtatttattccttttttttgtctttcaaaACTGTAGTTTTTgtcctttttattttcttcttctagaTGCATTTTTTATTCCTTAAGGAAAAAGCTACCGTTTAATAAGtagtataattaataatgaattaaatgcATCTAGTTGTGATCTTGGTATGTTTACATGAATAAAATTCTAATTATATAGGTgaagatttacatttttggtgctCATGCTATGACTATAATTATGATTTTGACTAATATGCATAAGATTGAAAAAACATGAGGTCAAATTCTACAACACCGCCTTTGCAAGTTGCAACCTCACACTTTACCTCCCACATGAATTGTAAATTGCGATGTTTCATATTTTCTTCTTAAGATTTGTCtgttttatcttttttagtTTACAATGTTAATTATTCATCCTGTAGGCAATGATTGCTTTGTACATGAGTGTTTAATAACTTCAATTTGATCGTAATGGAAAGCTCAATTTGATTTATTCTGTTAATAACTTTTATATCTATATTGCAggtttaatgcaaaaacaaaacaaaagccATATTATCAACAACTTTCAGTCTGATCTTATTACCATAATATCTCAGATCTAGTCATGTGCTTTGAAATTGATTTCAGCAATGATATTTTTTCCCGCAACACAAATCATTGTAGGaataaattttctttcaaaagcTAATGAGATAATCAGCTTACCTAACAACCAAAAGTAAAATCTCAACAATATGCTCCATATATATTAGAGTTGATGAATGTTATTGAAGTATAACCAGGAGTTTTGTTTTACTACTTTGTTAAAGCATAAATTAAGTCAACTTCCAAACTACCAGTTCTTTGGTTTGAATGTTGATCCAACTTCTACTTCAATAAAGTACTAAAATAGAGCTCTTTGCTGCACATCATCAACAATAATGTATCCGACATATTGGTGGAATTTCACTTTTGGTTGTTTAGGTAAGTTAATTATCTTATCAGCTCTTGAAAGAAAACGTATTCATACAATGATTTGCGTTTCGAGAACAAATATCATTGTTGCAACTAGCACATGATTGAGTCTGAGATATAATGGCAATAAGAGCACCTCCAACAAAGGTTCACGATGAGTTGTTAGAATGGAAAGGCAAGTTCTTTGAGAGGTTAACTGCAGGGTATGAAAGTGACGTGGCACAGACTAGTTCACCATGCTTCAGGTTTGAGATCATGTTTTTAGTGTAGTTTCTTTATTTAAGTCCTGGTATTTTTTGATCCTTCACTTTGTATTATGTATGATCAGAATATTCTCAATAATTCATCAGAATGTTGTTTAAATTAACTATTGTTTATGCGTTGATCAAAAGTTCTTCTTTCATTTGCCTATTTGCTAACTACCACTATTGTAAAATCTCTATAGCTTACATTTTGTTACAGGATGAAGATCATTTTGAATGATGATgagttttttgtgttttttcatGTGGATGATGAAatgatgagttttttttttttttttgcgaaGGTTGAGTCTCTTATTGTCATGTATTTAACCATTCACATGTTGACAATTAATTTTTGGATTGGTCATGTATTATCAGTTCCTATCAAAGTGAGATGCAATGACTCGTAGGCTGCTTTCAAATGAAAAGAACTGAAAAAATAGGT
ACATTTGATATCAAGTACTTAGATAGTTAGATTATGACTCACAAATTGCGATTGTTTGTGTTAGAATTTTGAGAATCTCAGTTTTGTCCGTGATGCAGAGTTGCAGCTGCAGGAGtgcaataatatttcaaaatcatcaaGAAACAACATAAATGATAATGGGTGCATGACATGATGTCCAATTTTTGTAGCTCATGAGGGAGTTGCAAATTGTTGTACtgtttttgatttattatttaggTTTATGGTCTTTAGGAAGATACTACAAGTGTTATGAATCAATATCATATGTATGATTCAAAAGCAAAACAAGTTGATAccataatatcatttttaaggattaaacacaaataaaacaaaCTTTAATTGAATCTTCCACTAAGCTTGATCTAAATGAACCTTTATAAAAGAAGGGTATAATTTTATAGAACATGACAGTAATACTTTGTGGACACATGATTTACTTTTTTATCATCAGAAATTGAACTTACTTCCACCCCACCACCTAGGAGGTTACAACACTCACTTTATACTCCAACGTGATCATATtgtttaatgataataatagtTAGTAAGTACATTTTGACTATCTTACAATTA

So is there a way to remove the extra part of the header (like _ORF.1 [85-439](+) type:complete length:354 frame:2 start:CTG stop:TAA) in orfs.fa?


See if this works:

$ cat your.fa | sed -e 's/_ORF.*//'
>MSTRG.942.2_j
CTGTTGAGTATAGATTCCTTTTTCACTCAgtaagcaaaaaaaaaagtagatctGAAACCCAT
CTTTCTATCAAGAACCCCCAGCTCCATTTCCACGCCCCATCTCCGGCCTCCGCGACACATAT

wanted.fa contains the same sequences as normal.fasta, so nothing was removed. Any help?


See the following. Keep in mind that once the ORF suffix is stripped, MSTRG.944.3_j is no longer unique, so every sequence carrying that ID will be removed. I took the example below from the other thread you posted this morning.

$ more n.fa
>MSTRG.942.2_j_ORF.1 [85-439](+) type:complete length:354 frame:2 start:CTG stop:
TAACTGTTGAGTATAGATTCCTTTTTCACTCAgtaagcaaaaaaaaaagtagatctGAAACCCAT
CTTTCTATCAAGAACCCCCAGCTCCATTTCCACGCCCCATCTCCGGCCTCCGCGACACATAT
ATCCATTTTCGTGCTCTTCATCTCCTAAGCTTTCATTTGAACCGAATAAATCAACTTTTGAA
GCAACTTCGTGGTCAACCCATTTTCTTCCCTTCCCGGTAATACTCTTTTTCCGGTCACCTTT
CCTCTTTTCTTCCTTCTCTCTTACCtatttattccttttttttgtctttcaaaACTGTAGTT
TTTgtcctttttattttcttcttctagaTGCATTTTTTATTCCT

>MSTRG.944.3_j_ORF.1 [162-489](+) type:complete length:327 frame:1 start:ATG stop:
TGAATGCCGACGTACAAGATTAGGGGAATCGACGTAGATTTTCCCTACGAAGCCTATGATTCCCA
ACTCGTTTACATGGACAAAGTCATGCAATCGCTTCAGGAGGTAGCGATTGACTCACTCAATC
ATTGCACTTTTGATTATTTAAGctacttttgatgtatttttattttattttatggtagTTCC
GCTGTGGTTGTTGTAATAATcgactaataattaattataaacatgATTTTGGATCAATTGGA
AGTGATCacaaaatgttaatatttaCTTGTTGTCAGGCAATTTGAAATTGATGTTGTTAAGA
TCATGATTGATCAGCAG

>MSTRG.944.3_j_ORF.2 [3141-3549](+) type:complete length:408 frame:1 start:TTG stop:
TAGTTGAAAATTTGTGCTGTATATTTTCTGCTATCTCGGGTACAGGatgatattttctttattgc
agCACTTCTTCTAAAACTTGAAAAGCGCATTGCTGAGGTGCATATTGAATCTAAGGAGTTGG
GGTTTACTAAACCCGGGCCCTATATGTTTGAACTGCTTGCTGATCTTAATATCACTCACAAG
ACTGCTTCTAAGCTTAAGAGTATAATAGCTGAAGCTTCAACTCTCATTGAGGAAAATAATCA
GGAGAAATCAACTGGCACCATCTGCAGATTGGATACTATCAAGGATATTCTTGACATTGTTT
TCAGGGATGGAAGAACTTCTCATGCTAAATACTATCGTGTAAGTTTTGAATTATCGTTTACA
CTTCAGTGGATTGATTTTGTTTGTCTTGTTGCTTCC


$ cat n.fa | sed -e 's/_ORF.*//' > new.fa

$ more listfile # This file should contain only the FASTA header IDs, one per line, without the leading `>`
MSTRG.944.3_j

$ ./faSomeRecords -exclude new.fa listfile wanted.fa

$ more wanted.fa
>MSTRG.942.2_j
TAACTGTTGAGTATAGATTCCTTTTTCACTCAgtaagcaaaaaaaaaagtagatctGAAACCCAT
CTTTCTATCAAGAACCCCCAGCTCCATTTCCACGCCCCATCTCCGGCCTCCGCGACACATAT
ATCCATTTTCGTGCTCTTCATCTCCTAAGCTTTCATTTGAACCGAATAAATCAACTTTTGAA
GCAACTTCGTGGTCAACCCATTTTCTTCCCTTCCCGGTAATACTCTTTTTCCGGTCACCTTT
CCTCTTTTCTTCCTTCTCTCTTACCtatttattccttttttttgtctttcaaaACTGTAGTT
TTTgtcctttttattttcttcttctagaTGCATTTTTTATTCCT
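
If faSomeRecords is not available, the same kind of exclusion can be approximated with awk. This is only a sketch, not the tool's own method. It assumes the record ID is the first whitespace-delimited word of each header and that the ID list is not empty. The file names normal_demo.fa, ids_demo.txt and wanted_demo.fa are hypothetical stand-ins for normal.fasta, listfile and wanted.fa.

```shell
# Hypothetical demo inputs standing in for normal.fasta and listfile.
cat > normal_demo.fa <<'EOF'
>MSTRG.942.2_j
GAGATTAAGT
ACATTTGATA
>MSTRG.944.3_j
TTGACCGATA
>MSTRG.950.1_j
CCGGTTAACC
EOF
printf 'MSTRG.944.3_j\n' > ids_demo.txt

# First file (NR==FNR): load the IDs to exclude into an array.
# Second file: on each header, decide whether to keep the record;
# sequence lines inherit that decision until the next header.
awk 'NR==FNR { drop[$1]; next }
     /^>/    { id = substr($1, 2); keep = !(id in drop) }
     keep' ids_demo.txt normal_demo.fa > wanted_demo.fa

cat wanted_demo.fa
```

Because the decision is made per header, multi-line sequences are dropped or kept as a whole record.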

If I convert the two FASTA files to .txt and use

diff a.txt b.txt|grep ">"|cut -c 3- > foo.txt

would that work?


FASTA files are already plain text. I demonstrated how to make this work (with your own data) in the example above.


I still have a general confusion:

In a FASTA file, if one record contains one or two ORFs spanning certain positions, should I delete the whole record or only those positions?

3.7 years ago
GenoMax 147k

Use the faSomeRecords utility from Jim Kent (Linux version linked). Add execute permissions after downloading (chmod a+x faSomeRecords).

faSomeRecords - Extract multiple fa records
usage:
   faSomeRecords in.fa listFile out.fa
options:
   -exclude - output sequences not in the list file.

So you can do

faSomeRecords -exclude normal.fasta orfs.fa wanted.fa
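
Going by the usage text above and the listfile example elsewhere in this thread, the listFile argument appears to expect one record name per line rather than FASTA records. Below is a rough sketch of deriving such a list from orfipy-style headers. The file names orfs_demo.fa and listfile_demo are hypothetical, the sequences are shortened, and it assumes every ORF header carries an _ORF.N suffix after the parent ID.

```shell
# Small stand-in for orfs.fa (hypothetical demo data, sequences shortened).
cat > orfs_demo.fa <<'EOF'
>MSTRG.942.2_j_ORF.1 [85-439](+) type:complete length:354 frame:2 start:CTG stop:TAA
CTGTTGAGTATAG
>MSTRG.944.3_j_ORF.1 [162-489](+) type:complete length:327 frame:1 start:ATG stop:TGA
ATGCCGACGTACA
>MSTRG.944.3_j_ORF.2 [3141-3549](+) type:complete length:408 frame:1 start:TTG stop:TAG
TTGAAAATTTGTG
EOF

# Keep header lines only, drop the leading ">", strip the "_ORF..." suffix,
# and collapse duplicates so each parent ID is listed once.
grep '^>' orfs_demo.fa | sed -e 's/^>//' -e 's/_ORF.*//' | sort -u > listfile_demo

cat listfile_demo
```

The resulting file could then be passed in the listFile position, for example with normal.fasta as the input.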

I downloaded it and set the execute permission, but when I run it, it says

faSomeRecords: command not found

If the program is in the current directory, where you have all your files, then use

./faSomeRecords -exclude normal.fasta orfs.fa wanted.fa
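
The "command not found" message usually means the shell looked for the program only in the directories listed in PATH, and the current directory is typically not among them. A small illustration follows, using a hypothetical script named fa_demo_tool in a hypothetical demo_bin directory instead of the real binary.

```shell
# Hypothetical stand-in for a downloaded executable.
mkdir -p demo_bin
printf '#!/bin/sh\necho "tool ran"\n' > demo_bin/fa_demo_tool
chmod a+x demo_bin/fa_demo_tool

# Option 1: call it with an explicit path.
./demo_bin/fa_demo_tool

# Option 2: add its directory to PATH for this shell session,
# after which the bare name resolves.
PATH="$PWD/demo_bin:$PATH"
fa_demo_tool
```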