Question

Getting FASTA sequences (proteome) from a file by referencing another FASTA file (TFs) of the same organism


8.7 years ago

kws15 ▴ 40

Hi everyone,

Basically, I have two large FASTA files. The first contains the proteome (all the protein sequences), and the second contains the transcription factor sequences of the same organism. Is there any way to use these two files to extract the non-transcription-factor sequences as a FASTA file? Many thanks

fasta • 2.5k views

updated 21 months ago by Ram 44k • written 8.7 years ago by kws15 ▴ 40

score 1 · Answer 1 · 2016-04-15

As far as I understand, your task is the following: you have two FASTA files. One of them contains all the proteins of your favorite organism, and the second contains only the transcription factors from the same organism. You need to select the proteins from the whole-proteome file that are not in your second FASTA file with the transcription factors. Is that correct?
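A minimal Python sketch of that task, under two assumptions not stated in the thread: the protein ID is the first whitespace-delimited word after `>` in each header, and both files use the same IDs. The function names are illustrative, and the file names in the usage comment are hypothetical placeholders.

```python
from typing import Iterable, Iterator


def read_fasta(lines: Iterable[str]) -> Iterator[tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA lines; header excludes '>'."""
    header = None
    chunks: list[str] = []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None and line:
            chunks.append(line)  # multi-line sequences are concatenated
    if header is not None:
        yield header, "".join(chunks)


def record_id(header: str) -> str:
    # Assumption: the ID is the first whitespace-delimited word of the header,
    # and the proteome and TF files use the same IDs.
    parts = header.split()
    return parts[0] if parts else ""


def non_tf_records(proteome: Iterable[str], tfs: Iterable[str]) -> Iterator[tuple[str, str]]:
    """Yield proteome records whose ID does not occur in the TF file."""
    tf_ids = {record_id(h) for h, _ in read_fasta(tfs)}
    for header, seq in read_fasta(proteome):
        if record_id(header) not in tf_ids:
            yield header, seq


def to_fasta(records: Iterable[tuple[str, str]], width: int = 60) -> Iterator[str]:
    """Format records as FASTA text, wrapping sequences at `width` characters."""
    for header, seq in records:
        yield f">{header}\n"
        for i in range(0, len(seq), width):
            yield seq[i:i + width] + "\n"


# Example usage (hypothetical file names):
# with open("proteome.fasta") as p, open("tf.fasta") as t, open("non_tf.fasta", "w") as out:
#     out.writelines(to_fasta(non_tf_records(p, t)))
```

Only the TF IDs are held in memory; the proteome is streamed record by record, which should keep this workable for large files.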

Have a look at the following post:

A: Print Different Id From Sequence Comparison Of Two Fasta Files

It contains scripts in several different languages, so you will surely find something that suits you.

For example, the command-line tool diff should be fine for your problem, in my opinion.
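The diff route works best on the sequence IDs rather than on the raw FASTA text: pull the header IDs out of both files, sort them, and keep the IDs that occur only in the proteome list. The sketch below shows that comparison in Python as a merge walk over two sorted lists, the same idea as running `comm -23` on two sorted ID files. It is a stand-in for diff, not a wrapper around it, and it again assumes the ID is the first word of each header. The function names are hypothetical.

```python
from typing import Iterable


def header_ids(lines: Iterable[str]) -> list[str]:
    """Return the sorted, de-duplicated IDs from the '>' header lines."""
    # Assumption: the ID is the first word after '>'.
    ids = {line[1:].split()[0] for line in lines
           if line.startswith(">") and line[1:].split()}
    return sorted(ids)


def only_in_first(a: list[str], b: list[str]) -> list[str]:
    """Merge-walk two sorted lists; return items of `a` absent from `b`."""
    result: list[str] = []
    i = j = 0
    while i < len(a):
        if j >= len(b) or a[i] < b[j]:
            result.append(a[i])  # only in the first list
            i += 1
        elif a[i] > b[j]:
            j += 1  # only in the second list: ignore it
        else:
            i += 1  # present in both lists: skip it
            j += 1
    return result
```

The resulting ID list can then be used to pull the matching records out of the proteome file.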

Alternatively, there is a Perl solution using a hash of "unseen" proteins.

http://www.geos.ed.ac.uk/~bmg/software/Perl%20Books/OReilly.Perl.Cookbook.pdf

See Chapter 5.11 of the Cookbook, "Finding Common or Different Keys in Two Hashes".
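The idea behind that recipe is to key one hash by the entries of one collection and keep the keys of the other collection that are missing from it. A rough Python analogue with dicts is sketched below. It assumes both FASTA files have already been loaded into dicts mapping ID to sequence; the names `not_in_other`, `proteome` and `tfs` and the toy data are hypothetical.

```python
def not_in_other(this: dict[str, str], that: dict[str, str]) -> dict[str, str]:
    """Return the entries of `this` whose keys do not appear in `that`.

    Same idea as the Perl hash idiom: loop over the keys of one hash and
    keep each key that does not exist in the other.
    """
    return {key: value for key, value in this.items() if key not in that}


# Hypothetical in-memory data: ID -> sequence.
proteome = {"P1": "MKTAAA", "TF1": "MSS", "P2": "MQQ"}
tfs = {"TF1": "MSS"}
non_tf = not_in_other(proteome, tfs)

# Write the surviving entries back out in FASTA format.
for seq_id, seq in non_tf.items():
    print(f">{seq_id}\n{seq}")
```

Dict membership tests are hash lookups, so each proteome ID is checked in roughly constant time, and Python dicts preserve insertion order, so the output keeps the proteome's original order.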

So you can use whatever you prefer.