Getting FASTA sequences (proteome) from a file by referencing another FASTA file (TF) of the same organism
8.7 years ago
kws15 ▴ 40

Hi everyone,

I have two large FASTA files. The first contains the proteome (all protein sequences) of an organism; the second contains the transcription factor sequences of the same organism. Is there a way to use these two files to extract the non-transcription-factor sequences as a FASTA file? Many thanks.

fasta • 2.5k views
8.7 years ago
natasha.sernova ★ 4.0k

As far as I understand, your task is the following: you have two FASTA files. One contains all the proteins of your organism of interest; the second contains only the transcription factors from the same organism. You need to select the proteins from the whole-proteome file that are not in the transcription factor file. Is that correct?

Look at the following post

A: Print Different Id From Sequence Comparison Of Two Fasta Files

There are scripts in several languages there, so you will definitely find something that suits you.

For example, the bash command-line tool diff should be fine for your problem, in my opinion.

Or use a Perl solution with a hash of "unseen" proteins.

http://www.geos.ed.ac.uk/~bmg/software/Perl%20Books/OReilly.Perl.Cookbook.pdf

See recipe 5.11 in the Cookbook, "Finding Common or Different Keys in Two Hashes".

So you can use whatever you prefer.
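If Python is more convenient than Perl, the same "hash of seen keys" idea can be sketched roughly as below. This is only a minimal illustration, not a tested pipeline: it assumes that both files use the same sequence identifiers (taken here as the first whitespace-delimited word of each header line), and the function names read_fasta, record_id, non_tf_records and write_fasta are made up for this example. If the identifiers differ between the two files, the set would need to be built from the sequences themselves instead.

```python
def read_fasta(handle):
    """Yield (header, sequence) pairs from an open FASTA file handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None and line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def record_id(header):
    """Assumption: the ID is the first whitespace-delimited word of the header."""
    parts = header.split()
    return parts[0] if parts else ""


def non_tf_records(proteome_handle, tf_handle):
    """Yield proteome records whose ID does not occur in the TF file."""
    # The set plays the role of the Perl hash of "seen" keys.
    tf_ids = {record_id(header) for header, _ in read_fasta(tf_handle)}
    for header, seq in read_fasta(proteome_handle):
        if record_id(header) not in tf_ids:
            yield header, seq


def write_fasta(records, out_handle, width=60):
    """Write (header, sequence) pairs as FASTA, wrapping sequence lines."""
    for header, seq in records:
        out_handle.write(f">{header}\n")
        for i in range(0, len(seq), width):
            out_handle.write(seq[i:i + width] + "\n")


# Example use (the file names are placeholders):
# with open("proteome.fasta") as prot, open("tf.fasta") as tf, \
#         open("non_tf.fasta", "w") as out:
#     write_fasta(non_tf_records(prot, tf), out)
```

Only the transcription factor IDs are held in memory; the proteome file is streamed record by record, so this should cope with large files.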
