5.4 years ago
fec2
Hi,
I have a multi-FASTA file containing 453 sequences that look like the following:
>M.Bce12308ORF4755P GTWWAC
ATGCGTGACCTGATCGAAGAGCCGGGCGGCGGCGCCGCGAGCGAGGCGGAGGCGGTTCAGCCCGCCGCTGCCGTGCCGCGCGCGCTGCCGTCCGGTATCG
>M.Bce1254ORF9725P GTWWAC
ATGCGTGACCTGATCGAAGACCCGGGCGGCGGCGCCGCGAGCGAGGCGGAGGCGGTTCAGCCCGCCGCTGCCGTGCCGCGCGCGCTGCCGTCCGGTATCG
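In these headers the sequence name is followed by a space and extra text (GTWWAC). A minimal sketch, assuming the ID is the first whitespace-delimited token after `>` (the same convention Biopython's `SeqIO` uses for `record.id`), shows how the names could be pulled out for comparison. The function name `fasta_ids` is hypothetical, and the sequences are shortened.

```python
def fasta_ids(lines):
    """Return the ID of each FASTA record: the first whitespace-delimited
    token after '>', so a trailing description such as 'GTWWAC' is dropped."""
    ids = []
    for line in lines:
        if line.startswith(">"):
            header = line[1:].strip()
            # Assumption: the ID is the first token; an empty header yields "".
            ids.append(header.split()[0] if header else "")
    return ids

# Headers copied from the example above; sequences shortened.
records = [
    ">M.Bce12308ORF4755P GTWWAC",
    "ATGCGTGACCTG",
    ">M.Bce1254ORF9725P GTWWAC",
    "ATGCGTGACCTG",
]
print(fasta_ids(records))  # ['M.Bce12308ORF4755P', 'M.Bce1254ORF9725P']
```

An open file handle can be passed instead of the list, since iterating over it also yields one line at a time.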
I also have a list of 461 sequence names, as below:
M.Bce12308ORF4755P
M.Bce122ORF1082P
M.Bce12308ORF4755P
M.Bce1254ORF9725P
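The four names shown include M.Bce12308ORF4755P twice, so the 461 entries may contain fewer distinct names. In that case the gap between 461 and 453 is not necessarily the number of missing sequences. A small sketch for counting the distinct names, keeping first-seen order (the helper name `unique_names` is hypothetical):

```python
def unique_names(lines):
    """Return the non-empty names in first-seen order, duplicates removed."""
    seen = set()
    result = []
    for line in lines:
        name = line.strip()  # also removes the trailing newline from file lines
        if name and name not in seen:
            seen.add(name)
            result.append(name)
    return result

names = [
    "M.Bce12308ORF4755P",
    "M.Bce122ORF1082P",
    "M.Bce12308ORF4755P",
    "M.Bce1254ORF9725P",
]
print(len(names), len(unique_names(names)))  # 4 3
```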
How can I match the name list against the FASTA file, so that I can tell which sequences from the name list are missing from the FASTA file?
Thank you!
Felix
Did you try searching the forum at all? This is one of the most widely addressed problems here.
Hi, I have tried searching, but I couldn't find exactly the same issue, since what I want is a list of the names that are missing from the FASTA file. Do you have any idea which keywords I should use to search for this issue? Thanks.
Extract reads from fasta file (specific read_names) and make a new fasta file
Extract fasta sequences from a file using a list in another file.
Filtering sequences from a Fasta file using Biopython
Pulling out gene names from a FASTA file with a list of IDs
Extracting specific IDs + sequence from multifasta
How to extract specific genes from a fasta file
Printing specific sequence and ID from combined fasta file using bash commands
Extract Sequence From Fasta File Using Ids From a separate txt File in linux
....
I doubt that the exact same case has never come up here (I recall addressing multi-part sequence identifiers, and how to deal with them, a few years ago), but in any case, searching for an exact match is not a helpful approach. What you need is not something you can copy and paste that "just works"; such solutions are rare and don't teach us anything. You need a "pointer": a hint that takes you one step closer to a solution than you are right now. That way, you get to solve the problem yourself while getting past an obstacle that might have taken quite some time to overcome on your own.
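As one possible pointer: the task reduces to a set difference between the names in the list and the record IDs in the FASTA file. A minimal sketch, assuming each record's ID is the first whitespace-delimited token after `>` (so "M.Bce12308ORF4755P GTWWAC" matches the name "M.Bce12308ORF4755P"). The function name `missing_names` is hypothetical.

```python
def missing_names(fasta_lines, name_lines):
    """Return names from the list that have no matching FASTA record ID,
    in list order and reported once each."""
    # Assumption: a record's ID is the first token after '>'.
    present = {
        line[1:].split()[0]
        for line in fasta_lines
        if line.startswith(">") and line[1:].split()
    }
    reported = set()
    missing = []
    for line in name_lines:
        name = line.strip()
        if name and name not in present and name not in reported:
            reported.add(name)
            missing.append(name)
    return missing

fasta = [
    ">M.Bce12308ORF4755P GTWWAC", "ATGCGTGACCTG",
    ">M.Bce1254ORF9725P GTWWAC", "ATGCGTGACCTG",
]
names = ["M.Bce12308ORF4755P", "M.Bce122ORF1082P",
         "M.Bce12308ORF4755P", "M.Bce1254ORF9725P"]
print(missing_names(fasta, names))  # ['M.Bce122ORF1082P']
```

With real files, open file handles can be passed directly, e.g. `missing_names(open("sequences.fasta"), open("names.txt"))` (both file names are hypothetical). Using a set for `present` keeps each lookup constant-time, and the order of the output follows the name list.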