Hi, I have this problem. I have a FASTA file containing the whole genome of an Arabidopsis plant, and a list of genes whose sequences I need to extract from it. Each line of the list contains one to four gene IDs, and for each line I need to create a separate FASTA file containing the sequences of all the genes on that line. Any help? Below is a simplified example.
Gene IDs whose sequences I need (one group per line):
AT1G63760 AT1G05890 AT2G31510
AT3G30122 AT1G08410 AT2G27200
AT1G56390 AT1G09210
AT1G18191
AT3G32445 AT1G18500 AT1G74040
AT2G34760 AT1G20010 AT1G75780 AT1G74040
AT1G60480 AT1G23490 AT1G70490
AT4G10414 AT1G33230 AT4G10430
AT4G06748 AT1G52410 AT3G15950
AT2G06822 AT1G52930 AT3G15460
AT3G43820 AT1G62810 AT4G12290
AT2G11280 AT1G63640 AT5G41310
AT2G15120 AT2G15090 AT4G34250
Excerpt of the FASTA file:
>AT1G63760
ATGATGGATTCCGATGATGATATGCTCGATGCCCACGATATGGACTCGGTAGATTATGATTTTGACAGCGGCGGCACCGATGATGACAAC
>AT1G05890
GCAAAAGTCCATATATCTTTCTCTCATCTACTCGATTTCTGAATCGCGAACGCAGCGAGC
GATCCGGAATCGAGAGAGAGAGCTACCAATTTTTCCAACTTGTTCGGTTCCTTATAAAGC
TGTTTACTTTCCCTTGCGAATTCTCTCTCTCTCTACTATAAAGCAATACCCTTTTCTTTC
TTTTTTGGTAACCCCCAAACCCTACTTTGTCCAGCGAGAAAGGAAGAGGGGTTTCGTCTG
>AT2G31510
ATGGATTCTGAAGAAGACATGCTCGATGCGCACGATATGGAGTCTGGAGAGGATGATTTC
TACAGCGGTGGAACTGATGATTGTAATGATAGTGATGATGGTGAACCTGATTATGGGTTT
GTTGAGGAAGATGCTGATGATTCTGCTATGATCGCCTCTCATCGCTCTCAGGTGGGTTTT
TGTTTTTGTTTCTTAATTCATTTTGGTGTTGGTTGTTGAAGCTAATTGCTTTTAGAACTC
>AT1G18191
AAAATTGAAGACGAAGAGTTTAAGACTCTATCAGATCAGAAGGTCTCTATTTTCTCGAAACGTCGGCTTTAGAGAACCAAAACGTC
>AT1G56390
GCAATGGTGAAACTAGACTCTAAACTCATCTCTATGATTGTTTTCGGTATCGTGGTAATCGTCTCTGCTG
>AT1G09210
GCTAGCTCCTCTCCTCGCGGTATATATAAGCTCCAGGTCTTGTACATCTTCATCATCTGA
TCTCGGGGAAGCTCCGATCTGAGTTTTTTTTAGCAATGGCGAAAATGATTCCTAGCCTCG
TCTCTCTAATTCTTATCGGTCTTGTTGCGATCGCCTCCGCCGCAGTTATTTTCGAGGAGC
GCTTTGATGGTATCTAATTTCTACATCTCTATCTCTATACTCTATCTTCCTGATGATGCG
>AT2G34760
ATGCGTAAACAAAGTTTTAAGATAGTTATGATTCGTTTTTTGAGAGTCAATAACAAAAATTATCCGATC
>AT1G20010
AACAGAGACAATTGGTTATATTAGCTGTCACTCCCATCTTTCATATTCCTTCACCATCTC
TCTCTCTCTCGATCTTGTGAACCACTACACACACTAACACAATGAGAGAGATCCTTCACA
TTCAAGGTGGTCAATGCGGTAACCAGATTGGTTCCAAGTTCTGGGAAGTCATCTGCGACG
AGCACGGCATCGATTCCACCGGACGTTACAGTGGAGACACTGCAGATCTCCAGCTTGAAC
GTATCAATGTCTATTACAATGAAGCTTCAGGTGGAAGATACGTTCCTCGTGCTGTTCT
>AT1G75780
ATCTCCAGATCCCAAAATCTTCATCGATCATCATCATCATGAGAGAAATCCTCCACGTCC
AAGGCGGCCAATGCGGTAACCAAATCGGTTCCAAATTCTGGGAAGTTATCTGCGACGAAC
ACGGCGTTGATCCCACCGGACGTTACAACGGTGATTCCGCCGATCTTCAGCTCGAACGTA
TCAATGTTTATTACAATGAAGCTTCTGGTGGTCGTTACGTTCCTCGTGCTGTTCTCATGG
>AT1G74040
AAAGTAGTAACCAGAGACACTGTGCCGTCGCCCGTCGCCGCCGCCGCCACACTATCATCT
CTCTCAGGTTTTTGATTTTCCACGGCAATGGAGTCTTCGATTCTCAAAAGCCCTAATCTC
TCTTCACCATCGTTCGGTGTACCTTCAATTCCCGCCTTATCCTCCTCCTCCACCTCACCA
TTTTCATCTCTTCATCTCCGATCACAGAACCACCGTACCATCTCTCTTACCACCGCCGGA
AAATTCCGTGTCTCGTATTCTCTCTCCGCTTCTTCACCTCTACCACCTCATGCTCCTCGC
CGTCGTCCCAATTACATCCCTAACCGTATATCCGATCCCAATTACGTCAGAATCTT
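The excerpt shows that some records sit on a single line (AT1G63760) while others are wrapped over several lines (AT1G05890), so whatever reads the file has to join every line between two headers. A minimal line-by-line sketch, assuming the gene ID is the first word after `>`; the sub name read_fasta is hypothetical and the demo uses a shortened copy of two records from the excerpt:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Read a FASTA stream into a hash: gene ID => sequence.
# Assumption: the ID is the first word after '>' (e.g. ">AT1G05890").
# Lines between two headers are concatenated into one sequence.
sub read_fasta {
    my ($fh) = @_;
    my (%seq, $id);
    while (my $line = <$fh>) {
        chomp $line;
        $line =~ s/\r$//;                 # tolerate Windows (CRLF) line endings
        if ($line =~ /^>(\S+)/) {         # header: start a new record
            $id = $1;
            $seq{$id} = '';
        }
        elsif (defined $id) {             # sequence line: append to current record
            $seq{$id} .= $line;
        }
    }
    return \%seq;
}

# Demo on a shortened in-memory copy of two records from the excerpt.
# For the real data, open 'TAIR9seq.fasta' instead of the string.
my $excerpt = <<'END';
>AT1G56390
GCAATGGTGAAACTAGACTC
>AT1G09210
GCTAGCTCCT
TCTCGGGGAA
END
open my $fh, '<', \$excerpt or die "Cannot open in-memory string: $!";
my $seqs = read_fasta($fh);
close $fh;

printf "%s\t%d bp\n", $_, length $seqs->{$_} for sort keys %$seqs;
```

For a whole genome this keeps every record in memory; if that is too much, store only the IDs that actually appear in the gene list.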
What have you tried so far? Have you looked at the related questions on this site?
Hi, I have a script that picks up one sequence ID per line. I'm not sure how to modify it so that it picks up several IDs per line and saves them into several files, one file per line of IDs.
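#!/usr/bin/perl -w
use strict;

my $idsfile = "gene1.txt"; my $seqfile = "TAIR9seq.fasta"; my %ids = ();
open(my $out, '>', "selectionseqs.fasta") or die "Cannot write output: $!";
open(my $idfh, '<', $idsfile) or die "Cannot open $idsfile: $!";
while (<$idfh>) { chomp; $ids{$_} += 1; }    # one ID per line
close $idfh;
local $/ = "\n>";    # read one FASTA record at a time
open(my $fasta, '<', $seqfile) or die "Cannot open $seqfile: $!";
while (<$fasta>) {
    chomp;                                   # strip the trailing "\n>"
    my $seq = $_;
    my ($id) = $seq =~ /^>?(\S+)/;           # only the first record still starts with '>'
    if (defined $id && exists $ids{$id}) {
        $seq =~ s/^>?[^\n]*\n//;             # drop the header line
        $seq =~ s/\s+//g;                    # join wrapped sequence lines (also drops \r)
        print "$seq\n";
        print $out ">$id\n$seq\n";
    }
}
close $fasta; close $out;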
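A minimal sketch of the per-line version, keeping the record-separator trick from the script above. With `$/ = "\n>"` only the first record still begins with `>` (the separator swallows it for every later record), so the header pattern has to treat `>` as optional. Assumptions: IDs on a line are separated by whitespace, the ID is the first word of the header, and the output files are named group1.fasta, group2.fasta, ... in list order (hypothetical names, as is the sub extract_groups). Only the requested sequences are kept in memory.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sketch: write one FASTA file per line of the ID list.
# Output names "<prefix>1.fasta", "<prefix>2.fasta", ... are hypothetical.
sub extract_groups {
    my ($idsfile, $seqfile, $prefix) = @_;

    # 1. Each non-empty line of the ID list is one group of gene IDs.
    my @groups;
    open my $ids_fh, '<', $idsfile or die "Cannot open $idsfile: $!";
    while (my $line = <$ids_fh>) {
        my @ids = split ' ', $line;          # any whitespace; newline ignored
        push @groups, \@ids if @ids;         # skip blank lines
    }
    close $ids_fh;
    my %wanted = map { ($_ => 1) } map { @$_ } @groups;

    # 2. Read the FASTA one record at a time and keep only wanted sequences.
    my %seq;
    open my $fa, '<', $seqfile or die "Cannot open $seqfile: $!";
    {
        local $/ = "\n>";                    # record separator
        while (my $rec = <$fa>) {
            chomp $rec;                      # removes the trailing "\n>"
            $rec =~ s/^>//;                  # only the first record still has '>'
            my ($header, @lines) = split /\n/, $rec;
            next unless defined $header && $header =~ /^(\S+)/;
            my $id = $1;
            next unless $wanted{$id};
            ($seq{$id} = join '', @lines) =~ s/\s+//g;   # join lines, drop \r
        }
    }
    close $fa;

    # 3. One output file per group, in the order of the list.
    my $n = 0;
    for my $group (@groups) {
        $n++;
        my $outfile = "$prefix$n.fasta";
        open my $out, '>', $outfile or die "Cannot write $outfile: $!";
        for my $id (@$group) {
            if (exists $seq{$id}) {
                print {$out} ">$id\n$seq{$id}\n";
            } else {
                warn "$id not found in $seqfile\n";
            }
        }
        close $out or die "Cannot close $outfile: $!";
    }
    return $n;
}

# Runs only when given two arguments, e.g.:
#   perl extract_groups.pl gene1.txt TAIR9seq.fasta
if (@ARGV == 2) {
    my $count = extract_groups(@ARGV, 'group');
    print "Wrote $count file(s)\n";
}
```

A gene listed on several lines (AT1G74040 appears twice in the list above) is written to each of those files, and IDs missing from the FASTA are reported with a warning rather than silently producing a shorter file.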