Hi,
I'm currently working on alternative splicing. I have a FASTA file in which several sequences annotated as different genes sometimes come from the same locus because of alternative splicing. I need to group those sequences into separate FASTA files, one per locus, so that I can align them afterwards and obtain a single sequence per locus.
The fasta file looks like this:
>XM_010906229.1 PREDICTED: Elaeis guineensis probable strigolactone esterase D14 homolog (LOC105031930), mRNA
ATTTTTTTGAGCTAATAAACTCTCCAACCGCTATCAATATATAGTACCCCTACACATCCC
CTGCGGGGGTCACCCACATCATCATTATCATTTCACTCTCTCGTTTTGTCGGTGTGCCTG
CTTGCCATGTATAGGAACGTAAGGATCGTAGGGAATGGAGAGCAAACTGTAGTGCTCTCA
CATGGCTACGGTGGGAGCCAGTCGGTTTGGGACAAGGTGGTGCCGCACCTGTCTCAAAAG
>XM_010906230.1 PREDICTED: Elaeis guineensis leucine-rich repeat receptor-like serine/threonine/tyrosine-protein kinase SOBIR1 (LOC105031930), mRNA
CTTCCCATATGGATCCTTTCAATTCCTCTCCACCCTCCTCTCTCTCTACGCTTTCTAATC
ATCAACTCAGAACTAACGAAGGCCCAGCACCAACAAGACATCCCTCCATGGCCGATTTTC
TCTCTCATCCACTATGGGTCGGCGCTGCCATCTCTTTCTCCGTCGGCTTTGCCGTAGGTA
CCTTCATCTTCATCGTCTGGAAACTCGCCATCAGCCGCTGCCGTCGCATCCAAACCAACG
AAGAAGAACTCGCCAACACCCCCACCGTCTTCAGCCCCATGCTCAGGTCCAACCTCTCCT
>XM_010906231.1 PREDICTED: Elaeis guineensis leucine-rich repeat receptor-like serine/threonine/tyrosine-protein kinase SOBIR1 (LOC105031931), mRNA
AATCACCTCTAAATCAATTTCTCTTAAATTTTATGAGGACAATCAAGGAGAAAAAATAAT
GCATTAAGTACAAACATTCAAGTCTTCTTCAAGTAAGTATCATCGACAACAATCAGCTTG
TTAAGAAGCTCTTGGATTATCTAGTTGGACAACATAAATGCTATCTAAAATAATAATACG
GAAACCTATAAGACTTTCAAGTCGGGCTAAGGTGCTCTTCCTCATGTCTGACTGCCCCTC
CAGTTTTCTATAGTTGCATCCTTTTAACGTCAGGCCTATTATCAGGATTTTTTTTTCTTC
What I want is, for example, to write the first two sequences to their own file, separate from the initial FASTA file, because they come from the same locus (LOC105031930). How can I do this? Any input would be much appreciated. Thanks.
Please give an example of what the FASTA headers look like and how you would like them to be. Also, it seems to me that you need a FASTA file of genes but have downloaded the transcripts. Try downloading the gene table from your source.
Examples are in the original post (not sure if they were added after you wrote this comment).
Since these two sequences are from the same LOC#, OP wants them in a separate file.
They were added after.
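One possible way to do the grouping asked about in the original post is a short Python script. This is only a minimal sketch, not a tested pipeline: it assumes every header carries its locus as "(LOC<digits>)" exactly as in the example headers, and the function names group_by_locus and write_groups, as well as the "no_locus" fallback label, are made up for illustration.

```python
import re
from collections import defaultdict
from pathlib import Path

# Assumption: the locus ID appears in the header as "(LOC<digits>)",
# as in the example headers from the original post.
LOCUS_RE = re.compile(r"\((LOC\d+)\)")


def group_by_locus(lines):
    """Group FASTA lines by locus ID; returns {locus_id: [lines of all records]}."""
    groups = defaultdict(list)
    current = None  # locus of the record currently being read
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            match = LOCUS_RE.search(line)
            # Headers without a LOC tag go to a hypothetical catch-all group.
            current = match.group(1) if match else "no_locus"
        if current is not None:
            groups[current].append(line)
    return dict(groups)


def write_groups(groups, out_dir):
    """Write one <locus_id>.fasta file per locus into out_dir."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for locus, record_lines in groups.items():
        (out / f"{locus}.fasta").write_text("\n".join(record_lines) + "\n")
```

With a file handle opened on the input FASTA, calling write_groups(group_by_locus(handle), "by_locus") would put the first two example records into by_locus/LOC105031930.fasta and the third into by_locus/LOC105031931.fasta (the output directory name is just a placeholder). Each per-locus file can then be passed to an aligner.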