I have extracted exon sequences from a genome file using an awk script, and the output looks like the small subset below. I now have 12 PDCD4 exons, so the file may contain a single exon or multiple exons for each gene. The next step is to merge the multiple exons of a gene into a single sequence (e.g. a single PDCD4). Then I have to join the 5 prime end of the gene to the 3 prime end of that complete sequence, removing the middle part, so I join the first PDCD4 to the last PDCD4.
How do I do that? First, group the exons by gene: if a gene has a single exon, there is no issue; if it has multiple exons, I have to join them into one sequence. Next, I have to join the 5 prime end to the 3 prime end of that merged sequence, removing the middle part.
I would be glad to get some help on how to proceed.
>PDCD4
CTTTTCCTCCTCAGCTCCGGCTCCGCCGCCACGATTGGCCAGCCGACCACCCGGCCTCGGCCAATAAGCGCCGCCCTCTCGCCCCCGTGTTACTGGGTAGAAGAAAACAAAAACAAACAGAGCGAGAAGGGCCAGAGACTCTCCGAGGCGGCGGCAGAGACAGAAGAGCGGGGTCGGGGCCGGCTGACCAGGAACCTGGGCGAGCAGCGGCGGGGGCCCGAGGG
>PDCD4
ATTCTGAAGGAAGATTTCCATTAGGTAATTTGTTTAATCAGTGCAAGCGAAATTAAGGGAAAATGGATGTAGAAAATGAGCAGATACTGAATGTAAACCCTGCAG
>PDCD4
GGTATTTTCCCTAATTCTCCATGGTGCTTCAATAGCATGTTATTATCATAAAAATGAACAGTTTTGTGGAATAGATGACCAAAT
>PDCD4
ATCCTGATAACTTAAGTGACTCTCTCTTTTCCGGTGATGAAGAAAATGCTGGGACTGAGGAAATAAAGAATGAAATAAATGGAAATTGGATTTCAGCATCCTCCATTAACGAAGCTAGAATTAATGCCAAGGCAAAAAGGCGACTAAGGAAAAACTCATCCCGGGACTCTGGCAGAGGCGATTCGGTCAGCGACAGTGGGAGTGACGCCCTTAGAAGTGGATTAACTGTGCCAACCAGTCCAAAGGGAAGGTTGCTGGATAGGCGATCCAGATCTGGGAAAGGAAGGGGACTACCAAAGAAAG
>PDCD4
GTGGTGCAGGAGGCAAAGGTGTCTGGGGTACACCTGGACAGGTGTATGATGTGGAGGAGGTGGATGTGAAAGATCCTAACTATGATGATGACCAG
>PDCD4
GAGAACTGTGTTTATGAAACTGTAGTTTTGCCTTTGGATGAAAGGGCATTTGAGAAGACTTTAACACCAATCATACAGGAATATTTTGAGCATGGAGATACTAATGAAGTTGCG
>PDCD4
GAAATGTTAAGAGATTTAAATCTTGGTGAAATGAAAAGTGGAGTACCAGTGTTGGCAGTATCCTTAGCATTGGAGGGGAAGGCTAGTCATAGAGAGATGACATCTAAGCTTCTTTCTGACCTTTGTGGGACAGTAATGAGCACAACTGATGTGGAAAAATCATTTGATAAATTGTTGAAAGATCTACCTGAATTAGCACTGGATACTCCTAGAGCACCACAG
>PDCD4
TTGGTGGGCCAGTTTATTGCTAGAGCTGTTGGAGATGGAATTTTATGTAATACCTATATTGATAGTTACAAAGGAACTGTAGATTGTGTGCAGGCTAG
>PDCD4
AGCTGCTCTGGATAAGGCTACCGTGCTTCTGAGTATGTCTAAAGGTGGAAAGCGTAAAGATAGTGTGTGGGGCTCTGGAGGTGGGCAGCAATCTGTCAATCACCTTGTTAAAGAG
>PDCD4
ATTGATATGCTGCTGAAAGAATATTTACTCTCTGGAGACATATCTGAAGCTGAACATTGCCTTAAGGAACTGGAAGTACCTCATTTTCACCATGAGCTTGTATATGAA
>PDCD4
GCTATTATAATGGTTTTAGAGTCAACTGGAGAAAGTACATTTAAGATGATTTTGGATTTATTAAAGTCCCTTTGGAAGTCTTCTACCATTACTGTAGACCAAATGAAAAGA
>PDCD4
GGTTATGAGAGAATTTACAATGAAATTCCGGACATTAATCTGGATGTCCCACATTCATACTCTGTGCTGGAGCGGTTTGTAGAAGAATGTTTTCAGGCTGGAATAATTTCCAAACAACTCAGAGATCTTTGTCCTTCAAG
>PDCD4
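A minimal sketch of the first step (merging all exons of a gene into one mature sequence), assuming every record's header is just the gene name, as above, and that the awk output already lists each gene's exons in transcript order on the sense strand. Exons from minus-strand genes would need reverse-complementing first, which this does not do. The function names `read_fasta` and `merge_exons` are hypothetical, not from an existing tool.

```python
def read_fasta(lines):
    """Yield (name, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:].strip(), []  # header text is taken as the gene name
        else:
            chunks.append(line)  # a sequence may span several lines
    if name is not None:
        yield name, "".join(chunks)


def merge_exons(records):
    """Concatenate all exon sequences sharing a gene name, in file order."""
    parts = {}
    for gene, seq in records:
        parts.setdefault(gene, []).append(seq.upper())
    # dicts keep insertion order (Python 3.7+), so genes stay in file order
    return {gene: "".join(seqs) for gene, seqs in parts.items()}


# Tiny demo; GENE_X is a made-up second gene.
example = """>PDCD4
ACGT
>PDCD4
GGCC
>GENE_X
TTTT
"""
mature = merge_exons(read_fasta(example.splitlines()))
print(mature)  # {'PDCD4': 'ACGTGGCC', 'GENE_X': 'TTTT'}

# With a real file (the file name is hypothetical):
# with open("exons.fa") as fh:
#     mature = merge_exons(read_fasta(fh))
```

A gene with a single exon simply comes out unchanged, and a header with no sequence underneath contributes an empty string.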
This will require a bit more than simple awk scripting, I'm afraid. I'm also kinda curious why you want to do this. What's the goal?
I want to do this for divergent primer design. The goal is to get the mature sequence of any transcript by joining its exons, then take 100 bp from the 5 prime end and join them to 100 bp from the 3 prime end. Can you help me with how to do that?
How do you want to see them joined? 5'AAAAABBBBB3' => 5'BB3'5'AA3' (first circularize and then cut) or => 5'AA3'5'BB3' (simply cut out the middle part)?
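Both options can be built from the same mature sequence with simple slicing. A sketch, assuming the 100 bp flank from the question (`flank` is a hypothetical parameter name): "AA" is the first `flank` bases and "BB" the last `flank` bases. For a circRNA back-splice junction, the 3' end of the last exon is joined to the 5' start of the first exon, so the "circularize then cut" order (BB followed by AA) is the one that spans that junction.

```python
def junction_sequences(mature, flank=100):
    """Build both candidate joins from one mature (exon-merged) sequence.

    head = first `flank` bases (5' end, "AA"); tail = last `flank` bases (3' end, "BB").
    Returns (back_splice, cut_middle):
      back_splice = tail + head -> 5'BB3'5'AA3' (circularize, then cut)
      cut_middle  = head + tail -> 5'AA3'5'BB3' (simply cut out the middle)
    """
    if flank <= 0:
        # guard: mature[-0:] would return the whole sequence
        raise ValueError("flank must be positive")
    if len(mature) < 2 * flank:
        # head and tail would overlap; decide case by case what makes sense
        raise ValueError(f"sequence is {len(mature)} bp, need at least {2 * flank}")
    head = mature[:flank]
    tail = mature[-flank:]
    return tail + head, head + tail


back_splice, cut_middle = junction_sequences("AAACCCGGG", flank=3)
print(back_splice)  # GGGAAA
print(cut_middle)   # AAAGGG
```

Raising an error for sequences shorter than two flanks is just one design choice; you could instead shrink `flank` for short transcripts.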
Though I still don't get why you need to do this. Why not just take the first 100 bp and the last 100 bp? Why do you want to join them?
I want to get the junction sequence for divergent primer design, for circular RNA detection.
So it would be the full mature sequence, then 100 bp from the 5 prime end joined to the 3 prime end. That would be my junction sequence, which I can give as input to any primer design tool.
Ah, it's for circular RNA detection; that was my guess as well.
Anyway... do you want the output in FASTA format?
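If FASTA is what the primer design tool expects, writing a junction sequence back out is short. A sketch; `to_fasta` is a hypothetical helper, and the 60-base line width is only a common convention (many tools also accept the sequence on a single line). With 100 bp flanks, the junction sits between bases 100 and 101 of each 200 bp record.

```python
def to_fasta(name, seq, width=60):
    """Format one sequence as a FASTA record, wrapping the sequence at `width` bases."""
    lines = [">" + name]
    lines += [seq[i:i + width] for i in range(0, len(seq), width)]
    return "\n".join(lines) + "\n"


# "PDCD4_backsplice" is only an example header; use any naming scheme you like.
record = to_fasta("PDCD4_backsplice", "GGGAAA", width=4)
print(record, end="")
# >PDCD4_backsplice
# GGGA
# AA
```

To write several records, open the output file once and call `fh.write(to_fasta(name, seq))` for each gene.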