Hi there,
If I have FASTA sequences like the following:
chr3:181879479-181879497 CGTTCCTCCTGGCGAGAG chr3:181879488-181879506 TACTTATTTCGTTCCTCC chr3:181879507-181879525 GAGGAGTGGGCATGAGGA chr3:181879549-181879567 AACCCTAAATGTCAATTA
How do I extract only the sequences that start with "G"? For example:
chr3:181879507-181879525 GAGGAGTGGGCATGAGGA
Thanks a lot.
I think it is easier to work with proper FASTA records, so first add a ">" sign before each 'chr'.
Start from the first ">" and read every character up to the next ">". The spaces play the role of the newline character here, don't they?
Create and open a new, empty output file.
For each record, check the first letter after the space that separates the header from the sequence. If it is 'G', write the record to the output file, then continue with the next record.
If it is not 'G', skip the record and do not write it to the output.
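The steps above can be sketched in Python roughly as follows. This is only a minimal sketch, not a definitive solution: it assumes the input is whitespace-separated and strictly alternates header, sequence, header, sequence, and that headers contain no spaces. The function name `records_starting_with` and the variable names are made up for illustration.

```python
def records_starting_with(text, base="G"):
    """Yield (header, sequence) pairs whose sequence starts with `base`."""
    # split() treats spaces and newlines alike, so it works whether the
    # records sit on one line or on separate lines.
    tokens = text.split()
    # Assumption: tokens alternate header, sequence, header, sequence, ...
    for header, seq in zip(tokens[0::2], tokens[1::2]):
        if seq.upper().startswith(base):
            # Drop a leading ">" if the header already has one.
            yield header.lstrip(">"), seq


data = (
    "chr3:181879479-181879497 CGTTCCTCCTGGCGAGAG "
    "chr3:181879488-181879506 TACTTATTTCGTTCCTCC "
    "chr3:181879507-181879525 GAGGAGTGGGCATGAGGA "
    "chr3:181879549-181879567 AACCCTAAATGTCAATTA"
)

hits = list(records_starting_with(data))

# Print the matches as FASTA records, with ">" in front of each header.
for header, seq in hits:
    print(f">{header}\n{seq}")
```

With the example data from the question, only the chr3:181879507-181879525 record is kept. To read from a file instead, pass the file's contents (for example `open(path).read()`, where `path` is your own file name) in place of `data`.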
See also this already-answered thread: "extract sequences from fasta starting with a specific nucleotide"