3.9 years ago
harry
I have multiple FASTA sequences like the ones below:
>read1
CAGGTCTGGCTGGATGAAGGGCACGGCATAGGTCTGACCTGCCAGGGAGTGCTGCATCCTCACAGGAGTCATGGTGCTGCTGAAGATGTCTCCAGAGACCTTCTGCAGGTACTGCAGGGCATCCGCCATCTGCTGGACGGCCTCCTCTCGCCG
>read2
CATCTGCGGAGGCTGCCGTGACGTAGGGTATGGGCCTAAATAGGCCATTGTGAGTCATGAGCTTGGTCTGTAGAGGCTGACTGGAGAAAGTTCTGGGCCTGGAGAGGCTGCCGGGAGGTAGGAGTGGTGAGGTCGACTTGAGAAAGTTCAGGGCCTGGAGAGAAGGCTGGGAGGCAGGAGCTGGGTCTAAAGAGGCCATTGTAACGATGGAGCTGTGCCTGTGGAGGCTGTTGTGAGGCAGTAGCCT
>read3
TTGAGGTGGGAGGATCGCTTCAGCCTGGAAGGTTGAGGCTGCAGTCAGCTGCGATAGCACTACTACACTCCAGCCTTGGACAACAGAGGGAGACCTTTCGCTGTCACCCCTCTAGAATCCACGTATACGAAAATTCCAAATGTTAGTTGGGCATAGTGGCAAGCACCTGTAGTCTCAGCCACGTGGGAGG
These reads have different lengths, so I want to isolate the middle part of every read in the FASTA file. Ideally, all the resulting sequences would be 150-160 bp long. Is there a way to do this? For example, read2 below contains 247 nucleotides:
CATCTGCGGAGGCTGCCGTGACGTAGGGTATGGGCCTAAATAGGCCATTGTGAGTCATGAGCTTGGTCTGTAGAGGCTGACTGGAGAAAGTTCTGGGCCTGGAGAGGCTGCCGGGAGGTAGGAGTGGTGAGGTCGACTTGAGAAAGTTCAGGGCCTGGAGAGAAGGCTGGGAGGCAGGAGCTGGGTCTAAAGAGGCCATTGTAACGATGGAGCTGTGCCTGTGGAGGCTGTTGTGAGGCAGTAGCCT
After trimming from both sides, the remaining middle part is 153 bp:
TTGTGAGTCATGAGCTTGGTCTGTAGAGGCTGACTGGAGAAAGTTCTGGGCCTGGAGAGGCTGCCGGGAGGTAGGAGTGGTGAGGTCGACTTGAGAAAGTTCAGGGCCTGGAGAGAAGGCTGGGAGGCAGGAGCTGGGTCTAAAGAGGCCATT
I want to do the same for my whole FASTA file with a single command. Can you please tell me how to get the middle sequence of every record in a FASTA file? Thanks in advance.
How is this different from your previous question, "Script for making exon file"?
In my previous question, I asked how to cut a sequence and rejoin it, that is, cut it into two equal parts and join the downstream region to the upstream region. In this question, I want to isolate the middle part of the sequence by removing the bases that overhang on both sides of it. Thanks.
It's the very same kind of operation.
Can you please tell me how to do this? I want a 150 bp sequence from the middle: 75 bp upstream and 75 bp downstream of the midpoint. Thanks.
https://thomas-cokelaer.info/blog/2011/05/awk-the-substr-command-to-select-a-substring/
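The substring idea behind that link can also be sketched in Python. This is only a minimal illustration, not a tested pipeline: the function names `read_fasta` and `middle` are made up for this example, and the choice to return reads shorter than the requested width unchanged is an assumption. For a 247-base read and a width of 153, the start offset works out to (247 - 153) // 2 = 47, so 47 bases are dropped from each side.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines.

    Works with an open file handle or a list of strings; sequences
    wrapped over several lines are joined into one string.
    """
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def middle(seq, width=150):
    """Return the central `width` bases of `seq`.

    Assumption: reads that are already `width` bases or shorter are
    returned unchanged. When the leftover length is odd, the extra
    base is trimmed from the right-hand side.
    """
    if len(seq) <= width:
        return seq
    start = (len(seq) - width) // 2
    return seq[start:start + width]


# Small in-memory demonstration with made-up records.
demo = [">toy1", "AACCGGTT", ">toy2", "ACG", "TACGT"]
for name, seq in read_fasta(demo):
    print(f">{name}")
    print(middle(seq, width=4))
```

To run it on a real file, the list `demo` can be replaced with an open file handle, for example `with open("reads.fasta") as fh:` (the file name here is hypothetical), and `width` set to 150.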
This is essentially alignment with trimming. You can do this with a GUI application like UGENE, Geneious, or BioEdit. You can also see this answer: Most efficient way to trim overhanging bases after alignment
I disagree! Sometimes it's easier to just use a GUI and get the job done. This is a good example where such usage is perfect. I do this regularly for trimming large MSAs aligned to a reference.
The problem with GUI applications is that they are often not easily reproducible or scalable in workflows. In the long term, it's better to use the CLI unless a GUI is absolutely necessary.