7.8 years ago
cool.abbecker
What software tool would you recommend for detecting psbA-trnH inversions? I usually use R, but I cannot find any suitable packages. My sequences are in multiple .fasta files, and I have around 30,000 sequences in total.
Here is an example: http://journals.plos.org/plosone/article/figure/image?size=large&id=10.1371/journal.pone.0011533.g001
These guys developed a pipeline: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3498263/ However, their web page is down.
I do want to identify the inversion and turn it around. I do not want to mask or delete it.
Thanks!
You should be more precise about the technology you used; "fasta files" isn't very specific. The answer to your question would probably be very different depending on whether those files are derived from SOLiD, Illumina, Sanger or Oxford Nanopore sequencing data.
I have a .fasta file containing Sanger sequences: single reads, one direction only, 1x coverage.
I have no access to sequencing data. The inversions are anywhere from 5 bp up to an unknown length, so Wham is too coarse (they say it can only detect inversions >50 bp). Here is another example: https://www.researchgate.net/profile/Wojciech_Bieniek/publication/271952841/figure/fig1/AS:331540467863561@1456056815704/Fig-1-Multiple-alignment-of-the-highly-variable-part-of-the-trnH-psbA-region-from-the.png (dotted line). To make it even more difficult, the inversion is not at the same place in all sequences. There are groups of sequences that share the same inversion, but different groups have different ones.
So this boils down to comparing a fasta file with a reference sequence and detecting the differences?
When you wrote "psbA inversion" I assumed you meant a whole-gene inversion, but I'm not really familiar with plant genetics.
Technically psbA-trnH is an intergenic spacer. I have no reference sequence; I compare everything to everything and try to identify the inversion groups within the psbA sequences. I don't know where the inversions are or how long they are. Please take a look at the first example (it's a picture) to understand what I am trying to explain. To make it even worse, some sequences have a deletion within the inversion region.
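To make the task concrete, here is a minimal Python sketch of the pairwise check described above. It is not the pipeline from the linked paper. It assumes two sequences of equal length with no gaps, and that the inversion is the only difference between them, so it will not handle the deletion case as written. The function names `find_inversion` and `flip` and the `min_len` parameter are hypothetical, chosen for illustration.

```python
# Complement table for unambiguous DNA bases; other symbols (e.g. N) pass through.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (upper-cased)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_inversion(a, b, min_len=5):
    """Return (start, end) if b differs from a only by an inverted segment.

    Assumption: a and b are ungapped and of equal length. The span between
    the first and last mismatch is tested: if b's span is the reverse
    complement of a's span, it is reported as a half-open interval.
    """
    if len(a) != len(b):
        raise ValueError("sequences must have equal length (ungapped)")
    a, b = a.upper(), b.upper()
    mismatches = [i for i, (x, y) in enumerate(zip(a, b)) if x != y]
    if not mismatches:
        return None  # identical sequences, nothing to report
    start, end = mismatches[0], mismatches[-1] + 1
    if end - start < min_len:
        return None  # too short to call, e.g. a single substitution
    if reverse_complement(a[start:end]) == b[start:end]:
        return (start, end)
    return None


def flip(seq, start, end):
    """Turn the segment seq[start:end] around (reverse complement in place)."""
    return seq[:start] + reverse_complement(seq[start:end]) + seq[end:]


if __name__ == "__main__":
    ref = "TTGACC" + "AAAGGC" + "GGTCAA"
    inv = "TTGACC" + "GCCTTT" + "GGTCAA"  # middle segment inverted
    hit = find_inversion(ref, inv)
    if hit is not None:
        print(hit, flip(inv, *hit) == ref)
```

The reported interval is the mismatching core. If the ends of the true inversion happen to be self-complementary, the real segment may extend a little further, but flipping the core restores the same sequence. For roughly 30,000 sequences, an all-against-all comparison is costly, so it would probably make sense to collapse identical sequences first and to work from a multiple alignment when lengths differ.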