Question

Percentage Problem

1

4 months ago

parb2182 ▴ 10

Hello,

I was given nanopore sequencing reads (as a .fasta file) from a chimeric virus that was made from 2 different but similar viruses, and I am tasked with finding out which of the reads align with virus 1 and which with virus 2.

I trimmed and cleaned the data, and annotated the reads using a file that contains both the unique and the common genes of virus 1 and virus 2. I also converted that file to FASTA.

To find the percentage, I tried distance scoring with a script I wrote myself, but it takes too long to run, so I looked for a tool that does this for me. Does anybody know of any tools, or have ideas on what I can do to finally get the results I want?

Thank you very much for your help. I really do appreciate it :)

gene alignment • 550 views

ADD COMMENT • link updated 4 months ago by jared.andrews07 ★ 18k • written 4 months ago by parb2182 ▴ 10

score 1 · Answer 1 · 2024-07-17

1

4 months ago

jared.andrews07 ★ 18k

Sounds like Seal or bbsplit from bbmap will probably get you what you want.

ADD COMMENT • link 4 months ago by jared.andrews07 ★ 18k

0

First of all, thank you very much for your recommendation, Dr. Andrews. That is a very cool tool. But I feel like I should've been more specific.

I just had an additional question. I've aligned both of the genes using minimap2, following the normal alignment procedure.

I have to take these 2 files and compare them with each other. For example, out of the 300k reads that I have, maybe the first one matches 100% with both Virus1 and Virus2. I want to ignore this read and increment an ignored_counter by 1.

But if one score is higher than the other, for example, out of 500 nucleotides, 400 matched with Virus1 and 490 matched with Virus2, I want to write the gene name that I have annotated, the percentage or score of the match, and the sequence to a text file. So in the 2nd example it would be:

Virus2 490/500 (or any other score like distancing score) ATCC...GCAAC

or something along these lines. Is it possible to do this analysis with this tool, are there other tools available for it, or do I have to write a script that does all of this?
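A minimal sketch of the comparison described above, assuming each of the 2 files is minimap2 PAF output (one file per virus) and that the PAF target name is the annotated gene name. In PAF, column 2 is the read length, column 6 the target name, and column 10 the number of matching bases. The function names, the `sequences` lookup, and the choice of read length as the denominator are all hypothetical and only for illustration.

```python
from collections import namedtuple

# One alignment summary per read: target (gene) name, matching bases, read length.
Hit = namedtuple("Hit", "gene matches read_len")


def best_hits(paf_lines):
    """Keep, for each read, the PAF record with the most matching bases."""
    best = {}
    for line in paf_lines:
        if not line.strip():
            continue
        f = line.rstrip("\n").split("\t")
        # PAF columns (1-based): 1 read name, 2 read length,
        # 6 target name, 10 number of matching bases.
        read = f[0]
        hit = Hit(gene=f[5], matches=int(f[9]), read_len=int(f[1]))
        if read not in best or hit.matches > best[read].matches:
            best[read] = hit
    return best


def classify(paf_virus1, paf_virus2, sequences):
    """Compare per-read match counts; ties are counted and skipped.

    `sequences` is an assumed dict mapping read name -> read sequence.
    Returns (rows, ignored_counter).
    """
    hits1, hits2 = best_hits(paf_virus1), best_hits(paf_virus2)
    ignored_counter = 0
    rows = []
    for read in sorted(set(hits1) | set(hits2)):
        h1 = hits1.get(read, Hit("", 0, 0))  # read absent from file 1
        h2 = hits2.get(read, Hit("", 0, 0))  # read absent from file 2
        if h1.matches == h2.matches:
            ignored_counter += 1  # matches both viruses equally well
            continue
        label, hit = ("Virus1", h1) if h1.matches > h2.matches else ("Virus2", h2)
        score = f"{hit.matches}/{hit.read_len}"
        rows.append((read, label, hit.gene, score, sequences.get(read, "")))
    return rows, ignored_counter
```

Each returned row can be written to a text file as one tab-separated line. Holding only one small record per read keeps memory use modest for 300k reads, and the two files are each read once.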

ADD REPLY • link 4 months ago by parb2182 ▴ 10

0

If you need to check individual base results, then blast may be an option. That should also give you control over gap penalties etc., and parsing the output may be easier with outfmt 6. Consider magicblast if you have lots of short input queries.
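To illustrate why outfmt 6 is easy to parse, here is a rough sketch that keeps the highest-bitscore hit per read. It assumes the default 12 tabular columns (qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore) and a BLAST database containing both virus references. The function name is hypothetical.

```python
import csv

# Default column order of BLAST tabular output (-outfmt 6).
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hit_per_read(lines):
    """Return {read: (subject, percent_identity, bitscore)} for the top hit."""
    best = {}
    for fields in csv.reader(lines, delimiter="\t"):
        if not fields or fields[0].startswith("#"):
            continue  # skip blank lines and comment lines
        row = dict(zip(COLUMNS, fields))
        score = float(row["bitscore"])
        read = row["qseqid"]
        if read not in best or score > best[read][2]:
            best[read] = (row["sseqid"], float(row["pident"]), score)
    return best
```

Passing an open file handle works as well as a list of lines, since `csv.reader` accepts any iterable of strings.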

ADD REPLY • link 4 months ago by GenoMax 148k

0

bbsplit will give you counts/percentages of ambiguous reads, which you can then handle however you'd like (see the -ambiguous and -ambiguous2 parameters for bbsplit or -ambig for seal). You can toss those reads, assign them to one or both references, whatever. It'll spit out stats regardless of what you choose to do with the reads.

It sounds like it may be worth splitting the reads by virus first, then dealing with the gene annotations after the fact if you want counts or whatever for them.

ADD REPLY • link 4 months ago by jared.andrews07 ★ 18k