Searching for plasmid in assembly (plasmidSPAdes)
17 months ago
s4r10ct • 0

Hi, I am new to bioinformatics analysis and am currently studying whole-genome sequences of Acinetobacter baumannii. I used SPAdes to assemble the genome and plasmidSPAdes to assemble the plasmids. I then ran BLASTn on the plasmid assembly against the PLSDB database, using 95% identity and 90% query coverage (qcov) as thresholds. I have the results, and the alignment lengths vary. My questions are:

  1. Are there any criteria (such as a minimum alignment length) for calling a plasmid from BLASTn output?
  2. For example, I obtained an alignment length of 1068 bp from one BLASTn result. When I looked up the subject ID, the plasmid is actually 70 kbp long. Is it valid to report this as a plasmid, or should I disregard it?
  3. I also obtained several hits to the same subject ID but at different start and end positions. As in question 2, is it valid to report this as a plasmid, or should I disregard it?

Any help or suggestions will be appreciated.
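For readers who want to reproduce a search like the one described, here is a minimal sketch of how the BLASTn call could be assembled. It assumes the 95% identity and 90% qcov thresholds were applied with the standard `-perc_identity` and `-qcov_hsp_perc` options, and that the output columns are the twelve fields shown in the results later in this thread. The file names (`plasmid_contigs.fasta`, `plsdb.fna`, `hits.tsv`) and the helper `build_blastn_command` are hypothetical, not taken from the original post.

```python
import shlex

# Column order matching the "Fields:" header shown in the thread's BLAST output.
OUTFMT_FIELDS = [
    "qseqid", "sseqid", "length", "evalue", "bitscore", "qstart",
    "qend", "qcovhsp", "qcovs", "pident", "sstart", "send",
]


def build_blastn_command(query_fasta, db, out_path,
                         min_identity=95, min_qcov_hsp=90):
    """Return the blastn argument list (hypothetical helper, nothing is run)."""
    return [
        "blastn",
        "-query", query_fasta,
        "-db", db,                              # database built with makeblastdb
        "-perc_identity", str(min_identity),    # assumed: the 95% identity cut-off
        "-qcov_hsp_perc", str(min_qcov_hsp),    # assumed: the 90% qcov cut-off
        "-outfmt", "7 " + " ".join(OUTFMT_FIELDS),  # tabular with comment lines
        "-out", out_path,
    ]


# Hypothetical file names, for illustration only.
cmd = build_blastn_command("plasmid_contigs.fasta", "plsdb.fna", "hits.tsv")
print(shlex.join(cmd))
```

The argument list can be passed to `subprocess.run(cmd, check=True)` on a machine where BLAST+ is installed.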

Assembly plasmidSPAdes Plasmid • 1.4k views

Plasmids are often hard to resolve as they can have repetitive sequences.

Is there a known reference for this plasmid you can compare against? You will likely need to find the longest contigs you have to try and cover the 70 kb. You probably won't have any 70 kb contigs, so you will need to build the plasmid up from fragments.

I would suggest trying a global aligner rather than BLAST. If you are focussing on identity, BLAST will prioritise high-identity local alignments, which will likely be shorter.


Hi Joe, thank you for your reply. I want to find out which plasmids are contained in my plasmidSPAdes contigs, so I don't know of any reference I can compare them against. Several results come out like this:

  • BLASTN 2.12.0+
  • Query: NODE_3_length_8858_cov_1036.798763_component_1
  • Database: /home/s4r10ct/Documents/Brooks.fna
  • Fields: query id, subject id, alignment length, evalue, bit score, q. start, q. end, % query coverage per hsp, % query coverage per subject, % identity, s. start, s. end
  • 2 hits found

    NODE_3_length_8858_cov_1036.798763_component_1 ref|NZ_CP008850.1| 8155 0.0 15060 1 8155 92 92 100.000 583 8737
    NODE_3_length_8858_cov_1036.798763_component_1 ref|NZ_CP020576.1| 8097 0.0 14946 762 8858 91 91 99.988 8730 635

In that case, the contig is almost the same length as the database sequence (around 8 kbp), so I can be confident that it is actually a plasmid. However, for the cases in questions 2 and 3, the results look like this:

  • BLASTN 2.12.0+
  • Query: NODE_4_length_1068_cov_1761.753454_component_0
  • Database: /home/s4r10ct/Documents/Brooks.fna
  • Fields: query id, subject id, alignment length, evalue, bit score, q. start, q. end, % query coverage per hsp, % query coverage per subject, % identity, s. start, s. end
  • 53 hits found

    NODE_4_length_1068_cov_1761.753454_component_0 ref|NZ_CP021695.1| 1068 0.0 1973 1 1068 100 100 100.000 97949 96882
    NODE_4_length_1068_cov_1761.753454_component_0 ref|NZ_CP021695.1| 1068 0.0 1973 1 1068 100 100 100.000 127619 128686
    NODE_4_length_1068_cov_1761.753454_component_0 ref|NZ_CP020577.1| 1068 0.0 1973 1 1068 100 100 100.000 61102 60035
    NODE_4_length_1068_cov_1761.753454_component_0 ref|NZ_CP020596.1| 1068 0.0 1973 1 1068 100 100 100.000 19840 18773
    NODE_4_length_1068_cov_1761.753454_component_0 ref|NZ_CP020596.1| 1068 0.0 1973 1 1068 100 100 100.000 52759 53826
    NODE_4_length_1068_cov_1761.753454_component_0 ref|NZ_CP020596.1| 1068 0.0 1973 1 1068 100 100 100.000 162241 163308

and so on, with the same alignment length for every hit from this contig. This contig, however, is not the same length as the database sequences (around 70 kbp). I intend to disregard it since the lengths differ so much; I'm afraid it is just a plasmid fragment. What are your thoughts?
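The two patterns described above can be pulled out of the tabular output with a short script. This is only a sketch, assuming the twelve-column field order shown in the `Fields:` header and SPAdes-style contig names that embed the contig length. The names `Hit`, `parse_hit`, `contig_length`, `aligned_fraction` and `loci_per_subject` are made up for illustration.

```python
from collections import defaultdict
from typing import NamedTuple


class Hit(NamedTuple):
    query: str
    subject: str
    q_start: int
    q_end: int
    pident: float
    s_start: int
    s_end: int


def parse_hit(line):
    """Parse one whitespace-separated hit row in the thread's column order."""
    f = line.split()
    return Hit(f[0], f[1], int(f[5]), int(f[6]), float(f[9]),
               int(f[10]), int(f[11]))


def contig_length(query_id):
    """Read the length from a SPAdes-style name: NODE_<n>_length_<len>_cov_..."""
    parts = query_id.split("_")
    return int(parts[parts.index("length") + 1])


def aligned_fraction(hit):
    """Fraction of the contig spanned by this single alignment."""
    return (hit.q_end - hit.q_start + 1) / contig_length(hit.query)


def loci_per_subject(lines):
    """Count distinct subject positions hit by each (query, subject) pair."""
    loci = defaultdict(set)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and outfmt 7 comment lines
        h = parse_hit(line)
        # Normalise so minus-strand hits (start > end) compare cleanly.
        loci[(h.query, h.subject)].add(
            (min(h.s_start, h.s_end), max(h.s_start, h.s_end)))
    return {pair: len(positions) for pair, positions in loci.items()}
```

On the rows posted above, the NODE_3 hit to NZ_CP008850.1 spans about 92% of the contig, while NODE_4 matches NZ_CP020596.1 at three separate positions and NZ_CP021695.1 at two. A short contig that matches the same subject at several positions is at least consistent with a sequence that occurs repeatedly, which fits the earlier point about repetitive sequences. The counts alone do not show which plasmid, if any, the contig came from.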


You almost certainly won't find anything with an alignment length of 70 kb because you're (presumably) using short reads.

There may be better tools out there, but you could try aligning contigs against a reference using Mauve if you have one. Since you know it's 70 kb, I assume there is some sort of reference sequence available?


Hi Joe. I see, so it's almost impossible to use the short contig sequences. Regarding a reference, does a database count as a "reference"? The plasmid database I used is PLSDB. I looked up each of the hits in the database, and it gave me the plasmid length. I do have Mauve, but I have never used it on contigs.


It's not impossible; you just won't have a full contiguous plasmid sequence. You'll have to build it up from various contigs and will probably still have gaps.

By reference, I mean: has this plasmid (or a similar one) ever been sequenced before? If it has, you can align and scaffold your contigs against it, which will be the easiest approach.
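To get a rough idea of how much of a candidate reference the contigs cover, and where the gaps fall, the subject intervals of the hits can be merged. The sketch below assumes the `s. start` / `s. end` pairs for one reference have already been collected. The function names and the example coordinates are invented for illustration, and the reference is treated as linear, so an alignment that wraps around the origin of a circular plasmid is not handled.

```python
def merge_intervals(intervals):
    """Merge overlapping or adjacent 1-based inclusive intervals."""
    merged = []
    # Minus-strand hits have start > end, so normalise before sorting.
    for start, end in sorted((min(a, b), max(a, b)) for a, b in intervals):
        if merged and start <= merged[-1][1] + 1:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(m) for m in merged]


def reference_coverage(intervals, ref_len):
    """Return (fraction of the reference covered, list of uncovered gaps)."""
    merged = merge_intervals(intervals)
    covered = sum(end - start + 1 for start, end in merged)
    gaps = []
    pos = 1  # next reference position not yet accounted for
    for start, end in merged:
        if start > pos:
            gaps.append((pos, start - 1))
        pos = end + 1
    if pos <= ref_len:
        gaps.append((pos, ref_len))
    return covered / ref_len, gaps


# Invented coordinates on a hypothetical 70 kb reference.
fraction, gaps = reference_coverage(
    [(1, 30000), (45000, 25001), (60000, 70000)], ref_len=70000)
print(f"{fraction:.1%} covered, gaps: {gaps}")
```

A reference that is mostly covered by several contigs is a more plausible scaffold than one matched only by a single 1 kb contig, but where to draw the line is a judgment call that this sketch does not settle.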


Hi Joe. Sorry for the late response. I see, I will try your suggestion ASAP. I will get back to you if any problems arise during the analysis. Thank you for your time and suggestions, Joe. Have a nice weekend.
