Extract gene related reads from a whole run
2
0
Entering edit mode
2.8 years ago
Jacopo • 0

I want to extract, from reads of a whole run (ERR949847), only katG related reads. I downloaded, from gene dataset of ncbi, katG related sequence. Here the header of the of the fasta file:

>NC_000962.3:c2156111-2153889 katG [organism=Mycobacterium tuberculosis H37Rv] [GeneID=885638] [chromosome=]

For complete information, the reads of the run which i mentioned before, belong to the same organism: Mycrobacterium tuberculosis H37Rv. In order to achieve my task i follow few steps that seems reasonable to me:

  1. Send the run to blast, through ncbi site.
  2. Copy the gene related sequence into "query sequence" input field
  3. Start the allignment.

This is the output that i got:

 >gnl|SRA|ERR949847.2068134.2 2068134
TGTCCCAGGCAGCGACGAAGTCCTGCACGAACTTCGGCTGCGCGTCATCGGCGCCATAGACCTCGACAAG
CGCCC
>gnl|SRA|ERR949847.2068134.1 2068134
TGGCAAGGTGAAGTGGACCGGCAGCCGCGTGGACCTGGTCTTCGGGTCCAACTCGGAGTTGCGGGCGCTT
GTCGA
>gnl|SRA|ERR949847.1529974.2 1529974
CTTGTACCAGGCCTTGGCGAACTCGTCGGCCAATTCCTCGGGGTGTTCCAGCCAGCGACGCGTGATCCGC
TCATA
>gnl|SRA|ERR949847.1529974.1 1529974
GGCCACTGACCTCTCGCTGCGGGTGGATCCGATCTATGAGCGGATCACGCGTCGCTGGCTGGAACACCCC
GAGGA
>gnl|SRA|ERR949847.1388549.2 1388549
TGTCCCAGGCAGCGACGAAGTCCTGCACGAACTTCGGCTGCGCGTCATCGGCGCCATAGACCTCGACAAG
CGCCC
>gnl|SRA|ERR949847.1388549.1 1388549
TGGCAAGGTGAAGTGGACCGGCAGCCGCGTGGACCTGGTCTTCGGGTCCAACTCGGAGTTGCGGGCGCTT
GTCGA
>gnl|SRA|ERR949847.1227510.2 1227510
CTTGTACCAGGCCTTGGCGAACTCGTCGGCCAATTCCTCGGGGTGTTCCAGCCAGCGACGCGTGATCCGC
TCATA
>gnl|SRA|ERR949847.1227510.1 1227510
GGCCACTGACCTCTCGCTGCGGGTGGATCCGATCTATGAGCGGATCACGCGTCGCTGGCTGGAACACCCC
GAGGA
>gnl|SRA|ERR949847.1100314.2 1100314
GACGCATCCGTGCGGCCCGGGGTGAAGGGCACCGTGATGTTGTGGCCAGCCGCCTTTGCTGCTTTCTCTA
TGGCG
>gnl|SRA|ERR949847.1100314.1 1100314
CAAGGTCATTCGCACCCTGGAAGAGATCCAGGAGTCATTCAACTCCGCGGCGCCGGGGAACATCAAAGTG
TCCTT
>gnl|SRA|ERR949847.316224.2 316224
TGCATGCCGCCCCCGGCGCCGCCGCGGCCGTCGTGGATGCGGTAGGTGCCGGCAGCGTGCCACGCCATCC
GGATA
>gnl|SRA|ERR949847.316224.1 316224
GACCATCGACGTTGACGCCCTGACGCGGGACATCGAGGAAGTGATGACCACCTCGCAGCCGTGGTGGCCC
GCCGA
>gnl|SRA|ERR949847.301240.1 301240
GGCCACTGACCTCTCGCTGCGGGTGGATCCGATCTATGAGCGGATCACGCGTCGCTGGCTGGAACACCCC
GAGGA
>gnl|SRA|ERR949847.301240.2 301240
CTTGTACCAGGCCTTGGCGAACTCGTCGGCCAATTCCTCGGGGAGCTCCAGCCAGCGACGCGTGATCCGC
TCATA

Now i have some question:

  • Does this make sense?
  • If this make sense, there is a better way to get the same?

    P.S.: I'm sorry if i made mistakes or if i didn't use the correct convention but i'm actually a newbie!

Blast ncbi SRA DNA Fasta • 1.1k views
ADD COMMENT
2
Entering edit mode
2.8 years ago
GenoMax 147k

This is reasonable. Just be aware that since you are using blast to do this search (which uses local alignments) you may end up getting some reads that may have partial matches (a possibility).

What do you hope to do next? Do an assembly of these reads?

ADD COMMENT
0
Entering edit mode

Ok! I didn't know that. Thank you for the explanation, i have to deal with. No, not an assembly, not directly at least. I want to do a drug susceptibility test with a software called Mykrobe, and my purpose is to build a dataset to use as input.

ADD REPLY
0
Entering edit mode
2.8 years ago

We have specialized software that is far, far faster at aligning millions of short reads to a sequence than BLAST. I'm not sure how happy NCBI is with people submitting millions of sequences at a time either...are you sure that your whole file got processed?

ADD COMMENT
0
Entering edit mode

NCBI allows people to use blast search against SRA accessions via blast web page (you can't do -remote searches against SRA though). In this case OP is using a single sequence to search against that SRA accession. That should be the whole accession included in search.

ADD REPLY
0
Entering edit mode

I'm not sure but i think so, given the fact that in some hard computational tests that i made the sites doesn't give me response.

ADD REPLY

Login before adding your answer.

Traffic: 2251 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6