Recommendation to BLAST short (<1 kb) sequences contained in a FASTA file against a FASTQ file (obtained from Oxford Nanopore sequencing)?
5.6 years ago

Hi everyone,

I'm a novice in the world of bioinformatics, and I was wondering if anyone could help me BLAST a short series of sequences contained in a FASTA file against a FASTQ file obtained through Oxford Nanopore MinION sequencing. My goal is to see whether the FASTQ file contains the gene(s) I'm looking for.

Thanks!

sequence alignment sequencing • 2.0k views

You should ideally use minimap2 (https://github.com/lh3/minimap2). How many sequences are in the respective files? How long are your FASTA sequences?


As your title suggests, you could do this with BLAST, but as Genomax points out, minimap2 might be better, depending on the data.


There are 35 sequences in the FASTA file (all between 100 bp and 300 bp). The FASTQ file(s) I would like to align those sequences against contain raw Oxford Nanopore FASTQ sequence, at around 1 Mb. Thanks for the minimap2 recommendation.


Do you need to do this from reads? It may be sensible to assemble your nanopore data first, just on the off chance that one of your hits is at the end of a read or something. It would also reduce the dimensions of the output to just one or two hits, rather than several tens or hundreds of reads too.

5.6 years ago
flogin ▴ 280

I have no idea whether any program accepts a FASTQ file to build a database (BLAST, DIAMOND, Bowtie2...).

If I'm not mistaken, you can convert your Nanopore FASTQ output to FASTA format and use that for your BLAST analysis.
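One way to do that conversion is a short awk one-liner wrapped in a shell function. This is only a minimal sketch: the function name `fastq_to_fasta` is made up for illustration, and it assumes every FASTQ record spans exactly four lines (header, sequence, `+`, qualities), which is the usual layout for Nanopore basecaller output but is not guaranteed for every FASTQ file.

```shell
# Convert FASTQ to FASTA, ASSUMING each record is exactly four lines
# (header, sequence, "+", qualities). Multi-line FASTQ is not handled.
# fastq_to_fasta is a hypothetical helper name, not part of any tool.
fastq_to_fasta() {
  awk 'NR % 4 == 1 { print ">" substr($0, 2) }   # "@id ..." becomes ">id ..."
       NR % 4 == 2 { print }                      # sequence line, unchanged
      ' "$@"
}

# Tiny demonstration on a single inline record
printf '@read1 demo\nACGTACGT\n+\nIIIIIIII\n' | fastq_to_fasta
```

With a real file this would be called as `fastq_to_fasta reads.fastq > nanopore.fasta`, where `reads.fastq` stands in for your own file name.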

You can then easily run your analysis:

$ makeblastdb -in nanopore.fasta -dbtype nucl
$ blastn -db nanopore.fasta -query sequences.fasta -out sequences.fasta.blastn -outfmt 6 -evalue 0.00001 -word_size 7

The -outfmt 6 option writes tab-separated output that you can open in a spreadsheet, with several pieces of information per hit (percent identity, alignment length, gaps, alignment start/end positions, e.g.), and -word_size 7 makes your analysis more sensitive.
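If the table gets long, it can be summarised straight from the shell. The sketch below counts, for each query sequence, how many reads it hit above some cutoffs. The function name `summarise_hits` and the default thresholds (90% identity, 100 bp alignment length) are assumptions chosen for illustration, not recommended values; it relies on the default -outfmt 6 column order (qseqid, sseqid, pident, length, ...).

```shell
# Count hits per query in BLAST -outfmt 6 output, keeping only alignments that
# pass illustrative thresholds (hypothetical defaults: >= 90% identity, >= 100 bp).
# Default column order: qseqid sseqid pident length mismatch gapopen
#                       qstart qend sstart send evalue bitscore
# Usage: summarise_hits <blast_table> [min_identity] [min_length]
summarise_hits() {
  min_ident="${2:-90}"
  min_len="${3:-100}"
  awk -F '\t' -v id="$min_ident" -v len="$min_len" '
    $3 >= id && $4 >= len { n[$1]++ }        # column 3 = pident, column 4 = length
    END { for (q in n) print q "\t" n[q] }   # query name, number of passing hits
  ' "$1" | sort
}
```

For the command above this would be `summarise_hits sequences.fasta.blastn`. Any of the 35 queries missing from the output had no hit passing the cutoffs, so you may want to loosen them before concluding that a gene is absent.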

I hope this helps.
