Aligning short sequences to fastq
7.2 years ago
BPors ▴ 60

Hi,

I am trying to search for a set of sequences (around 400), each 23 bp long, in several fastq files, allowing at most 1-2 mismatches. I am not sure whether treating the fastq as a genome (transcriptome) to align against would be a good approach. I tried fastq -> fasta -> building a blast database -> running blastn, but it did not run because my query contains more than one sequence.

Example part of my query.file :

ATTTTTCTGAAAAACCCCCTACGA

AACAGGAAGTCAAAAAAAGCCAA

AGGATTTTTTTTTTTCTGGGGACA

The output I am aiming for is, for each sequence in my query.file, whether it has a 100% match (or a match with 1-2 mismatches) in the fastq file, and if possible where in the fastq file.

I would appreciate your suggestions! Thank you!

RNA-Seq short sequences aligning • 3.8k views

You could use bowtie instead of blast. Make a fasta from the fastq, build a bowtie index from it, then align the queries against that index. Bowtie has an option (-n) that controls how many mismatches are allowed in the seed. As the default seed length (28 bp) is longer than your queries, the seed covers each query completely, so setting the maximum seed mismatches to 1 or 2 should be sufficient for your goal.
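The steps in this suggestion could be sketched roughly as follows. This is only a sketch under assumptions: the file names and the tiny demo input are made up, the fastq is assumed to have exactly four lines per record, and the bowtie call uses the classic Bowtie 1 positional syntax, which may need adjusting for your version (check `bowtie --help`; newer releases take the index via `-x`).

```shell
#!/usr/bin/env bash
# Sketch: the reads become the "reference", the 23-mers become the queries.
# All file names below are placeholders, not taken from the thread.

# Tiny demo input so the script runs end to end; replace with real data.
query=AACAGGAAGTCAAAAAAAGCCAA
read_seq="TT${query}GG"
qual=$(printf '%s\n' "$read_seq" | sed 's/./I/g')   # dummy qualities, same length
printf '@read1\n%s\n+\n%s\n' "$read_seq" "$qual" > demo_reads.fastq
printf '%s\n' "$query" > demo_queries.txt

# 1) fastq -> fasta: keep the header (line 1 of each record, "@" swapped
#    for ">") and the sequence (line 2). Assumes 4-line fastq records.
awk 'NR % 4 == 1 { print ">" substr($0, 2) }
     NR % 4 == 2 { print }' demo_reads.fastq > reads.fa

# 2) + 3) index the reads and align the queries; skipped if Bowtie 1
#    is not installed.
if command -v bowtie-build >/dev/null 2>&1 && command -v bowtie >/dev/null 2>&1; then
    bowtie-build reads.fa reads_index > bowtie_build.log 2>&1
    # -r   : queries are raw sequences, one per line
    # -n 2 : allow up to 2 mismatches in the seed
    # -a   : report all valid alignments, not just one per query
    bowtie -r -n 2 -a reads_index demo_queries.txt > hits.txt \
        || echo "bowtie call failed; check the syntax for your version" >&2
else
    echo "bowtie not found; only reads.fa was written" >&2
fi
```

In Bowtie 1's default output, each hit line includes the query, the strand, the name of the read it landed in, and a 0-based offset, which would cover the "where" part of the question. Bowtie's `-v 2` mode (up to 2 mismatches end to end, ignoring qualities) may be a simpler alternative to `-n 2` here.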


Thank you for your answer. I would like to try it, but I have these sequences in plain text format, so I cannot turn them into fastq. I think Bowtie requires the reads to be in fastq format.


No, several formats are accepted:

-q    query input files are FASTQ .fq/.fastq (default)
-f    query input files are (multi-)FASTA .fa/.mfa
-r    query input files are raw one-sequence-per-line


Thank you! I eventually used BBDuk, but I will give bowtie a try soon with these options (-r).


I was not aware that there is such a function in BB. This BB stuff is really a jack-of-all-trades.


Hi,

Maybe you can try to align your 23 bp sequences with bwa aln, using your fastq files as the reference after converting them to fasta?

Best


Thank you for your suggestion. Would this work if my reads are in text format?

7.2 years ago

You can grab the fastq sequences containing these 23-mers with BBDuk like this:

bbduk.sh in=file.fastq outm=matched.fastq ref=23mers.fa k=23 hdist=2

"hdist=2" allows 2 mismatches; you can alternatively set that to 1 or 0. This does not tell you where the match is, but you can do that like this:

bbduk.sh in=matched.fastq out=masked.fastq ref=23mers.fa k=23 hdist=2 kmask=lc

That will convert the matched regions to lowercase.
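For anyone starting from a plain text list of 23-mers and wanting coordinates out of the masked file, here is a rough sketch of the two small glue steps around the BBDuk commands. It assumes one sequence per line in the text file, 4-line fastq records, and that masked bases are the only lowercase letters in the reads; the file names and the record names (q1, q2, ...) are made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: (1) wrap raw one-per-line 23-mers into FASTA for use as ref=,
#         (2) list the lowercase (masked) stretches in a kmask=lc output.
# All file names are placeholders; the demo data stands in for real files.

# (1) raw text -> FASTA. Blank lines are skipped; names q1, q2, ... are invented.
printf 'AACAGGAAGTCAAAAAAAGCCAA\n\n' > demo_queries.txt
awk 'NF { n++; print ">q" n; print $1 }' demo_queries.txt > demo_23mers.fa

# Demo stand-in for a masked.fastq produced with kmask=lc.
seq="TTaacaggaagtcaaaaaaagccaaGG"
qual=$(printf '%s\n' "$seq" | sed 's/./I/g')   # dummy qualities, same length
printf '@read1\n%s\n+\n%s\n' "$seq" "$qual" > demo_masked.fastq

# (2) For each read, print: read name, 1-based start, 1-based end,
#     and the masked stretch in uppercase.
awk 'NR % 4 == 1 { name = substr($0, 2) }
     NR % 4 == 2 {
         rest = $0; offset = 0
         while (match(rest, /[acgtn]+/)) {
             start = offset + RSTART
             end = start + RLENGTH - 1
             printf "%s\t%d\t%d\t%s\n", name, start, end, toupper(substr(rest, RSTART, RLENGTH))
             offset = end
             rest = substr(rest, RSTART + RLENGTH)
         }
     }' demo_masked.fastq > demo_positions.tsv

cat demo_positions.tsv
```

On the demo read this prints read1, 3, 25 and the masked 23-mer. Two caveats: the table shows where a match is but not which of the 400 queries produced it, and with hdist=2 the masked stretch can differ from the query by up to two bases, while overlapping or adjacent matches show up as one longer stretch.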


Thank you! That worked well for me!
