Mapping fasta/fastq reads against a long fasta sequence using grep
4.5 years ago
Seq225 ▴ 110

I have a read file (~25 nt reads; FASTA or FASTQ format). I want to extract the reads that map against a long gene sequence (2 kb, FASTA format). I know I could use bowtie or other tools, but I want to use grep. Can anybody help me with the command, please? I want to allow one or two mismatches per read.

Thanks!!

sequence R alignment assembly genome • 2.2k views

What have you tried so far? Use the reads as the search patterns, since your reference is the longer sequence.


I actually have not tried anything yet. My target FASTA is a long sequence (I actually have several sequences of different lengths, and I want to search them separately). The read file has thousands (or tens of thousands) of reads. Thanks!


Then you should consider using a proper aligner. bbmap.sh can use both fasta and fastq reads to do the search.


Thanks. I will try bbmap.sh.


If your target sequence is a FASTA file, it will have newlines every 80-100 nt, and those will break your grep regex. Allowing mismatches also does not work well in grep (where would you put them?).

You can use the FASTA aligner to search properly without building an index.


Thanks. What if there are no line breaks in the FASTA sequence, i.e. it is one long line? I could also split it into ~80 nt pieces and then look for the reads that match.


You can linearize the FASTA easily; see the post "Linearize fasta files".
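For illustration, linearizing can be sketched in a few lines of Python. This is only a minimal sketch and not the method from the linked post: the function name `linearize_fasta` and the toy records are made up for the example, and it assumes the whole file fits in memory.

```python
def linearize_fasta(text):
    """Return {header: sequence} with every sequence joined onto one line."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:]
            records[header] = []
        elif header is not None:
            records[header].append(line)  # one wrapped chunk of the sequence
    return {name: "".join(parts) for name, parts in records.items()}


# Toy input (hypothetical): two records, the first wrapped over two lines.
wrapped = ">gene1\nACGTAC\nGTTTGA\n>gene2\nGGGCCC\n"
linear = linearize_fasta(wrapped)
print(linear["gene1"])  # ACGTACGTTTGA
```

Once each sequence sits on one line, a read that spanned a line break in the wrapped file can be found by a plain substring search.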


Even with the sequence on a single line, you will still have the problem of the mismatches. grep is great when you want an exact hit, but allowing 1-2 mismatches gets complex. For example, to search for ACGT with up to 2 mismatches, you need to search for all of these:

ACGT
.CGT
A.GT
AC.T
ACG.
..GT
.C.T
.CG.
A..T
A.G.
AC..
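To see how quickly this grows, here is a small Python sketch that enumerates the wildcard patterns and, as an alternative, counts mismatches directly with a sliding window. Both function names are hypothetical helpers written for this example, and the direct scan is a brute-force stand-in, not how a real aligner works.

```python
from itertools import combinations


def mismatch_patterns(seq, max_mismatches):
    """List every regex made by replacing up to max_mismatches bases with '.'."""
    patterns = []
    for k in range(max_mismatches + 1):
        for positions in combinations(range(len(seq)), k):
            chars = list(seq)
            for p in positions:
                chars[p] = "."  # wildcard at each chosen mismatch position
            patterns.append("".join(chars))
    return patterns


def hamming_hits(read, reference, max_mismatches):
    """Return (start, mismatches) for each window within the mismatch limit."""
    hits = []
    for start in range(len(reference) - len(read) + 1):
        window = reference[start:start + len(read)]
        mismatches = sum(a != b for a, b in zip(read, window))
        if mismatches <= max_mismatches:
            hits.append((start, mismatches))
    return hits


patterns = mismatch_patterns("ACGT", 2)
print(len(patterns))  # 11 = 1 exact + 4 single + 6 double

hits = hamming_hits("ACGT", "TTACCTTT", 1)
print(hits)  # [(2, 1)]: window ACCT differs from ACGT at one position
```

For a 25 nt read the same enumeration gives 1 + 25 + 300 = 326 patterns per read, which is why the pattern approach becomes impractical with thousands of reads.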

Thanks. I think then my option would be to do only perfect matches.
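A perfect-match search is simple enough to sketch in Python. This is a minimal sketch under some assumptions that are not from the thread: the reference is already on one line, reads are given as (name, sequence) pairs, and the reverse strand is checked as well (switch that off with `both_strands=False`). The function names and toy sequences are invented for the example.

```python
def reverse_complement(seq):
    """Reverse-complement a DNA string (A, C, G, T only; other letters pass through)."""
    return seq.translate(str.maketrans("ACGTacgt", "TGCAtgca"))[::-1]


def exact_matching_reads(reads, reference, both_strands=True):
    """Return names of reads that occur exactly in the reference."""
    reference = reference.upper()
    matched = []
    for name, seq in reads:
        seq = seq.upper()
        if seq in reference:
            matched.append(name)
        elif both_strands and reverse_complement(seq) in reference:
            matched.append(name)  # matches the opposite strand
    return matched


# Toy data (hypothetical): r1 matches directly, r2 only as reverse complement.
reference = "TTGACCGTAAGCTTAGGC"
reads = [("r1", "CCGTAAGC"), ("r2", "GCTTACGG"), ("r3", "AAAAAAAA")]
matched = exact_matching_reads(reads, reference)
print(matched)  # ['r1', 'r2']
```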
