I Have a fasta database containing 16 RNA. I Would like to extract sequences between 2 primers. Like a in sillico PCR but for all 16S RNA in my database. Which software, i m sure there is one ?
I Have a fasta database containing 16 RNA. I Would like to extract sequences between 2 primers. Like a in sillico PCR but for all 16S RNA in my database. Which software, i m sure there is one ?
The following method works pretty well :
cutadapt --discard-untrimmed -g $FORWARD $INPUT 2> /dev/null | cutadapt --discard-untrimmed -a $REVERSE - 2> /dev/null > $OUTPUT
FORWARD and REVERSE are sequence primers .
To quote myself:
I also wrote another pair of programs specifically for working with primer pairs, msa.sh and cutprimers.sh. msa.sh will forcibly align a primer sequence (or a set of primer sequences) against a set of reference sequences to find the single best matching location per reference sequence - in other words, if you have 3 primers and 100 ref sequences, it will output a sam file with exactly 100 alignments - one per ref sequence, using the primer sequence that matched best. Of course you can also just run it with 1 primer sequence.
So you run msa twice - once for the left primer, and once for the right primer - and generate 2 sam files. Then you feed those into cutprimers.sh, which will create a new fasta file containing the sequence between the primers, for each reference sequence. We used these programs to synthetically cut V4 out of full-length 16S sequences (PacBio amplicons).
These are both in the BBMap package and tolerant of indels, as required due to PacBio's error profile. If you want to only use exact matches, you might need a different approach.
Computationally, it's much less efficient than BBDuk, for a small edit distance. But it works regardless of the orientation of the sequence. With BBDuk you'd essentially need to do a left-trim with one adapter, then a right-trim with the other adapter... which would only work for the sequences oriented as you expect. I'm not sure if 16S repositories all have the same orientation.
Dear Brian I received some paired end reads containing 9 nucleotides in 5’ of Read1 as stater and 6 nucleotides in 3’ side of read2 as stopper random primers. So I need to remove 9 nucleotides from 5’ of Read1 and 6 nucleotides from 3’ side of read2. How can I remove these random primers by using BBMap? Is there any command for this kind of primer trimming in BBMap packages?
I found this solution very helpfull to extract multiple amplicon sequences from 1 input fasta file:
The format of files is explained very nicely here: https://github.com/egonozer/in_silico_pcr
perl extractAmpliconSeq.pl -s $INPUT.fasta -p $primers.txt >log 2> $output.fasta
UCSC in-silico PCR with both standlane and online version
searches a sequence database with a pair of PCR primers, using an indexing strategy for fast performance.
Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
I would love to have something like that :
in C++ of course :D
Why not use bbduk.sh from BBMap? You can provide the sequence as literal=left_primer,right_primer. Not C++ but probably of of the best java program there is.
grep -B 1 ACTGAGA.*TCGAGAGA database.fasta > extract.fasta
? (assuming you have linearized the fasta "database")