6.2 years ago
bnorris823
Hello,
I have PacBio BAM files with 10kb+ long reads and a vector sequence that is about 8kb. I want to remove any part of a read that matches the vector sequence above a certain percent identity.
I have looked into BBduk, but can't seem to find a way to remove any matching sequences, only filter out reads with matching sequences.
Please let me know if there is a different approach that I should be taking.
Thanks.
You didn't tell us what you want to do afterwards. If it's reference genome alignment I would just include the vector sequence as a chromosome and let the aligner sort it out.
What does that mean?
You may be able to use bbsplit.sh with the vector (and reference) sequence to bin reads containing the vector.
Related side-note question: is bbduk suitable for PacBio data anyway?
I want to trim out the vector sequence from the middle of the read.
I am not sure if any of the standard trimming programs are set up to do this, since most are meant for short reads and expect the adapter to be on one end (or the other) of the read. Your best bet may be to filter out the reads containing the vector and deal with them separately.
Would bbduk with ktrim=r, providing the vector as the adapter file, not give the desired behavior?
Ah, and you'll need to first convert the BAM file to a FASTQ file.
wouldn't that remove everything to the right of where the vector is found? What if the vector is in the middle of a read?
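To make the difference concrete, here is a toy sketch in plain Python (string slicing only, not BBDuk itself). The read and the hit coordinates are made up for illustration, and both function names are hypothetical.

```python
def right_trim(read, hit_start):
    """Keep only the bases left of the hit, as a right-trim does conceptually."""
    return read[:hit_start]


def excise(read, hit_start, hit_end):
    """Drop only the half-open hit interval [hit_start, hit_end); keep both flanks."""
    return read[:hit_start], read[hit_end:]


# Toy read: 4 bases of insert, 6 bases standing in for vector, 4 bases of insert.
read = "AAAA" + "GGGGGG" + "TTTT"

print(right_trim(read, 4))   # the right flank is lost along with the vector
print(excise(read, 4, 10))   # both flanks survive
```

With a vector hit in the middle of the read, the right-trim discards the downstream flank, which is the concern raised here.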
yes, indeed. But I thought that was the goal, my bad.
as pointed out by genomax I don't think there is an off-the-shelf tool that will do that for you.
I'm also a bit puzzled why you want to do that, or rather how you end up with that kind of situation in your pacbio reads? Can the vector also be on the extremities or do you suspect it to always be in the middle?
ya it could be anywhere in the reads. I think I'm going to try to trim right and left and then merge the reads back together in a python script.
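A minimal sketch of what such a script might look like, assuming the vector hit coordinates on each read are already known (for example from aligning the vector against the reads with some aligner; the thread does not say which, so that step is left out). The function names and the toy read are hypothetical, and intervals are 0-based and half-open.

```python
def merge_intervals(hits):
    """Merge overlapping or adjacent half-open intervals [(start, end), ...]."""
    merged = []
    for start, end in sorted(hits):
        if merged and start <= merged[-1][1]:
            # Overlaps or touches the previous interval: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def excise_vector(seq, qual, hits):
    """Remove every hit interval from seq and qual, then join the remaining flanks."""
    kept_seq, kept_qual = [], []
    pos = 0
    for start, end in merge_intervals(hits):
        kept_seq.append(seq[pos:start])
        kept_qual.append(qual[pos:start])
        pos = end
    # Whatever follows the last hit is kept as well.
    kept_seq.append(seq[pos:])
    kept_qual.append(qual[pos:])
    return "".join(kept_seq), "".join(kept_qual)


# Toy read: insert, 6 bases standing in for vector, insert.
seq = "ACGTACGT" + "NNNNNN" + "TTGGCCAA"
qual = "I" * len(seq)

# Two overlapping hits that together cover the vector stretch.
trimmed_seq, trimmed_qual = excise_vector(seq, qual, [(8, 12), (10, 14)])
print(trimmed_seq)
print(len(trimmed_seq) == len(trimmed_qual))
```

Joining the flanks keeps one record per read but creates an artificial junction where the vector used to be. Returning the flanks as separate reads instead may be safer if the next step is alignment.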
That would be one way to do this. Filter the reads out using bbsplit.sh.