Removing vector sequences from PacBio BAM files
6.2 years ago
bnorris823 • 0

Hello,

I have PacBio BAM files with 10 kb+ reads and a vector sequence that is about 8 kb long. I want to remove any part of a read that matches the vector sequence above a certain percent identity.

I have looked into BBDuk, but I can't find a way to remove the matching sequence itself, only to filter out whole reads that contain it.

Please let me know if there is a different approach that I should be taking.

Thanks.

You didn't tell us what you want to do afterwards. If it's alignment to a reference genome, I would just include the vector sequence as an extra chromosome and let the aligner sort it out.
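
A minimal Python sketch of that idea, assuming the reference and the vector are plain FASTA files (the function name `add_vector_contig` and the `vector` contig name are hypothetical): it writes a combined FASTA with the vector appended as one extra record, which you would then index and align against as usual. With a long-read aligner that reports supplementary alignments, hits to the `vector` contig mark the reads (or read segments) that carry vector sequence.

```python
from pathlib import Path


def add_vector_contig(reference_fasta: str, vector_fasta: str, out_fasta: str,
                      contig_name: str = "vector") -> None:
    """Write every reference record, then the vector as one extra FASTA record.

    Assumes vector_fasta holds a single sequence; its original header is
    replaced by contig_name so vector hits are easy to spot after alignment.
    """
    ref_text = Path(reference_fasta).read_text()
    vec_lines = Path(vector_fasta).read_text().splitlines()
    # Keep only the sequence lines of the vector file (drop its '>' header).
    vec_seq = "".join(line.strip() for line in vec_lines
                      if line and not line.startswith(">"))
    with open(out_fasta, "w") as out:
        out.write(ref_text)
        if ref_text and not ref_text.endswith("\n"):
            out.write("\n")
        out.write(f">{contig_name}\n")
        # Wrap the vector at 60 bases per line, a common FASTA convention.
        for i in range(0, len(vec_seq), 60):
            out.write(vec_seq[i:i + 60] + "\n")
```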

way to remove any matching sequences, only filter out reads with matching sequences.

What does that mean?

You may be able to use bbsplit.sh with the vector (and reference) sequence to bin the reads that contain the vector.

Related side-note question: is BBDuk suitable for PacBio data anyway?

I want to trim out the vector sequence from the middle of the read.

I am not sure any of the standard trimming programs are set up to do this, since most are meant for short reads and expect the adapter to be at one end (or the other) of the read. Your best bet may be to filter out the reads containing the vector, set them aside, and then deal with them separately.
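
To illustrate that filtering step, here is a simplified Python sketch that bins reads by how many exact k-mers they share with the vector (both strands). This is not what BBDuk or bbsplit.sh do internally; `split_reads`, `k=15` and `min_hits=5` are illustrative assumptions. Raw PacBio reads can have a high error rate, so exact k-mer matching may miss diverged vector copies; a smaller k, a lower threshold, or an alignment-based check may be needed in practice.

```python
def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string (N stays N)."""
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def vector_kmers(vector: str, k: int) -> set[str]:
    """All k-mers of the vector on both strands."""
    kmers = set()
    for strand in (vector.upper(), revcomp(vector.upper())):
        for i in range(len(strand) - k + 1):
            kmers.add(strand[i:i + k])
    return kmers


def split_reads(reads, vector: str, k: int = 15, min_hits: int = 5):
    """Bin (name, seq) pairs into vector-containing and clean lists.

    A read is flagged when at least min_hits of its k-mers occur in the vector.
    """
    kmers = vector_kmers(vector, k)
    with_vector, clean = [], []
    for name, seq in reads:
        s = seq.upper()
        hits = sum(1 for i in range(len(s) - k + 1) if s[i:i + k] in kmers)
        (with_vector if hits >= min_hits else clean).append((name, seq))
    return with_vector, clean
```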

Would BBDuk with ktrim=r, providing the vector as the adapter file, not give the desired behavior?

Ah, and you'll first need to convert the BAM file to a FASTQ file.

Wouldn't that remove everything to the right of where the vector is found? What if the vector is in the middle of a read?
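
A small Python sketch of the behavior being discussed, assuming right-trimming means "cut from the first matching vector k-mer to the end of the read". This is a simplification of BBDuk's k-mer matching (forward strand only, exact matches), and `right_trim` is a hypothetical helper. With the vector in the middle of a read, the right-hand flank is lost along with the vector.

```python
def first_kmer_hit(read: str, kmers: set[str], k: int) -> int:
    """Start of the leftmost read k-mer found in kmers, or -1 if none."""
    for i in range(len(read) - k + 1):
        if read[i:i + k] in kmers:
            return i
    return -1


def right_trim(read: str, vector: str, k: int = 15) -> str:
    """Drop everything from the first vector k-mer to the end of the read."""
    kmers = {vector[i:i + k] for i in range(len(vector) - k + 1)}
    hit = first_kmer_hit(read, kmers, k)
    return read if hit < 0 else read[:hit]
```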

Yes, indeed. I thought that was the goal, my bad.

As pointed out by genomax, I don't think there is an off-the-shelf tool that will do that for you.

I'm also a bit puzzled why you want to do that, or rather how you end up with that kind of situation in your PacBio reads. Can the vector also be at the ends of the reads, or do you suspect it is always in the middle?

Yes, it could be anywhere in the reads. I think I'm going to try trimming right and left separately and then merging the pieces back together in a Python script.
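
A minimal Python sketch of that plan, under simplifying assumptions (exact k-mer matches on both strands; `remove_vector`, `k` and `min_len` are hypothetical): every base covered by a vector k-mer is masked, and the flanking pieces that remain are returned in read order.

```python
def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string (N stays N)."""
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def remove_vector(read: str, vector: str, k: int = 15, min_len: int = 1) -> list[str]:
    """Return the pieces of read left after masking bases covered by vector k-mers.

    Exact k-mer matching only (both strands); pieces shorter than min_len are dropped.
    """
    vec = vector.upper()
    kmers = set()
    for strand in (vec, revcomp(vec)):
        kmers.update(strand[i:i + k] for i in range(len(strand) - k + 1))
    s = read.upper()
    masked = [False] * len(s)
    for i in range(len(s) - k + 1):
        if s[i:i + k] in kmers:
            masked[i:i + k] = [True] * k  # mark the whole matching k-mer
    pieces, start = [], None
    for pos, is_vec in enumerate(masked + [True]):  # sentinel closes the last piece
        if not is_vec and start is None:
            start = pos
        elif is_vec and start is not None:
            if pos - start >= min_len:
                pieces.append(read[start:pos])
            start = None
    return pieces
```

Joining the pieces with `"".join(...)` gives the merged read described above, but it creates an artificial junction where the vector was removed; writing each piece out as its own record avoids that.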

That would be one way to do this, after first filtering out the vector-containing reads with bbsplit.sh.
