Manta and alignment name collision
21 months ago

Dear community members,

I received hundreds of CRAM files that I have to run through Manta SV calling, and they fail with "Unexpected alignment name collision". Each file contains tens (out of millions) of reads that were multi-mapped, so some read IDs have two entries.

I don't care about those tens of multi-mappers; I just want to keep one alignment out of the two (either one, picked at random). What's the cheapest way to do it? Previously, for WES samples, I converted the BAM files to FASTQ, realigned, and removed reads with the same IDs, but now I have WGS data and I certainly don't want to re-map it.

If anyone has a ready-to-use script, please share it; otherwise I'll have to write something myself with Python and samtools view...

UPD: Temporary solution: I run samtools view, pipe the output into a 5-liner in Python, and then pipe it back into samtools view. This is the weirdest routine I've ever done in bioinformatics.
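For anyone landing here with the same problem, a filter of this kind might look roughly like the sketch below. It is not the original 5-liner, which was not posted. It assumes SAM text on the input (for example from samtools view -h), that only primary records need deduplicating, and that the first record seen for each (read name, mate) pair is the one to keep. The function names drop_name_collisions and main are made up for illustration. It also holds every primary read name in memory, which may be heavy for WGS.

```python
import sys

def drop_name_collisions(lines):
    """Yield SAM lines, keeping only the first primary record per (QNAME, mate)."""
    seen = set()
    for line in lines:
        if line.startswith("@"):
            # Header lines pass through untouched.
            yield line
            continue
        qname, flag_text, _rest = line.split("\t", 2)
        flag = int(flag_text)
        if flag & 0x900:
            # Secondary (0x100) or supplementary (0x800) records are left alone
            # (assumption: only duplicated primary records cause the collision).
            yield line
            continue
        # 0x40 = first in pair, 0x80 = second in pair.
        key = (qname, flag & 0xC0)
        if key in seen:
            continue  # a second primary record for the same read: drop it
        seen.add(key)
        yield line

def main():
    # Stream stdin to stdout so the filter can sit inside a pipe.
    sys.stdout.writelines(drop_name_collisions(sys.stdin))
```

Saved as a script with a final main() call, this could sit between two samtools view commands, one decoding the CRAM with its header and one re-encoding the filtered stream.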

SV • 886 views
21 months ago

I wrote https://jvarkit.readthedocs.io/en/latest/SamRemoveDuplicatedNames/ to remove this kind of read, but I'm not sure it's what you're looking for.

Another idea: change Manta to skip those reads. For example, at https://github.com/Illumina/manta/blob/75b5c38d4fcd2f6961197b28a41eb61856f2d976/src/c%2B%2B/lib/manta/SVCandidateSetData.cpp#L125, replace the thrown exception with `return;` (not tested!).

Thanks a lot! Will try both!
