8.0 years ago
Zev.Kronenberg
Greetings,
I'm aligning large contigs to a reference genome and I'm hitting the upper limit on the number of CIGAR operations (error below). Does anyone have a tool that will cut alignments intelligently? You'd have to add hard clipping to each piece so that the query positions stay correct.
[E::sam_parse1] too many CIGAR operations
[W::sam_read1] parse error at line 201
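A minimal sketch of the "cut alignments intelligently" idea, working on the text SAM fields before anything reaches htslib. The function name `split_alignment` and the `max_ops` parameter are hypothetical. It assumes SEQ is present (not `*`), and it only rewrites POS, CIGAR and SEQ: FLAG (for example marking pieces as supplementary), MAPQ and tags such as NM/MD are not adjusted. Cuts are made inside an M/=/X run of length 2 or more, so no piece starts or ends on an insertion or deletion. Any original clipping and everything outside the piece become hard clips, so query coordinates still refer to the full contig.

```python
import re

# One CIGAR operation: a length followed by an operation letter.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
QUERY_OPS = set("MIS=X")   # operations that consume query (contig) bases
REF_OPS = set("MDN=X")     # operations that consume reference bases
ALIGNED_OPS = set("M=X")   # operations that are safe to cut inside


def split_alignment(pos, cigar, seq, max_ops=65535):
    """Hypothetical helper: split one SAM alignment into pieces of <= max_ops ops.

    pos is the 1-based POS field, seq the SEQ field (assumed not to be '*').
    Returns a list of (pos, cigar, seq) tuples, one per piece.
    """
    ops = [(int(n), op) for n, op in CIGAR_RE.findall(cigar)]
    if len(ops) <= max_ops:
        return [(pos, cigar, seq)]

    # Peel off the original clips; they are folded into the new hard clips.
    lead_h = ops.pop(0)[0] if ops[0][1] == "H" else 0
    trail_h = ops.pop()[0] if ops[-1][1] == "H" else 0
    lead_s = ops.pop(0)[0] if ops[0][1] == "S" else 0
    trail_s = ops.pop()[0] if ops[-1][1] == "S" else 0
    budget = max_ops - 2  # leave room for a leading and a trailing H

    # Cut inside an aligned run so every piece starts and ends on M/=/X.
    pieces = []
    while len(ops) > budget:
        for k in range(budget - 1, 0, -1):
            n, op = ops[k]
            if op in ALIGNED_OPS and n >= 2:
                pieces.append(ops[:k] + [(n - 1, op)])
                ops = [(1, op)] + ops[k + 1:]
                break
        else:
            raise ValueError("no aligned run to cut within the operation budget")
    pieces.append(ops)

    # Full contig length, including bases that were already hard-clipped.
    total_query = (lead_h + lead_s + trail_s + trail_h
                   + sum(n for p in pieces for n, op in p if op in QUERY_OPS))
    records = []
    ref_pos, q_before, seq_off = pos, lead_h + lead_s, lead_s
    for piece in pieces:
        q_len = sum(n for n, op in piece if op in QUERY_OPS)
        q_after = total_query - q_before - q_len
        parts = [f"{q_before}H"] if q_before else []
        parts += [f"{n}{op}" for n, op in piece]
        if q_after:
            parts.append(f"{q_after}H")
        # SEQ holds only the bases this piece aligns (hard clips are not stored).
        records.append((ref_pos, "".join(parts), seq[seq_off:seq_off + q_len]))
        ref_pos += sum(n for n, op in piece if op in REF_OPS)
        q_before += q_len
        seq_off += q_len
    return records
```

For example, with a deliberately tiny limit, `split_alignment(100, "5S10M2D10M1I10M", seq, max_ops=5)` yields a piece at POS 100 with CIGAR `5H10M2D9M12H` and a second piece at POS 121 with CIGAR `24H1M1I10M`. Records that already fit are returned unchanged.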
As far as I know, the Java htslib uses an 'int' rather than a C 'unsigned short'. It could be easy to code, but I don't really understand your needs.
Pierre Lindenbaum: basically, I just need to parse the CIGARs to output deletions, but there are over 65K CIGAR operations and htslib chokes. So, option 1: use another codebase; option 2: cut up the alignments in the SAM file.
Everything meant for BAM files will choke, since BAM stores the number of CIGAR operations in a 16-bit field and you can't make a BAM out of that. I fear you'll need to code a custom parser.
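If the only goal is listing deletions, a custom parser can stay small: read the text SAM line by line and walk each CIGAR string directly, so no 16-bit operation count is ever involved. The sketch below is one way to do it; `deletions_from_sam` and its `min_len` filter are hypothetical names. It reports each `D` operation as a BED-style, 0-based, half-open interval, treats `N` as reference-consuming but does not report it, and skips unmapped records. It reads SAM text only, not BAM.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # operations that advance along the reference


def deletions_from_sam(lines, min_len=1):
    """Yield (chrom, start, end, qname) for every D op of length >= min_len.

    Coordinates are 0-based, half-open (BED style).
    """
    for line in lines:
        if line.startswith("@"):  # header line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:  # not a complete alignment record
            continue
        qname, flag, chrom = fields[0], int(fields[1]), fields[2]
        pos, cigar = int(fields[3]), fields[5]
        if flag & 0x4 or cigar == "*":  # unmapped, nothing to walk
            continue
        ref = pos - 1  # 0-based position of the next reference base
        # finditer walks the CIGAR lazily, however many operations it has.
        for m in CIGAR_OP.finditer(cigar):
            n, op = int(m.group(1)), m.group(2)
            if op == "D" and n >= min_len:
                yield chrom, ref, ref + n, qname
            if op in REF_CONSUMING:
                ref += n
```

Usage would look like `for chrom, start, end, name in deletions_from_sam(open("contigs.sam"), min_len=50): ...`, where the file name and the 50 bp threshold are only placeholders.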
Devon Ryan: Yup, this is going to be an increasingly common occurrence as assemblies get aligned to genomes.
Yeah, at first I was wondering what sort of crazy stuff you were doing today, but then I thought about long PacBio reads making really, really long contigs and... I think you raise a good point.
It's been said that premature optimization is the root of all evil :)