4.9 years ago
manekineko
Is there a tool that takes a BAM as input, collapses reads by sequence and position while retaining the copy number of each unique sequence, and outputs a BAM again?
Not clear. An input/output example is required.
Is this not what MarkDuplicates from Picard does?
http://seqanswers.com/forums/showthread.php?t=7776
Samtools + Picard Markduplicates
Picard Mark Duplicates
https://broadinstitute.github.io/picard/
There are many more posts about this, but now you should know what to look for and can judge whether this is an option.
I need something like FASTA collapsing, where you retain only one copy of each unique sequence together with its copy number, but at the level of a BAM file. I have a BAM with uncollapsed sequences mapped, and I need to collapse it somehow into a BAM in which each unique sequence is mapped once and carries its copy number (retained in the read name or in a similar way).
For example, if it contains identical sequences mapping to the same position:
I want to retain only one of them, with the copy number indicated in the name (or in a similar way), e.g. _2:
The reads above are unmapped.
I just made up an example, so some flags may be wrong; treat the reads as mapped. I hope it is clear what I mean and what I want to do.
This sounds much like the ReducedReads format from early GATK versions. Ultimately it was retired because it wasn't sufficient to capture all the important information, but it may still be available if you can find an old enough GATK (2.8?).
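If no existing tool fits, the collapsing described in the question can be sketched in a few lines of Python working on SAM text. This is only a rough sketch under stated assumptions, not an established tool: the function name `collapse_sam` is made up, reads are treated as single-end, two records count as identical when reference name, position, strand, CIGAR and sequence all match, and the first record of each group is the one kept (so its qualities and tags survive, the others are dropped). The copy number is appended to the read name as `_N`, as in the question.

```python
def collapse_sam(lines):
    """Collapse SAM records that share reference, position, strand,
    CIGAR and sequence; append the copy number to the read name.

    Hypothetical helper for illustration; assumes single-end reads.
    """
    headers = []
    groups = {}  # key -> [fields of first record seen, copy number]
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith("@"):
            # Header lines are passed through unchanged.
            headers.append(line)
            continue
        fields = line.split("\t")
        flag = int(fields[1])
        # Key: RNAME, POS, strand bit (0x10), CIGAR, SEQ.
        key = (fields[2], fields[3], flag & 0x10, fields[5], fields[9])
        if key in groups:
            groups[key][1] += 1
        else:
            groups[key] = [fields, 1]

    out = list(headers)
    # dicts keep insertion order, so a coordinate-sorted input stays sorted.
    for fields, count in groups.values():
        renamed = [f"{fields[0]}_{count}"] + fields[1:]
        out.append("\t".join(renamed))
    return out
```

Wrapped in a small script that feeds `sys.stdin` to the function and prints the result, it could sit between `samtools view -h in.bam` and `samtools view -b -o collapsed.bam -` to go from BAM to BAM. Paired-end data would need the mate position added to the key.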