11.1 years ago
mathieu.bahin
Hi all,
I have a set of mapped paired-end reads, and I would like to assemble the ones that overlap while respecting the pairing information.
This means assembling two pairs only when their first mates overlap and their second mates overlap too. The reads are already mapped to a genome, so there is nothing more to do with the sequences, only with the positions.
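To make the pairing condition concrete, here is a minimal Python sketch of the "both mates must overlap" test. It assumes the position strings look exactly like the example pairs in this post (`chr5:1456-1498,+`), that two mates can only overlap if they share chromosome and strand, and that coordinates are closed intervals. The names `Interval`, `parse`, `overlap` and `pairs_overlap` are hypothetical helpers, not part of any tool.

```python
import re
from typing import NamedTuple, Tuple


class Interval(NamedTuple):
    chrom: str
    start: int
    end: int
    strand: str


# Assumed position format, e.g. "chr5:1456-1498,+"
_POS = re.compile(r"^([^:]+):(\d+)-(\d+),([+-])$")


def parse(pos: str) -> Interval:
    m = _POS.match(pos)
    if m is None:
        raise ValueError(f"unrecognised position: {pos!r}")
    return Interval(m.group(1), int(m.group(2)), int(m.group(3)), m.group(4))


def overlap(a: Interval, b: Interval) -> bool:
    # Same chromosome and strand, and the closed ranges share at least one base
    return (a.chrom == b.chrom and a.strand == b.strand
            and a.start <= b.end and b.start <= a.end)


def pairs_overlap(p: Tuple[Interval, Interval], q: Tuple[Interval, Interval]) -> bool:
    # Two pairs may only be assembled if BOTH first mates and BOTH second mates overlap
    return overlap(p[0], q[0]) and overlap(p[1], q[1])


p1 = (parse("chr5:1456-1498,+"), parse("chr5:1654-1702,+"))
p2 = (parse("chr5:958-1012,+"), parse("chr5:1318-1388,+"))
p3 = (parse("chr5:1423-1478,+"), parse("chr5:1612-1667,+"))
print(pairs_overlap(p1, p3))  # True
print(pairs_overlap(p1, p2))  # False
```

Overlap of only one mate is deliberately not enough here, which is what distinguishes this from merging every read independently.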
The goal is to get the extended positions together with the number of pairs merged into each.
Example pairs (first mate, second mate):
chr5:1456-1498,+ chr5:1654-1702,+
chr5:958-1012,+ chr5:1318-1388,+
chr5:1423-1478,+ chr5:1612-1667,+
I would like to get (count, merged first mate, merged second mate):
2 chr5:1423-1498,+ chr5:1612-1702,+
1 chr5:958-1012,+ chr5:1318-1388,+
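One way to compute this from positions alone is sketched below in Python. It is a minimal sketch under stated assumptions, not an existing tool: input lines use the format shown above, mates must share chromosome and strand to overlap, and merging is transitive on the extended spans (a merged cluster can absorb a pair that only overlaps the already-extended region). The helper names (`parse`, `span`, `merge_pairs`, etc.) are hypothetical.

```python
import re
from typing import List, NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int
    end: int
    strand: str


# Assumed position format, e.g. "chr5:1456-1498,+"
_POS = re.compile(r"^([^:]+):(\d+)-(\d+),([+-])$")


def parse(pos: str) -> Interval:
    m = _POS.match(pos)
    if m is None:
        raise ValueError(f"unrecognised position: {pos!r}")
    return Interval(m.group(1), int(m.group(2)), int(m.group(3)), m.group(4))


def fmt(iv: Interval) -> str:
    return f"{iv.chrom}:{iv.start}-{iv.end},{iv.strand}"


def overlap(a: Interval, b: Interval) -> bool:
    return (a.chrom == b.chrom and a.strand == b.strand
            and a.start <= b.end and b.start <= a.end)


def span(a: Interval, b: Interval) -> Interval:
    # Smallest interval covering both (only called on overlapping intervals)
    return Interval(a.chrom, min(a.start, b.start), max(a.end, b.end), a.strand)


def merge_pairs(lines: List[str]) -> List[str]:
    # Each cluster is [first-mate span, second-mate span, number of pairs]
    clusters = []
    for line in lines:
        first, second = line.split()
        clusters.append([parse(first), parse(second), 1])

    changed = True
    while changed:  # repeat until no two clusters overlap on both mates
        changed = False
        kept = []
        for c in clusters:
            for k in kept:
                if overlap(k[0], c[0]) and overlap(k[1], c[1]):
                    k[0], k[1] = span(k[0], c[0]), span(k[1], c[1])
                    k[2] += c[2]
                    changed = True
                    break
            else:
                kept.append(c)
        clusters = kept

    # Most-supported clusters first (sort is stable for equal counts)
    clusters.sort(key=lambda c: c[2], reverse=True)
    return [f"{n} {fmt(a)} {fmt(b)}" for a, b, n in clusters]


pairs = [
    "chr5:1456-1498,+ chr5:1654-1702,+",
    "chr5:958-1012,+ chr5:1318-1388,+",
    "chr5:1423-1478,+ chr5:1612-1667,+",
]
for row in merge_pairs(pairs):
    print(row)
# 2 chr5:1423-1498,+ chr5:1612-1702,+
# 1 chr5:958-1012,+ chr5:1318-1388,+
```

Each pass compares every cluster with every kept cluster, so this is quadratic; for a whole genome's worth of pairs one would probably sort by chromosome and first-mate start, or use an interval index, before comparing.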
I can't find any software that works on the positions; all I can find is FLASH, PEAR, etc., which work on the FASTQ files.
Cheers
Why not just use Cufflinks and then get counts with featureCounts from the resulting GTF file?
Thank you for your answer. I'm not sure I fully understand it. I think Cufflinks would assemble reads independently of the pairing information, which I don't want: I want to compare each pair against every other pair. Is there an option in Cufflinks to only assemble when both mates overlap?
Hmm, true, I guess that wouldn't work for you. You might have to code something yourself with the Bioconductor package GenomicRanges.
OK, thanks, maybe I'll try that. I also have another lead: 'pairToPair' from bedtools.
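bedtools pairToPair works on BEDPE files (chrom1, start1, end1, chrom2, start2, end2, name, score, strand1, strand2, with 0-based starts), so the position strings would first need converting. Below is a small Python sketch of that conversion; it assumes the positions in this post are 1-based and inclusive (so only the start is shifted), and the names `to_bedpe`, `pair1`, etc. are hypothetical. Note that pairToPair reports which pairs overlap (its `-type` option controls whether both ends must overlap), not merged spans or counts, so some post-processing would still be needed.

```python
import re

# Assumed position format, e.g. "chr5:1456-1498,+", taken as 1-based inclusive
_POS = re.compile(r"^([^:]+):(\d+)-(\d+),([+-])$")


def to_bedpe(line: str, name: str) -> str:
    """Convert one 'first-mate second-mate' line into a BEDPE record."""
    fields = []
    strands = []
    for pos in line.split():
        m = _POS.match(pos)
        if m is None:
            raise ValueError(f"unrecognised position: {pos!r}")
        chrom, start, end, strand = m.groups()
        # BED-style coordinates are 0-based, half-open: shift the start only
        fields += [chrom, str(int(start) - 1), end]
        strands.append(strand)
    if len(strands) != 2:
        raise ValueError("expected exactly two mates per line")
    # chrom1 start1 end1 chrom2 start2 end2 name score strand1 strand2
    return "\t".join(fields + [name, "0"] + strands)


pairs = [
    "chr5:1456-1498,+ chr5:1654-1702,+",
    "chr5:958-1012,+ chr5:1318-1388,+",
    "chr5:1423-1478,+ chr5:1612-1667,+",
]
for i, line in enumerate(pairs, start=1):
    print(to_bedpe(line, f"pair{i}"))
```

Writing these lines to a file gives something that could be passed to pairToPair as both `-a` and `-b`; in such a self-comparison every pair will also be reported as overlapping itself.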