Hello everyone,
I know this may be quite irritating for those who have good experience, but it will definitely be helpful for newbies.
I am looking for a better method for capturing chimeric reads from millions of Illumina reads. I have read a few papers but found nothing conclusive. What I have so far is:
1st step: convert the SRA file to a FASTQ/FASTA file.
2nd step: map the FASTQ/FASTA file to the reference genome [which is the backbone of the further study]. There are a number of freely available tools, such as bowtie1, bowtie2, novoalign etc., but I still don't know which one is better for Hi-C data, or which mapping parameters are suitable for capturing more chimeric reads from millions of reads [seeking experts' comments or suggestions].
3rd step: filter the chimeric reads that mapped properly to the reference genome. Here "properly mapped" means a chimeric read that maps to chromosome A at two different positions: for example, the first portion of the read (whose length could be the read length minus N, where N could be any integer) maps at 5000-5030 and the remaining portion maps at 10000-10068 on chromosome A. How can I extract this type of mapping from the output .sam file [a syntax that extracts this type of information]?
4th step: visualize the mapping; there are a number of tools for visualizing mapping data.
Your valuable comments are always welcome.
Probably, chimeric reads are not really a problem with bowtie or bwa. I would check whether these programs are able to detect fractional matches at all (e.g. 50% of the read). Such reads would possibly remain unmapped. To be sure, you could simply filter for reads with more than one match (given that the tool reports them) and for those that are less than 90% covered by their aligned region.
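To make that filter a bit more concrete, here is a minimal sketch in plain Python (no pysam needed). Several things here are assumptions and not something from this thread: the input is SAM text from an aligner that soft- or hard-clips partial alignments (for example bwa mem, or bowtie2 run with --local), the reads are single-end (or the mates were mapped separately), and the names aligned_fraction, candidate_chimeras and min_fraction are only illustrative. The toy records reuse the 5000-5030 / 10000-10068 example from the question.

```python
import re
from collections import defaultdict

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")

def aligned_fraction(cigar):
    """Fraction of the read's bases that lie inside the aligned region."""
    aligned = clipped = 0
    for length, op in CIGAR_RE.findall(cigar):
        if op in "MI=X":        # operations that consume read bases inside the alignment
            aligned += int(length)
        elif op in "SH":        # soft- or hard-clipped read bases
            clipped += int(length)
    total = aligned + clipped
    return aligned / total if total else 0.0

def candidate_chimeras(sam_lines, min_fraction=0.9):
    """Return {read name: [(chrom, pos, cigar), ...]} for reads that have more
    than one alignment record or an alignment covering < min_fraction of the read.
    Assumes single-end reads (mates would otherwise share a name)."""
    records = defaultdict(list)
    for line in sam_lines:
        if line.startswith("@"):            # header line
            continue
        f = line.rstrip("\n").split("\t")
        if len(f) < 11:                     # not a complete SAM record
            continue
        name, flag, chrom, pos, cigar = f[0], int(f[1]), f[2], int(f[3]), f[5]
        if flag & 4 or cigar == "*":        # unmapped read
            continue
        records[name].append((chrom, pos, cigar))
    hits = {}
    for name, alns in records.items():
        partial = any(aligned_fraction(c) < min_fraction for _, _, c in alns)
        if len(alns) > 1 or partial:
            hits[name] = alns
    return hits

def row(name, flag, chrom, pos, cigar):
    """Build a toy 11-column SAM record (sequence and quality left as '*')."""
    return "\t".join([name, str(flag), chrom, str(pos), "60", cigar,
                      "*", "0", "0", "*", "*"])

# Toy data: r2 is a 100 bp read split between 5000-5030 and 10000-10068.
sam = [
    "@HD\tVN:1.6",
    row("r1", 0, "chrA", 1000, "100M"),
    row("r2", 0, "chrA", 5000, "31M69S"),
    row("r2", 2048, "chrA", 10000, "31H69M"),   # 2048 = supplementary alignment
    row("r3", 4, "*", 0, "*"),                  # 4 = unmapped
]
hits = candidate_chimeras(sam)
for name, alns in hits.items():
    print(name, alns)
```

On a real file you would pass the open file handle, e.g. candidate_chimeras(open("mapped.sam")), where mapped.sam is a placeholder name. Whether the split parts show up as separate records at all depends on the aligner and its settings, so treat this as a starting point only.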
Dear Michael, thank you for your comments. Could you tell me the syntax that would help to detect chimeric reads? Is there any difference between the mapping of a chimeric read and a fractional match [perhaps the mapping of a read that aligns at two different positions]?
I'd try to align the fraction of unmapped sequences with e.g.
blastn -task blastn-short
and screen the output for multiple partial matches. You may also be interested in Leonid Mirny's hiclib.
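For the screening step, one possible sketch in Python is below. It assumes blastn was run with -outfmt 6 (the default 12-column tabular output: qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore). The function name split_hits and the max_overlap parameter are made up for illustration. It reports reads that have at least two hits covering (nearly) disjoint parts of the read.

```python
from collections import defaultdict
from itertools import combinations

def split_hits(blast_lines, max_overlap=5):
    """Return {read: [(hit_a, hit_b), ...]} for reads with two hits whose
    query intervals overlap by at most max_overlap bases.
    Each hit is (qstart, qend, sseqid, sstart, send)."""
    per_read = defaultdict(list)
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue
        f = line.rstrip("\n").split("\t")
        qstart, qend = sorted((int(f[6]), int(f[7])))
        per_read[f[0]].append((qstart, qend, f[1], int(f[8]), int(f[9])))
    chimeric = {}
    for read, segs in per_read.items():
        segs.sort()                          # order hits by start position on the read
        pairs = []
        for a, b in combinations(segs, 2):   # a starts no later than b on the read
            overlap = min(a[1], b[1]) - b[0] + 1
            if overlap <= max_overlap:
                pairs.append((a, b))
        if pairs:
            chimeric[read] = pairs
    return chimeric

# Toy tabular output: r1 is split into two partial matches, r2 matches in full.
blast = [
    "\t".join(["r1", "chrA", "100.000", "31", "0", "0", "1", "31", "5000", "5030", "1e-10", "58.0"]),
    "\t".join(["r1", "chrA", "100.000", "69", "0", "0", "32", "100", "10000", "10068", "1e-30", "128"]),
    "\t".join(["r2", "chrA", "100.000", "100", "0", "0", "1", "100", "20000", "20099", "1e-50", "185"]),
]
result = split_hits(blast)
for read, pairs in result.items():
    print(read, pairs)
```

The pairwise comparison is fine for a handful of hits per read; for reads with very many hits (repeats) you would probably want to cap the number of hits per read first.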