Hey all,
I'm doing a project involving scaffolding with paired-end reads, and I want to find the distance between two contigs connected by a read pair.
I'm fairly new to bioinformatics, but I'm under the impression that you can infer a minimum distance by measuring the distance from each read to the end of the contig that it lies in.
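To make that idea concrete, here is a rough sketch of the arithmetic, not a definitive method. It assumes the library's expected insert size is known and that each read's distance to the gap-facing end of its contig has already been worked out; the function and parameter names are made up for illustration.

```python
def estimate_gap(insert_size, dist_to_end_a, dist_to_end_b):
    """Rough gap estimate between two contigs linked by one read pair.

    insert_size   -- expected outer distance between the two reads
                     (assumed known, e.g. the library mean)
    dist_to_end_a -- bases from the outer end of the read in contig A
                     to the end of contig A that faces the gap
    dist_to_end_b -- the same measurement for the mate in contig B

    The part of the insert lying inside the contigs is
    dist_to_end_a + dist_to_end_b; whatever is left over is the gap.
    A negative result would suggest the contigs overlap.
    """
    return insert_size - (dist_to_end_a + dist_to_end_b)


# Hypothetical numbers: a 3000 bp insert with 800 bp and 1200 bp inside the contigs.
print(estimate_gap(3000, 800, 1200))  # 1000
```

With real data the insert size varies from pair to pair, so several pairs linking the same two contigs would normally be combined rather than trusting a single one.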
My problem lies with extracting this information from an AMOS (afg) file. It seems this information lies in the 'offset' field, but I'm unsure of what this number actually represents. I've browsed around on the web, but haven't found any resource that helps with this problem specifically.
Any help would be greatly appreciated.
I am not certain, but it might be that there is an unaligned portion of the read. So if, say, the first 3 bases of the read are not in the contig, then the offset is -3. What does the clr say in that case?
If the offset is the location of the beginning of the read, could you help me understand why some of the offsets are negative?
One example: off: -72 clr: 100,0
Hmm, try to verify the meaning of the offset by extracting the read sequence and comparing it to the contig consensus. The documentation is not very clear on what it means.
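A minimal sketch of that check, under assumptions that should be confirmed against your own file: that off is the 0-based contig position of the leftmost aligned base, that clr is a 0-based, end-exclusive range, and that a reversed clr such as 100,0 means the read is placed as its reverse complement. It ignores the gap field entirely, so it is only meaningful where the read aligns without gaps. The function names are made up.

```python
def revcomp(seq):
    """Reverse complement of a DNA string; unknown symbols become N."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp.get(base, "N") for base in reversed(seq.upper()))


def check_offset(consensus, read, off, clr):
    """Count matches between a read and the consensus at a candidate offset.

    ASSUMPTIONS (not confirmed by the AMOS documentation quoted here):
    off is the 0-based contig position of the leftmost aligned base,
    clr is (start, end), 0-based and end-exclusive, and start > end
    means the read is reverse complemented. Gaps are ignored.
    Returns (matches, positions_compared).
    """
    start, end = clr
    if start <= end:
        aligned = read[start:end].upper()
    else:
        aligned = revcomp(read[end:start])

    matches = compared = 0
    for i, base in enumerate(aligned):
        pos = off + i
        # Bases hanging off either end of the contig are skipped,
        # which is what a negative offset would produce under this reading.
        if 0 <= pos < len(consensus):
            compared += 1
            if base == consensus[pos].upper():
                matches += 1
    return matches, compared
```

If the interpretation is right, nearly every compared position should match for most reads. For the example above (off: -72, clr: 100,0), this reading would mean the first 72 bases of the reverse-complemented read hang off the left end of the contig and only the last 28 are compared.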