Hi everyone,
I have a list of 1 kb windows in hg19 coordinates for which I need to get the chimp sequence. Here is what I did:
- Downloaded the chimp FASTA file from UCSC
- Lifted over the BED files with the window positions from hg19 to panTro4 using UCSC's liftOver tool
- Used bedtools to extract the segments from the panTro4 FASTA file
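For anyone trying to reproduce the steps above, here is a rough sketch of the two command lines, assembled in Python. The file names (windows.hg19.bed, hg19ToPanTro4.over.chain.gz, panTro4.fa) and the helper build_commands are placeholders I made up for illustration, not necessarily what was actually run. It only builds and prints the commands; it does not execute them.

```python
import shlex


def build_commands(windows_bed, chain, genome_fa, prefix):
    """Return the liftOver and bedtools getfasta commands as argument lists.

    All paths are hypothetical placeholders; adjust them to your own files.
    """
    lifted = f"{prefix}.panTro4.bed"      # windows that mapped to chimp
    unmapped = f"{prefix}.unmapped.bed"   # windows liftOver could not map
    out_fa = f"{prefix}.panTro4.fa"       # extracted chimp sequences

    # liftOver usage: liftOver oldFile map.chain newFile unMapped
    liftover_cmd = ["liftOver", windows_bed, chain, lifted, unmapped]

    # bedtools getfasta reads intervals from -bed and sequence from -fi
    getfasta_cmd = [
        "bedtools", "getfasta",
        "-fi", genome_fa,
        "-bed", lifted,
        "-fo", out_fa,
    ]
    return liftover_cmd, getfasta_cmd


if __name__ == "__main__":
    commands = build_commands(
        "windows.hg19.bed",
        "hg19ToPanTro4.over.chain.gz",
        "panTro4.fa",
        "windows",
    )
    for cmd in commands:
        print(shlex.join(cmd))
```

The printed lines can be pasted into a shell, or passed to subprocess.run if you prefer to drive everything from Python.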
There were two problems:
- Some of the windows (~7%) could not be mapped to panTro4
- Many of the windows (~60%) changed size during the liftOver from hg19 to panTro4
These windows are already filtered for problematic sites (hypermutable sites and sites of uncertain orthology with chimpanzee), so I did not expect such a high proportion of windows to be problematic during the liftOver step. Does anyone have an idea of what is happening here?
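To put numbers on the two problems, a small script along these lines could compare the original and lifted BED files. It is only a sketch and assumes each window carries a unique name in column 4 that liftOver passes through unchanged; parse_bed and summarize are names I made up, and the four windows in the example are toy data, not my real windows.

```python
def parse_bed(text):
    """Map window name (column 4) to (chrom, start, end)."""
    windows = {}
    for line in text.splitlines():
        # Skip blank lines and comment lines
        if not line.strip() or line.startswith("#"):
            continue
        chrom, start, end, name = line.split("\t")[:4]
        windows[name] = (chrom, int(start), int(end))
    return windows


def summarize(original, lifted):
    """Report the unmapped fraction and per-window size changes."""
    if not original:
        raise ValueError("no windows in the original BED")
    unmapped = [name for name in original if name not in lifted]
    resized = {}
    for name, (_, start, end) in lifted.items():
        if name not in original:
            continue
        _, old_start, old_end = original[name]
        # Positive delta: window grew in chimp; negative: it shrank
        delta = (end - start) - (old_end - old_start)
        if delta != 0:
            resized[name] = delta
    total = len(original)
    return {
        "unmapped_fraction": len(unmapped) / total,
        "resized_fraction": len(resized) / total,
        "size_deltas": resized,
    }


if __name__ == "__main__":
    # Toy data for illustration only
    hg19 = "chr1\t1000\t2000\tw1\nchr1\t2000\t3000\tw2\nchr1\t3000\t4000\tw3\nchr2\t5000\t6000\tw4\n"
    pantro4 = "chr1\t1100\t2100\tw1\nchr1\t2100\t3112\tw2\nchr1\t3112\t4105\tw3\n"
    print(summarize(parse_bed(hg19), parse_bed(pantro4)))
```

Looking at the distribution of size_deltas should show whether the resized windows differ by a few base pairs or by much more. Small differences would be consistent with short insertions and deletions between the two assemblies, but that is a guess to check against the real data.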
Many thanks in advance for any advice!
Perhaps I misunderstood, but you used both hg18 and hg19?
@sskvera: Did you forget to include the hg18 --> hg19 liftover in the description above (which you must have done)?
Sorry, should've been hg19 (corrected post).