18 months ago
Daniel
Hello,
I have a general question: is it possible that the multiz 30-way alignment is inaccurate? I am looking for specific hg38 sequences in the alignment blocks (I am using coordinates retrieved with BLAT, and PHAST to parse the MAF blocks), and the sequences I get do not match the sequences I see in Ensembl at those coordinates. I am trying to figure out why this may be, and how to deal with it.
My files say ##maf version=1 scoring=roast.v3.3
Not sure if this is relevant: How should reference genome fasta files be distributed by UCSC?
Thank you! If I understand correctly, it's possible that the coordinates/sequences are no longer the same because the MAF files use the original release or an older patch, while newer tools are up to date with the latest patch?
Can you provide an example? Which MAF file are you referring to, on the UCSC and on the Ensembl side? I can imagine a dozen different things that could be going on here. Also, I don't understand "I am using BLAT retrieved coordinates". I'm not sure how PHAST can parse the MAF blocks; MAF is just a text file format. Does PHAST provide any additional value here?
I am referring to chr2.maf which I found here (last updated 2017): https://hgdownload.soe.ucsc.edu/goldenPath/hg38/multiz30way/maf/
PHAST has a tool called maf_parse, which retrieves the blocks in a maf file corresponding to your region of interest. I was interested in a specific sequence, and I got its coordinates from BLAT. Then, when I tried looking at those coordinates in the file, they were not there.
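For anyone who wants to check by hand which blocks cover a region, here is a minimal Python sketch of that lookup. It is not how maf_parse is implemented; it only assumes the standard MAF text layout, in which each block starts with an `a` line and each aligned row is an `s` line with the fields src, start, size, strand, srcSize, and text. It also assumes the first `s` line of each block is the reference (e.g. `hg38.chr2`), that the reference is on the `+` strand, and that start is 0-based. The function names are made up for illustration.

```python
from itertools import chain


def parse_s_line(line):
    """Split one MAF 's' line into its six fields (start is 0-based)."""
    _, src, start, size, strand, src_size, text = line.split()
    return {
        "src": src,
        "start": int(start),
        "size": int(size),
        "strand": strand,
        "src_size": int(src_size),
        "text": text,
    }


def blocks_overlapping(lines, src, start, end):
    """Yield blocks whose first (reference) row overlaps [start, end).

    Assumes the reference row is on the '+' strand, so its start and
    size can be compared directly with forward-strand coordinates.
    """
    block = []
    # A trailing empty string flushes the last block of the input.
    for line in chain(lines, [""]):
        line = line.strip()
        if line.startswith("s "):
            block.append(parse_s_line(line))
        elif not line or line[:1] == "a":
            if block:
                ref = block[0]
                ref_end = ref["start"] + ref["size"]
                if ref["src"] == src and ref["start"] < end and start < ref_end:
                    yield block
            block = []
        # Other line types (comments, 'i', 'e', 'q') are ignored here.
```

Usage would be something like `with open("chr2.maf") as fh: hits = list(blocks_overlapping(fh, "hg38.chr2", start, end))`. If this returns nothing for a region you expect to be covered, the coordinates themselves (strand, 0-based vs 1-based, assembly) are the first thing to check.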
However, I realized this is because of my error: The BLAT coordinates are on the negative strand, while the maf file is on the positive. Trying to figure out how to "flip" the entire file...
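Rather than flipping the whole file, it may be simpler to convert the interval to the positive strand, look it up there, and reverse-complement whatever sequence comes back. A rough Python sketch, assuming the negative-strand coordinates are 0-based, half-open, and counted from the start of the reverse-complemented chromosome (the convention MAF uses for `-` rows); whether your BLAT output needs this conversion at all depends on the output format, so check that first. The function names and `chrom_size` parameter are placeholders.

```python
def minus_to_plus(start, end, chrom_size):
    """Map a 0-based half-open minus-strand interval to the plus strand."""
    if not 0 <= start <= end <= chrom_size:
        raise ValueError("interval must lie within the chromosome")
    return chrom_size - end, chrom_size - start


# Translation table for complementing bases; gaps and N are left as-is.
_COMPLEMENT = str.maketrans("ACGTNacgtn-", "TGCANtgcan-")


def reverse_complement(seq):
    """Return the reverse complement of a (possibly gapped) DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]
```

With the plus-strand interval from `minus_to_plus` you can query the MAF as usual, then apply `reverse_complement` to the extracted hg38 text to compare it with the negative-strand sequence you started from.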