Hi,
I have a question related to the steps following a liftOver.
Since there is no annotation available for the latest rhesus macaque genome assemblies, I lifted over the rheMac2 (the old assembly) annotations to the recent rheMac8 assembly using liftOver. I obtained two output files: one with all the features that were lifted over, with their new coordinates, and one with the "unmapped" features.
But I didn't find any clear explanation of what the next step is (maybe it's trivial...). Can I directly use this output file as my annotation file to index my rheMac8 genome and do the alignment? (I'm using STAR for this)
Second question: is there a statistic I can look at to say how well my lift over worked? I looked at how many features ended up in the liftover file vs unmapped file, and 97% of the features present in the original annotation have been lifted over. Is that enough to say I can use this reliably?
I didn't find a source clearly explaining this, but if you know one, any link is welcome :)
Thank you, Camille
I figured out that I don't really need to use rheMac8 in the end. (I should mention that I am very new to bioinformatics.) I thought that using the most recent assembly would always be better, since it is of better quality, but rheMac2 is fine in my case.
So my problem is solved, but I am still interested in how you are supposed to use the output of a liftOver. Is the output file with the new coordinates the one you use directly as the annotation file to index the genome and subsequently map the reads?
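What format is the input in? The output should be in the same format. liftOver is designed mainly for BED (it does have a -gff option, but each line is converted separately, so gene models can end up broken), while for most aligners you need the annotation in GTF/GFF format.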
Look into CrossMap, which supports a lot more input formats.
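On the "how well did it work" question: the fraction of lifted features is a reasonable first number, but it does not tell you whether gene models survived intact, since the exons of a single transcript can land on different chromosomes or strands. Here is a rough Python sketch of both checks. It assumes GTF-style files, an unmapped file in which the reason lines start with "#" (as liftOver writes them), and hypothetical file names; the function names are made up for illustration and are not part of liftOver or CrossMap.

```python
import re
from collections import defaultdict

# Matches the transcript_id attribute in column 9 of a GTF line.
TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def count_features(path):
    """Count records in a file, skipping blank lines and '#' comment lines."""
    with open(path) as handle:
        return sum(
            1 for line in handle if line.strip() and not line.startswith("#")
        )


def lifted_fraction(lifted_path, unmapped_path):
    """Fraction of all input features that ended up in the lifted file."""
    lifted = count_features(lifted_path)
    unmapped = count_features(unmapped_path)
    total = lifted + unmapped
    return lifted / total if total else 0.0


def broken_transcripts(gtf_path):
    """Return IDs of transcripts whose exons sit on more than one
    chromosome or strand after the lift over."""
    placements = defaultdict(set)
    with open(gtf_path) as handle:
        for line in handle:
            if not line.strip() or line.startswith("#"):
                continue
            fields = line.rstrip("\n").split("\t")
            # Only well-formed exon lines are considered.
            if len(fields) < 9 or fields[2] != "exon":
                continue
            match = TRANSCRIPT_ID.search(fields[8])
            if match:
                placements[match.group(1)].add((fields[0], fields[6]))
    return sorted(tid for tid, seen in placements.items() if len(seen) > 1)


if __name__ == "__main__":
    # Hypothetical file names; replace with your own liftOver/CrossMap output.
    lifted, unmapped = "rheMac8_lifted.gtf", "rheMac8_unmapped.gtf"
    print(f"lifted fraction: {lifted_fraction(lifted, unmapped):.3f}")
    print(f"split transcripts: {len(broken_transcripts(lifted))}")
```

If the second number is not zero, I would drop or inspect those transcripts before giving the file to the aligner. This only catches the most obvious breakage; it does not check exon order or overlaps.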