I am using JCVI for collinearity analysis and get the following error: ValueError: A total of 0 anchor was found. Aborted.
According to a page I found while searching:
Make sure that the gene names in the CDS FASTA header match the gene names in the BED file (4th column); otherwise, the LAST output will not get imported correctly.
I tried to make the sequence names in the CDS file match the fourth column of the BED file. Here are the modified headers of the CDS file:
> GRMZM5G892247.RefGen_V4
> GRMZM5G891801.RefGen_V4
> GRMZM5G890747.RefGen_V4
> GRMZM5G890451.RefGen_V4
> GRMZM5G889905.RefGen_V4
> GRMZM5G889790.RefGen_V4
> GRMZM5G889299.RefGen_V4
> GRMZM5G889036.RefGen_V4
> GRMZM5G887911.RefGen_V4
> GRMZM5G887290.RefGen_V4
> GRMZM5G885905.RefGen_V4
> GRMZM5G884960.RefGen_V4
Below is part of the bed file:
1 44288 49837 Zm00001d027230.RefGen_V4 0 +
1 50876 55716 Zm00001d027231.RefGen_V4 0 -
1 92298 95134 Zm00001d027232.RefGen_V4 0 -
1 111654 118312 Zm00001d027233.RefGen_V4 0 -
1 118682 119739 Zm00001d027234.RefGen_V4 0 -
1 122119 122614 Zm00001d027235.RefGen_V4 0 +
1 138377 139043 Zm00001d027236.RefGen_V4 0 -
1 196941 199159 Zm00001d027239.RefGen_V4 0 -
1 199344 205715 Zm00001d027240.RefGen_V4 0 -
1 209978 215403 Zm00001d027242.RefGen_V4 0 -
1 217903 219526 Zm00001d027244.RefGen_V4 0 -
I found that many CDS sequence names do not exist in the BED file! The CDS and GFF files were downloaded from Phytozome, and the BED file was generated with JCVI. My question is: should all sequences in the CDS file that do not exist in the BED file be discarded? Will that affect the results of the collinearity analysis?
What does that mean? Is that a program? If so please post the entire command line you are using.
In general, the error message seems pretty clear about what is happening (e.g. mismatch in ID).
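To see how large the ID mismatch is, a quick check along these lines may help. This is only a minimal sketch and not part of JCVI: the function names (fasta_ids, bed_ids, compare_ids) are made up for illustration, and it assumes the gene ID is the first whitespace-separated token of each FASTA header and the 4th column of the BED file. It tolerates the space after ">" seen in the posted headers, but it does not tell you whether the downstream tools tolerate it.

```python
import sys
from typing import Dict, Iterable, Set


def fasta_ids(lines: Iterable[str]) -> Set[str]:
    """Collect the first whitespace-delimited token of every FASTA header."""
    ids = set()
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()  # also skips a space right after ">"
            if fields:
                ids.add(fields[0])
    return ids


def bed_ids(lines: Iterable[str]) -> Set[str]:
    """Collect the 4th column (feature name) of every BED record."""
    ids = set()
    for line in lines:
        if line.startswith("#"):
            continue  # skip comment lines
        fields = line.split()
        if len(fields) >= 4:
            ids.add(fields[3])
    return ids


def compare_ids(cds_lines: Iterable[str], bed_lines: Iterable[str]) -> Dict[str, Set[str]]:
    """Return the IDs shared by both files and the IDs unique to each."""
    cds = fasta_ids(cds_lines)
    bed = bed_ids(bed_lines)
    return {"shared": cds & bed, "cds_only": cds - bed, "bed_only": bed - cds}


# Usage (file names are placeholders): python check_ids.py species.cds species.bed
if __name__ == "__main__" and len(sys.argv) == 3:
    with open(sys.argv[1]) as cds_fh, open(sys.argv[2]) as bed_fh:
        result = compare_ids(cds_fh, bed_fh)
    for label, ids in result.items():
        print(label, len(ids))
```

If "shared" comes out as 0 or close to it, the two files use different naming schemes (for example GRMZM... versus Zm00001d...), which would fit the "0 anchor" error above.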