Comparing two lists of words didn't work for me either, so I converted each word into a Chinese character (Unicode has more than 20,000 of them), aligned the sequences as character strings, and then converted the result back into Latin-alphabet words. It works like a charm:
```python
# Requires Biopython; the package is imported as "Bio" (capital B).
from Bio import pairwise2
from Bio.pairwise2 import format_alignment

LISTA=["alors","en","fait","depuis","novembre","2017","du","coup",",","j'","ai","fait",",","plusieurs","ce","que","j'","appelle","des","crises",",","en","fait","c'","est",",","pendant",",","une","semaine",",","entre","une","semaine","et","10","jours",",","je","me","sens","un","peu","comme",",","déconnectée","de","la","réalité","un","peu",",","j'","arrive","pas","à",",","j'","arrivais","plus","à","faire","la","différence","entre",",","entre","si","j'","étais","dans","un","ou","si","j'","étais","dans","la","réalité","."]
LISTB=["alors","en","fait","depuis","euh","novembre","deux","mille","dix-sept","du","coup","j'","ai","fait","euh","hum","hum","plusieurs","euh","enfin","ce","que","j'","appelle","des","crises","en","fait","c'","est","euh","pendant","euh","une","semaine","entre","une","semaine","et","dix","jours","euh","je","me","sens","un","peu","comme","euh","déconnectée","de","la","réalité","un","peu","j'","arrive","pas","à","j'","arrivais","plus","à","faire","la","différence","entre","entre","si","j'","étais","dans","un","rêve","ou","si","j'","étais","dans","la","réalité"]

# Give every distinct word its own CJK ideograph, starting at U+4E00 ("一"),
# so that each word becomes exactly one character of a string.
charcode = ord("一") - 1
LATtoHAN = {}  # word -> character
HANtoLAT = {}  # character -> word
LISTA_ = []
LISTB_ = []
for x in LISTA:
    if x not in LATtoHAN:
        charcode += 1
        LATtoHAN[x] = chr(charcode)
        HANtoLAT[chr(charcode)] = x
    LISTA_.append(LATtoHAN[x])
for x in LISTB:
    if x not in LATtoHAN:
        charcode += 1
        LATtoHAN[x] = chr(charcode)
        HANtoLAT[chr(charcode)] = x
    LISTB_.append(LATtoHAN[x])
LISTA__ = "".join(LISTA_)
LISTB__ = "".join(LISTB_)

# globalxx: global alignment scoring 1 per identical character,
# with no penalty for mismatches or gaps.
alignments = pairwise2.align.globalxx(LISTA__, LISTB__)

# Decode the first optimal alignment back into words; "-" marks a gap.
RESA = ["-" if w == "-" else HANtoLAT[w] for w in alignments[0][0]]
RESB = ["-" if w == "-" else HANtoLAT[w] for w in alignments[0][1]]
print(RESA, RESB)
```
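If you would rather skip the character-mapping step, the same kind of alignment can be computed on the word lists directly. Below is a minimal pure-Python sketch (an illustration, not part of the approach above) that fills a longest-common-subsequence table. With `globalxx` scoring (1 per match, 0 for mismatches and gaps) the best score equals the LCS length, so this should find an alignment with the same number of matched words. The function name `align_words` and its `gap` parameter are hypothetical, and when several alignments tie it may place gaps differently from Biopython.

```python
# Hypothetical helper, assuming globalxx-style scoring
# (match = 1, mismatch = 0, gap = 0), i.e. a longest common subsequence.
def align_words(a, b, gap="-"):
    """Return one optimal gap-only global alignment of word lists a and b."""
    n, m = len(a), len(b)
    # score[i][j] = most words that can be matched between a[i:] and b[j:]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                score[i][j] = score[i + 1][j + 1] + 1
            else:
                score[i][j] = max(score[i + 1][j], score[i][j + 1])

    # Trace one optimal path from the start, emitting matches or gaps.
    res_a, res_b = [], []
    i = j = 0
    while i < n and j < m:
        if a[i] == b[j]:
            # Pairing equal words is always optimal for an LCS.
            res_a.append(a[i])
            res_b.append(b[j])
            i += 1
            j += 1
        elif score[i + 1][j] >= score[i][j + 1]:
            # Skipping a[i] loses nothing: put a gap in b.
            res_a.append(a[i])
            res_b.append(gap)
            i += 1
        else:
            # Skipping b[j] is better: put a gap in a.
            res_a.append(gap)
            res_b.append(b[j])
            j += 1
    # Whatever remains in either list is aligned against gaps.
    res_a.extend(a[i:])
    res_b.extend([gap] * (n - i))
    res_b.extend(b[j:])
    res_a.extend([gap] * (m - j))
    return res_a, res_b


if __name__ == "__main__":
    RESA, RESB = align_words(["je", "me", "sens", "un", "peu"],
                             ["je", "me", "euh", "sens", "peu"])
    print(RESA)  # ['je', 'me', '-', 'sens', 'un', 'peu']
    print(RESB)  # ['je', 'me', 'euh', 'sens', '-', 'peu']
```

With the lists from the answer, `RESA, RESB = align_words(LISTA, LISTB)` can be used in place of the encode/align/decode steps. As in the code above, a word that is literally `"-"` would be indistinguishable from a gap.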
Please reformat your code using the 101010 button or by putting 4 spaces before each line of code. This is particularly important for Python, because the code has to be indented correctly before we can be sure it's written correctly.
Thanks for the feedback.
As jrj.healey said, code formatting is very important. I have now added code markup to your post to make it more readable. You can do this yourself by selecting the text and clicking the 101010 button, which appears in the toolbar when you compose or edit a post.