Avoiding duplicate clients
Lino Welfare offers some functionality for avoiding duplicate Client records.
This page is a tested document and the following instructions are used for initialization:
>>> import lino
>>> lino.startup('lino_welfare.projects.gerd.settings.doctests')
>>> from lino.api.doctest import *
In Lino Welfare, a Client inherits from DupableClient.
Phonetic words
See lino.mixins.dupable.PhoneticWordBase.
>>> rt.show(pcsw.CoachedClients, column_names="id name dupable_words")
...
===== ======================= =======================
ID    Name                    dupable_words
----- ----------------------- -----------------------
116   Ausdemwald Alfons       *ASTMLT*, *ALFNS*
177   Brecht Bernd            *PRKT*, *PRNT*
118   Collard Charlotte       *KLRT*, *XRLT*
124   Dobbelstein Dorothée    *TPLSTN*, *TR0*
179   Dubois Robin            *TP*, *RPN*
128   Emonts Daniel           *AMNTS*, *TNL*
152   Emonts-Gast Erna        *AMNTS*, *KST*, *ARN*
129   Engels Edgar            *ANJLS*, *ATKR*
127   Evers Eberhart          *AFRS*, *APRRT*
132   Groteclaes Gregory      *KRTKLS*, *KRKR*
133   Hilgers Hildegard       *HLKRS*, *HLTKRT*
137   Jacobs Jacqueline       *JKPS*, *JKLN*
181   Jeanémart Jérôme        *JNMRT*, *JRM*
139   Jonas Josef             *JNS*, *JSF*
141   Kaivers Karl            *KFRS*, *KRL*
178   Keller Karl             *KLR*, *KRL*
142   Lambertz Guido          *LMPRTS*, *KT*
144   Lazarus Line            *LSRS*, *LN*
146   Malmendier Marc         *MLMNT*, *MRK*
147   Meessen Melissa         *MSN*, *MLS*
153   Radermacher Alfons      *RTRMKR*, *ALFNS*
155   Radermacher Christian   *RTRMKR*, *KRSXN*
157   Radermacher Edgard      *RTRMKR*, *ATKRT*
159   Radermacher Guido       *RTRMKR*, *KT*
161   Radermacher Hedi        *RTRMKR*, *HT*
173   Radermecker Rik         *RTRMKR*, *RK*
165   da Vinci David          *FNS*, *TFT*
166   van Veen Vincent        *FN*, *FNSNT*
168   Östges Otto             *ASTJS*, *AT*
===== ======================= =======================
Similar Clients
The test database contains a fictitious person named Dorothée Dobbelstein-Demeulenaere as an example of accidental duplicate data entry. Dorothée exists three times in our database:
>>> for p in pcsw.Client.objects.filter(name__contains="Dorothée"):
... print(str(p))
...
DEMEULENAERE Dorothée (122)
DOBBELSTEIN-DEMEULENAERE Dorothée (123*)
DOBBELSTEIN Dorothée (124)
The detail window of each of these records shows some of the other records in the SimilarClients table:
>>> translation.activate("en")
>>> rt.show(dupable_clients.SimilarClients, pcsw.Client.objects.get(pk=122))
*DOBBELSTEIN-DEMEULENAERE Dorothée (123*)* Phonetic words: TMLNR, TR0
>>> rt.show(dupable_clients.SimilarClients, pcsw.Client.objects.get(pk=123))
...
*DEMEULENAERE Dorothée (122)* *DOBBELSTEIN Dorothée (124)* Phonetic words: TPLSTN, TMLNR, TR0
>>> rt.show(dupable_clients.SimilarClients, pcsw.Client.objects.get(pk=124))
...
*DOBBELSTEIN-DEMEULENAERE Dorothée (123*)* Phonetic words: TPLSTN, TR0
Note how the result differs depending on which record you look at: records 122 and 124 each list only 123, while 123 lists both of the others. Our algorithm is not perfect and does not detect all duplicates.
Checked at input
If a user tries to create a fourth record for that person, Lino asks for confirmation first:
>>> data = dict(an="submit_insert")
>>> data.update(first_name="Dorothée")
>>> data.update(last_name="Dobbelstein")
>>> data.update(genderHidden="F")
>>> data.update(gender="Weiblich")
>>> test_client.force_login(rt.login('robin').user)
>>> res = test_client.post('/api/pcsw/Clients', data=data, REMOTE_USER="robin")
>>> res.status_code
200
>>> r = json.loads(res.content)
>>> print(r['message'])
There are 2 similar Clients:<br/>
DOBBELSTEIN-DEMEULENAERE Dorothée (123*)<br/>
DOBBELSTEIN Dorothée (124)<br/>
Are you sure you want to create a new Client named Mrs Dorothée DOBBELSTEIN?
This is because lino.mixins.dupable.Dupable replaces the standard submit_insert action with the CheckedSubmitInsert action.
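The confirm-before-insert flow described above can be sketched in plain Python. This is not the code of CheckedSubmitInsert: the functions `confirmation_message` and `checked_insert`, the `find_similar` and `create` callbacks, and the returned dictionary are all hypothetical names chosen for illustration. Only the wording of the message is taken from the response shown above.

```python
def confirmation_message(new_name, similar, model_label="Client"):
    """Build an HTML question listing the similar records.

    Hypothetical helper: pluralisation is naive (always adds "s").
    """
    lines = ["There are {} similar {}s:".format(len(similar), model_label)]
    lines.extend(similar)
    lines.append(
        "Are you sure you want to create a new {} named {}?".format(
            model_label, new_name))
    return "<br/>\n".join(lines)


def checked_insert(name, find_similar, create, confirmed=False):
    """Create a record only if nothing similar exists or the user confirmed.

    `find_similar` and `create` are callbacks supplied by the caller
    (assumptions of this sketch, not part of any Lino API).
    """
    similar = find_similar(name)
    if similar and not confirmed:
        # Do not create anything yet; hand the question back to the caller.
        return {"created": False,
                "message": confirmation_message(name, similar)}
    create(name)
    return {"created": True, "message": None}


# Usage with stand-in callbacks.
existing = ["DOBBELSTEIN-DEMEULENAERE Dorothée (123*)",
            "DOBBELSTEIN Dorothée (124)"]
created = []
first = checked_insert("Mrs Dorothée DOBBELSTEIN",
                       lambda name: existing, created.append)
second = checked_insert("Mrs Dorothée DOBBELSTEIN",
                        lambda name: existing, created.append,
                        confirmed=True)
```

The first call returns the question without creating anything; the record is created only when the call is repeated with `confirmed=True`.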
The algorithm
The alarm bell rings when there are two similar name components in both first and last name. Punctuation characters (like “-”, “&” or “,”) are ignored, and the order of the name parts does not matter.
The current implementation splits the name of each client into its parts, removes punctuation characters, computes a phonetic version of each part using the NYSIIS algorithm, and stores the results in a separate database table.
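The steps above can be illustrated with a minimal, self-contained sketch. It rests on several assumptions: `phonetic_key` is a crude stand-in (it strips accents, drops vowels and collapses repeats) and is not NYSIIS; the `PhoneticIndex` class is a hypothetical in-memory substitute for the separate database table; and the threshold of two shared phonetic words is one reading of the rule stated above, not something taken from the Lino source code.

```python
import re
import unicodedata


def name_parts(name):
    """Split a name on whitespace and punctuation, dropping empty parts."""
    return [p for p in re.split(r"[\W_]+", name) if p]


def phonetic_key(word):
    """Crude phonetic stand-in (NOT NYSIIS), for illustration only."""
    # Strip accents so that "Dorothée" and "Dorothee" give the same key.
    ascii_word = (unicodedata.normalize("NFKD", word)
                  .encode("ascii", "ignore").decode("ascii").upper())
    if not ascii_word:
        return ""
    key = ascii_word[0]
    for ch in ascii_word[1:]:
        if ch in "AEIOUY":
            continue  # drop vowels after the first letter
        if ch != key[-1]:
            key += ch  # skip a letter equal to the last one kept
    return key


class PhoneticIndex:
    """Hypothetical in-memory stand-in for the phonetic words table."""

    def __init__(self):
        self.words = {}  # record id -> set of phonetic words

    def add(self, pk, name):
        self.words[pk] = {phonetic_key(p) for p in name_parts(name)}

    def similar(self, pk, threshold=2):
        """Ids of other records sharing at least `threshold` words."""
        mine = self.words[pk]
        return sorted(
            other for other, theirs in self.words.items()
            if other != pk and len(mine & theirs) >= threshold)


index = PhoneticIndex()
index.add(122, "Demeulenaere Dorothée")
index.add(123, "Dobbelstein-Demeulenaere Dorothée")
index.add(124, "Dobbelstein Dorothée")
```

Because a set of words is compared, the order of the name parts has no influence. Under these assumptions record 123 is similar to both 122 and 124, while 122 and 124 share only one word and are therefore not reported as similar to each other.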
How good (how bad) is our algorithm? See the source code of lino.projects.min2.tests.test_min2.