I would like to compare two large arrays of objects coming from two different databases:
- an array of 2700 objects
- an array of 1800 objects
Each object is a scientific publication with 30 properties. The aim is to identify duplicate titles. However, because the titles were not entered in exactly the same way in both databases, I want to compare publication titles using the Levenshtein distance implemented in this answer: Compare Strings Javascript Return %of Likely
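For reference, here is a minimal sketch of a Levenshtein distance and a percentage-style similarity score in JavaScript. It is not the code from the linked answer: the names levenshtein and similarity are hypothetical, and the similarity formula (the longer title's length minus the edit distance, divided by that length, compared case-insensitively) is an assumption about how a "% of likely" score can be derived.

```javascript
// Hypothetical helper: Levenshtein edit distance between two strings,
// computed with two rolling rows of the dynamic-programming table.
function levenshtein(a, b) {
  if (a === b) return 0;
  if (a.length === 0) return b.length;
  if (b.length === 0) return a.length;

  let prev = new Array(b.length + 1);
  let curr = new Array(b.length + 1);
  for (let j = 0; j <= b.length; j++) prev[j] = j; // distance from "" to b[0..j)

  for (let i = 1; i <= a.length; i++) {
    curr[0] = i; // distance from a[0..i) to ""
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1,        // deletion
        curr[j - 1] + 1,    // insertion
        prev[j - 1] + cost  // substitution (or match)
      );
    }
    [prev, curr] = [curr, prev]; // the finished row becomes "prev" for the next pass
  }
  return prev[b.length];
}

// Hypothetical helper: similarity in [0, 1], where 1 means identical strings
// (ignoring case); the distance is scaled by the length of the longer string.
function similarity(s1, s2) {
  const a = s1.toLowerCase();
  const b = s2.toLowerCase();
  const longer = Math.max(a.length, b.length);
  if (longer === 0) return 1;
  return (longer - levenshtein(a, b)) / longer;
}
```

For example, "Deep learning" and "Deep Learning." differ only by case and one appended character, so similarity returns 13/14, about 0.93.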
I’ve tested two ways of doing this:
- using Map: I store each array in a separate map, using the publication title as the key. Then, for each key in the first map, I loop through every key in the second map and run a Levenshtein test on the two keys.
- using only titles: I create two arrays of publication titles. Then, for each element of the first array, I loop through every element of the second array and run a Levenshtein test on the two elements.
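The two approaches above can be sketched roughly as follows. This is an illustration under assumptions: each publication object is assumed to have a title property, the 0.9 similarity threshold is hypothetical, and levenshtein/similarity are compact stand-ins for the linked answer's implementation, not that code itself.

```javascript
// Stand-in Levenshtein distance (two-row dynamic programming).
function levenshtein(a, b) {
  if (a === b) return 0;
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  let curr = new Array(b.length + 1);
  for (let i = 1; i <= a.length; i++) {
    curr[0] = i;
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    [prev, curr] = [curr, prev];
  }
  return prev[b.length];
}

// Stand-in similarity score in [0, 1], case-insensitive.
function similarity(s1, s2) {
  const a = s1.toLowerCase();
  const b = s2.toLowerCase();
  const longer = Math.max(a.length, b.length);
  return longer === 0 ? 1 : (longer - levenshtein(a, b)) / longer;
}

// Hypothetical threshold: pairs at least this similar count as duplicates.
const THRESHOLD = 0.9;

// Approach 1: two Maps keyed by title (title -> publication object).
function findDuplicatesWithMaps(pubsA, pubsB) {
  const mapA = new Map(pubsA.map((p) => [p.title, p]));
  const mapB = new Map(pubsB.map((p) => [p.title, p]));
  const pairs = [];
  for (const [titleA, pubA] of mapA) {
    for (const [titleB, pubB] of mapB) {
      if (similarity(titleA, titleB) >= THRESHOLD) pairs.push([pubA, pubB]);
    }
  }
  return pairs;
}

// Approach 2: plain arrays of titles.
function findDuplicatesWithTitles(pubsA, pubsB) {
  const titlesA = pubsA.map((p) => p.title);
  const titlesB = pubsB.map((p) => p.title);
  const pairs = [];
  for (const tA of titlesA) {
    for (const tB of titlesB) {
      if (similarity(tA, tB) >= THRESHOLD) pairs.push([tA, tB]);
    }
  }
  return pairs;
}
```

Both functions call similarity once per pair, so the total cost is the product of the two collection sizes times the cost of one Levenshtein computation, which itself grows with the product of the two title lengths. Note also that building a Map keyed by title keeps only the last publication when two entries in the same array share an identical title.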
Both approaches are extremely time-consuming: each one performs up to 2,700 × 1,800 = 4,860,000 Levenshtein comparisons. Do you think there is a more efficient way of doing this?
Thank you very much for your answers.