duplicated or wrong data
We have some duplicated data in the index. This mainly results from different URIs for the same resource: a resource may be available over both http and https, or with and without www. We need a way to normalize URIs across the project, or to build (and control) them during indexing.
This is an overview issue. We will handle these points in individual issues and keep track of progress here.
Affected Providers
- alfred-escher: URL-encoded and non-URL-encoded URLs for the same resource, e.g. Martino Pedrazzini 1843-01-01 1922-01-01
- elitesuisse: old, wrong URIs in the following format: https://www2.unil.ch/elitessuisses/index.php?page=detailPerso&idIdentite=http://www.hls-dhs-dss.ch/textes/f/F9056.php
- viaf: the numbers changed from a short form to a long form, with a redirect
- Fotostiftung: URIs that differ only in their trailing slash (https://www.fotostiftung.ch/de/nc/index-der-fotografinnen/fotografin/cumulus/1397/0/show// and https://www.fotostiftung.ch/de/nc/index-der-fotografinnen/fotografin/cumulus/1397/0/show/)
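
As a starting point for discussion, here is a minimal sketch of what a shared normalization step could look like. It is not existing project code: the names `normalize_uri` and `find_duplicates` are hypothetical, and it assumes that http/https, a leading `www.`, percent-encoding of the path, and trailing slashes never distinguish two resources. That assumption may not hold for every provider. It also does not cover the viaf case, where short and long numbers can probably only be matched by following the redirect, nor the malformed elitesuisse URIs, which need a provider-specific fix.

```python
from collections import defaultdict
from urllib.parse import quote, unquote, urlsplit, urlunsplit


def normalize_uri(uri: str) -> str:
    """Return a canonical form of `uri` for duplicate detection (hypothetical helper)."""
    parts = urlsplit(uri.strip())

    # Assumption: the http and https variants point to the same resource.
    scheme = "https" if parts.scheme in ("http", "https") else parts.scheme

    # Host names are case-insensitive.
    host = (parts.hostname or "").lower()
    # Assumption: "www." is only an alias of the bare host name.
    if host.startswith("www."):
        host = host[4:]
    if parts.port:
        host = f"{host}:{parts.port}"

    # Decode, then re-encode, so encoded and unencoded paths compare equal.
    # Caveat: an encoded slash (%2F) in a path segment becomes a real slash.
    path = quote(unquote(parts.path), safe="/")
    # Assumption: trailing slashes (single or repeated) are not significant.
    path = path.rstrip("/")

    # Query and fragment are left untouched in this sketch.
    return urlunsplit((scheme, host, path, parts.query, parts.fragment))


def find_duplicates(uris):
    """Group URIs by normalized form and keep only groups with more than one member."""
    groups = defaultdict(list)
    for uri in uris:
        groups[normalize_uri(uri)].append(uri)
    return {key: members for key, members in groups.items() if len(members) > 1}
```

Running `find_duplicates` over the URIs of one provider would list candidate duplicates for review before anything is merged or rewritten in the index. Whether the normalized form should also be the stored form, or only a comparison key, is a separate decision.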