Change crawling process
The crawling, extraction and merging steps are tightly entangled and have some flaws:
- When we save a concordance with a DummyPerson in the storage, we don't know whether the identifier is correct; we only know the identifier from the document we are extracting from. This leads to follow-up problems: if we later want to update the identifier, we find that it conflicts with other resources in the storage that we can't merge with. In the end, we have a duplicate in the storage.
- We crawl too much. Most of the time we already know the person and don't need to refresh it.
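A minimal sketch of one possible direction for both points, assuming a simple in-memory storage keyed by identifier. `Person`, `Storage`, `needs_crawl`, `save_or_merge` and the 30-day `MAX_AGE` are hypothetical names chosen for illustration, not part of the existing code. The idea: before crawling, check whether the person is already known and recently refreshed; before saving, look the identifier up and merge into the existing record instead of inserting a new one.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from typing import Dict, Optional


# Hypothetical stored resource; the real model will differ.
@dataclass
class Person:
    identifier: str
    name: str
    last_refreshed: datetime


# Hypothetical storage: identifier -> stored person.
@dataclass
class Storage:
    persons: Dict[str, Person] = field(default_factory=dict)

    def find(self, identifier: str) -> Optional[Person]:
        return self.persons.get(identifier)


# Assumed refresh policy: re-crawl only persons not refreshed within MAX_AGE.
MAX_AGE = timedelta(days=30)


def needs_crawl(storage: Storage, identifier: str, now: datetime) -> bool:
    """Skip the crawl when the person is already known and still fresh."""
    known = storage.find(identifier)
    return known is None or now - known.last_refreshed > MAX_AGE


def save_or_merge(storage: Storage, candidate: Person) -> Person:
    """Look the identifier up before saving instead of blindly inserting."""
    existing = storage.find(candidate.identifier)
    if existing is None:
        storage.persons[candidate.identifier] = candidate
        return candidate
    # Merge into the existing record so no duplicate is created:
    # keep the stored name if present and the most recent refresh time.
    existing.name = existing.name or candidate.name
    existing.last_refreshed = max(existing.last_refreshed, candidate.last_refreshed)
    return existing


if __name__ == "__main__":
    storage = Storage()
    now = datetime(2024, 1, 31, tzinfo=timezone.utc)
    if needs_crawl(storage, "p1", now):
        save_or_merge(storage, Person("p1", "Ada", now))
    print(needs_crawl(storage, "p1", now))
```

This does not solve the harder part of the first point (verifying that an extracted identifier is actually correct); it only shows where such a check could sit, namely before the record is persisted rather than after.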