Commit bb119a89 authored by tobinski

Update readme

parent 533216aa
# wikidata library
The wikidata library provides a set of stream transformers and utils for working with raw data from wikidata (dump and feed). Originally the normalizer was a pod running in k8s that consumed messages from kafka and produced messages to it. During a refactoring we decided to use the transformers directly inside the dump-producer and the live-feed. This saves the round trip to kafka, some disk storage and money.
## wikidata-normalizer-transformer
This transformer takes raw data from wikidata and normalizes it to the geolinker default format. It then prepares a message to send to kafka.
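The normalization step could look roughly like the following sketch. It is not the library's code: `NormalizedRecord`, `OutMessage`, `normalizeEntity` and `toMessage` are hypothetical names, and the real geolinker default format may contain other fields. Only the input shape (`labels`, `claims`, `mainsnak.datavalue`) and the property `P625` (coordinate location) follow the Wikidata JSON format.

```typescript
// Minimal subset of the Wikidata entity JSON that this sketch reads.
interface RawEntity {
  id: string;
  labels?: Record<string, { language: string; value: string }>;
  claims?: Record<string, Array<{ mainsnak: { datavalue?: { value: unknown } } }>>;
}

// Hypothetical target shape; the real geolinker default format may differ.
interface NormalizedRecord {
  id: string;
  name: string | null;
  location: { lat: number; lng: number } | null;
}

// Hypothetical kafka-style message envelope (key plus serialized value).
interface OutMessage {
  key: string;
  value: string;
}

function normalizeEntity(raw: RawEntity): NormalizedRecord {
  // P625 is the Wikidata property for "coordinate location".
  const coord = raw.claims?.["P625"]?.[0]?.mainsnak.datavalue?.value as
    | { latitude: number; longitude: number }
    | undefined;
  return {
    id: raw.id,
    name: raw.labels?.["en"]?.value ?? null,
    location: coord ? { lat: coord.latitude, lng: coord.longitude } : null,
  };
}

function toMessage(record: NormalizedRecord): OutMessage {
  // The entity id is used as the message key (an assumption of this sketch).
  return { key: record.id, value: JSON.stringify(record) };
}
```

In a stream setup, a function like `normalizeEntity` would be called once per incoming chunk and the result of `toMessage` pushed downstream.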
## wikidata-analyzer-transformer
This transformer analyses wikidata's raw data and extracts information about links between items. For example, we check for links to other interesting resources. We then extract those links and prepare them as a message for the linker.
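As an illustration of the link extraction, here is a minimal sketch that walks the claims of a raw entity and collects links for a configurable set of properties. The names `Link` and `extractLinks` and the output shape are assumptions made for this example; the message format expected by the linker is not shown here.

```typescript
// Subset of a Wikidata snak as it appears in the entity JSON.
interface Snak {
  snaktype: string;
  property: string;
  datatype?: string;
  datavalue?: { type: string; value: unknown };
}

interface RawEntity {
  id: string;
  claims?: Record<string, Array<{ mainsnak: Snak }>>;
}

// Hypothetical link record; the real linker message may look different.
interface Link {
  from: string;
  property: string;
  to: string;
  kind: "item" | "external";
}

function extractLinks(raw: RawEntity, interesting: Set<string>): Link[] {
  const links: Link[] = [];
  for (const [property, statements] of Object.entries(raw.claims ?? {})) {
    if (!interesting.has(property)) continue;
    for (const { mainsnak } of statements) {
      // Skip "novalue" and "somevalue" snaks, which carry no datavalue.
      if (mainsnak.snaktype !== "value" || !mainsnak.datavalue) continue;
      const value = mainsnak.datavalue.value;
      if (mainsnak.datatype === "external-id" && typeof value === "string") {
        // External identifiers are plain strings.
        links.push({ from: raw.id, property, to: value, kind: "external" });
      } else if (mainsnak.datatype === "wikibase-item") {
        // Item links carry the target id inside an object.
        links.push({ from: raw.id, property, to: (value as { id: string }).id, kind: "item" });
      }
    }
  }
  return links;
}
```

For instance, passing `new Set(["P1566"])` would collect GeoNames identifiers, since `P1566` is the Wikidata property for the GeoNames ID.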
## wikidata-time-to-date-object
This script transforms the date from wikidata into a format used by the geolinker.
## wikidata-geofilter-transformer
This transformer tries to guess the type of a wikidata item. If the item belongs to a defined set of classes, it forwards the message; if not, it drops the message. We use it to keep only documents related to geography (e.g. locations, places, cities and so on).
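The filtering idea can be sketched as follows, assuming the type of an item is guessed from its `P31` (instance of) statements. The function names are hypothetical and the set of allowed classes is passed in by the caller; how the library actually determines the type is not documented here.

```typescript
interface RawEntity {
  id: string;
  claims?: Record<string, Array<{ mainsnak: { datavalue?: { value: unknown } } }>>;
}

// Collect the ids of all classes the entity is an instance of (P31).
function instanceOfIds(raw: RawEntity): string[] {
  const statements = raw.claims?.["P31"] ?? [];
  const ids: string[] = [];
  for (const { mainsnak } of statements) {
    const value = mainsnak.datavalue?.value as { id?: string } | undefined;
    if (value && typeof value.id === "string") ids.push(value.id);
  }
  return ids;
}

// True if at least one class of the entity is in the allowed set.
function isGeographic(raw: RawEntity, allowedClasses: Set<string>): boolean {
  return instanceOfIds(raw).some((id) => allowedClasses.has(id));
}

// Forward matching entities and drop everything else.
function filterGeographic(entities: RawEntity[], allowedClasses: Set<string>): RawEntity[] {
  return entities.filter((entity) => isGeographic(entity, allowedClasses));
}
```

A caller would typically build `allowedClasses` once, for example from the result of a subclass query, and reuse it for every message.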
## wikidata-utils
Simple utils that help to work with the wikidata raw format, for example to grab a set of properties from wikidata or to transform time statements.
### timeToDate()
This function transforms the time value from wikidata into a date format used by the geolinker
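To illustrate the kind of conversion involved, the sketch below turns a Wikidata time value into a JavaScript `Date`. It is an assumption-based example, not the actual `timeToDate()` implementation: the name `timeToDateSketch` is made up, and the date format the geolinker expects may differ. The input follows the Wikidata JSON format, where `precision` 9 means year, 10 month and 11 day, and where month or day can be `00` for imprecise dates.

```typescript
// Subset of a Wikidata time value, e.g. { time: "+1850-00-00T00:00:00Z", precision: 9 }.
interface WikidataTime {
  time: string;
  precision: number;
}

function timeToDateSketch(value: WikidataTime): Date | null {
  const match = /^([+-])(\d+)-(\d{2})-(\d{2})T/.exec(value.time);
  if (!match) return null;
  // Decades, centuries and coarser values cannot be mapped to a single date here.
  if (value.precision < 9) return null;

  const year = Number(match[2]) * (match[1] === "-" ? -1 : 1);
  // Fall back to the first month or day when the precision does not cover them
  // or when the raw value contains "00".
  const month = value.precision >= 10 ? Math.max(Number(match[3]), 1) : 1;
  const day = value.precision >= 11 ? Math.max(Number(match[4]), 1) : 1;

  // setUTCFullYear avoids the two-digit-year mapping of the Date constructor.
  const date = new Date(0);
  date.setUTCFullYear(year, month - 1, day);
  return date;
}
```

Returning `null` for unparsable or very coarse values is a design choice of this sketch; a caller could just as well keep the raw string in those cases.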
### WikidataProperties.getProperties(property: string, query: any = {brief: true})
This method queries the SPARQL endpoint of wikidata and extracts properties from the result. We use it to get all classes and subclasses of location.
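A query for a class and all of its subclasses could be sketched as shown below. This is not the code behind `getProperties`; `buildSubclassQueryUrl` and `extractIds` are hypothetical helpers. The sketch assumes the public endpoint `https://query.wikidata.org/sparql`, the property `P279` (subclass of) and the standard SPARQL JSON result format.

```typescript
const SPARQL_ENDPOINT = "https://query.wikidata.org/sparql";

// Build the request URL for "the class itself plus all transitive subclasses".
function buildSubclassQueryUrl(classId: string): string {
  const query = `SELECT ?item WHERE { ?item wdt:P279* wd:${classId} . }`;
  return `${SPARQL_ENDPOINT}?format=json&query=${encodeURIComponent(query)}`;
}

// Subset of the SPARQL JSON result format.
interface SparqlJson {
  results: {
    bindings: Array<Record<string, { type: string; value: string }>>;
  };
}

// Turn entity URIs such as "http://www.wikidata.org/entity/Q515" into ids ("Q515").
function extractIds(result: SparqlJson, variable: string = "item"): string[] {
  return result.results.bindings
    .map((binding) => binding[variable]?.value)
    .filter((value): value is string => typeof value === "string")
    .map((uri) => uri.substring(uri.lastIndexOf("/") + 1));
}
```

The URL would be fetched with any HTTP client and the parsed JSON body handed to `extractIds`; the resulting ids can then serve as the class set for the geofilter.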
### WikidataProperties.init(props: IProperty[])
This method gets a list of all subclasses for a set of properties. For example, you can find all properties expressing the "end" of something.
# Tests
The transformers are tested, the utils are not. There are only a few tests so far, so we should add more!