Commit d9547f92 authored by Lena Heizmann

Update Readme.md

parent 6750037c
# wikidata library
The wikidata library provides a set of stream transformers and utils for working with raw data from wikidata (dump and feed). Originally the normalizer was a pod running in k8s that consumed and produced messages from kafka. During a refactoring we decided to use the transformers directly inside the dump-producer and the live-feed. This saves the round trip to kafka, some disk storage, and cost.
## wikidata-normalizer-transformer
This transformer takes raw data from wikidata and normalizes it to the geolinker default format. It then prepares a message for kafka.
## wikidata-analyzer-transformer
This transformer analyzes wikidata's raw data and extracts information about links between items. For example, we check for links to other interesting resources. We then extract those links and prepare them as a message for the linker.
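
To illustrate the idea, here is a minimal sketch of extracting item-to-item links from a raw entity. It is not the library's implementation: `extractItemLinks` and the `Link` shape are hypothetical names, and the sketch assumes the standard Wikidata entity JSON in which item-valued statements carry a `datavalue` of type `wikibase-entityid` with an `id` field. It only looks at main statement values, not qualifiers or references.

```typescript
// Minimal shape of the parts of a raw Wikidata entity used in this sketch.
interface Statement {
  mainsnak: { snaktype: string; datavalue?: { type: string; value: unknown } };
}
interface Entity {
  id: string;
  claims?: Record<string, Statement[]>;
}

// Hypothetical link record; the real linker message format is not described here.
interface Link {
  from: string;
  property: string;
  to: string;
}

// Collect one link per statement whose value points at another item.
function extractItemLinks(entity: Entity): Link[] {
  const links: Link[] = [];
  const claims = entity.claims ?? {};
  for (const property of Object.keys(claims)) {
    for (const statement of claims[property]) {
      const dv = statement.mainsnak.datavalue;
      // Skip statements without a value and values that are not items.
      if (dv?.type !== "wikibase-entityid") continue;
      // Assumption: the value carries an `id` such as "Q183".
      const target = (dv.value as { id?: string }).id;
      if (target) links.push({ from: entity.id, property, to: target });
    }
  }
  return links;
}
```

Statements with other value types (quantities, strings, times) are ignored, so only references to other items end up in the result.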
## wikidata-geofilter-transformer
This transformer tries to guess the type of a wikidata item. If the item belongs to one of a defined set of classes, it forwards the message; if not, it drops the message. We use it to keep only documents related to geography (e.g. locations, places, cities and so on).
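
The class check could look roughly like the sketch below. It assumes the type is read from the item's `P31` ("instance of") statements and compared against an allow-list of class ids; `isGeoItem` and the list passed to it are hypothetical, and the real transformer may use a different heuristic.

```typescript
// Minimal shape of the parts of a raw Wikidata entity used in this sketch.
interface GeoStatement {
  mainsnak: { datavalue?: { type: string; value: unknown } };
}
interface GeoEntity {
  id: string;
  claims?: Record<string, GeoStatement[]>;
}

// Returns true if any "instance of" (P31) value is in the allowed class list.
// Assumption: the allow-list is supplied by the caller, e.g. from a SPARQL query.
function isGeoItem(entity: GeoEntity, geoClasses: readonly string[]): boolean {
  const statements = entity.claims?.P31 ?? [];
  return statements.some((statement) => {
    const dv = statement.mainsnak.datavalue;
    if (dv?.type !== "wikibase-entityid") return false;
    const id = (dv.value as { id?: string }).id;
    return id !== undefined && geoClasses.indexOf(id) !== -1;
  });
}
```

A stream transformer built on this would forward the message when `isGeoItem` returns `true` and drop it otherwise.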
## wikidata-utils
Simple utils that help when working with wikidata's raw format.
### timeToDate()
This function transforms the time value from wikidata into a date format used by the geolinker.
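
As an illustration only, the sketch below converts a wikidata time string such as `+1950-00-00T00:00:00Z` into a JavaScript `Date`. The name `timeToDateSketch`, its signature, and the choice of `Date` as the target are assumptions; the actual `timeToDate()` and the geolinker date format may differ. Wikidata uses `00` for an unknown month or day, which the sketch maps to `01`.

```typescript
// Parse the `time` string of a Wikidata time value into a UTC Date.
// Returns null when the string does not match the expected layout
// or the year is outside the range a Date can represent.
function timeToDateSketch(time: string): Date | null {
  const match = /^([+-])(\d+)-(\d{2})-(\d{2})T/.exec(time);
  if (!match) return null;
  const year = Number(match[2]) * (match[1] === "-" ? -1 : 1);
  // Month or day "00" means "unknown at this precision"; fall back to 1.
  const month = Math.max(Number(match[3]), 1);
  const day = Math.max(Number(match[4]), 1);
  // setUTCFullYear avoids the two-digit-year remapping of the Date constructor.
  const date = new Date(0);
  date.setUTCFullYear(year, month - 1, day);
  return isNaN(date.getTime()) ? null : date;
}
```

The sketch ignores the `precision` and `calendarmodel` fields of the time value, so callers that care about year-only or Julian dates would need to handle those separately.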
### WikidataProperties.getProperties(property: string, query: any = {brief: true})
This method queries the SPARQL endpoint of wikidata and extracts properties from the result. We use it to get all classes and subclasses of location.
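
A rough sketch of how such a lookup might work is shown below. It is not the library's code: `buildSubclassQueryUrl` and `extractItemIds` are hypothetical helpers, and the query assumes that "subclasses" means following `P279` ("subclass of") transitively. The endpoint URL is the public Wikidata Query Service; the response shape follows the standard SPARQL JSON results format.

```typescript
const WIKIDATA_SPARQL_ENDPOINT = "https://query.wikidata.org/sparql";

// Build a GET URL asking for a class and all of its transitive subclasses.
// `classId` is an item id such as "Q515"; the wd:/wdt: prefixes are
// predefined by the Wikidata Query Service.
function buildSubclassQueryUrl(classId: string): string {
  const query = `SELECT ?item WHERE { ?item wdt:P279* wd:${classId} . }`;
  return `${WIKIDATA_SPARQL_ENDPOINT}?format=json&query=${encodeURIComponent(query)}`;
}

// The part of the SPARQL JSON results format used by this sketch.
interface SparqlJson {
  results: { bindings: { item?: { type: string; value: string } }[] };
}

// Turn entity URIs like "http://www.wikidata.org/entity/Q515" into bare ids.
function extractItemIds(response: SparqlJson): string[] {
  const ids: string[] = [];
  for (const binding of response.results.bindings) {
    const uri = binding.item?.value;
    if (!uri) continue;
    ids.push(uri.substring(uri.lastIndexOf("/") + 1));
  }
  return ids;
}
```

The URL can be fetched with any HTTP client, and the parsed JSON body passed to `extractItemIds` to obtain the list of class ids.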
### WikidataProperties.init(props: IProperty[])
This method gets a list of all subclasses for a set of properties. For example, you can find all properties expressing the "end" of something.
# Tests
The transformers are tested. The utils aren't.