# wikidata-normalizer
This streaming app consumes data from the `wikidata-geo` Kafka topic and normalizes and analyses it into a common schema. The app decouples the producing of the stream from the normalizing and analysing process.
## Normalizer
The Readable stream has two Writable streams subscribed: one for analysing the data and building links, and one for normalising the data to a common schema. The normaliser sends the data to the `wikidata-small` topic.
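The fan-out described above can be sketched with Node's stream primitives. This is a minimal illustration, not the app's actual code: the entity shape, the "common schema" mapping, and the in-memory sinks (standing in for Kafka producers) are all assumptions.

```typescript
import { Readable, Writable } from "stream";

// One Readable source; two Writable consumers subscribed to it,
// mirroring the normaliser/analyser split described above.
const entities = Readable.from([
  JSON.stringify({ id: "Q64", labels: { en: "Berlin" } }),
]);

const normalized: string[] = [];
const normalizer = new Writable({
  write(chunk, _enc, done) {
    // Map the raw entity onto a (made-up) common schema.
    const e = JSON.parse(chunk.toString());
    normalized.push(JSON.stringify({ id: e.id, name: e.labels.en }));
    done();
  },
});

const links: string[] = [];
const analyser = new Writable({
  write(chunk, _enc, done) {
    // Placeholder for real link extraction.
    const e = JSON.parse(chunk.toString());
    links.push(e.id);
    done();
  },
});

// pipe() may be called more than once: each chunk is delivered
// to every subscribed destination.
entities.pipe(normalizer);
entities.pipe(analyser);

// Resolves once both sinks have flushed everything.
const finished = Promise.all([
  new Promise<void>((r) => normalizer.on("finish", () => r())),
  new Promise<void>((r) => analyser.on("finish", () => r())),
]);
```

In the real app, each `Writable` would presumably wrap a Kafka producer instead of an array.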
## Analyser
The analyser extracts specific Wikidata properties and sends a concordance of links to the `linker` topic.
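Property extraction could look roughly like the following sketch. The watched property IDs (P17 "country", P131 "located in") and the link-record shape are illustrative assumptions; the README does not specify which properties the analyser actually watches.

```typescript
// Hypothetical link record sent to the `linker` topic.
type LinkRecord = { from: string; property: string; to: string };

// Assumed set of properties to extract; not taken from the project.
const WATCHED_PROPERTIES = ["P17", "P131"];

// Walk the claims of a Wikidata entity and emit one link record
// per item-valued claim on a watched property.
function extractLinks(entity: any): LinkRecord[] {
  const records: LinkRecord[] = [];
  for (const prop of WATCHED_PROPERTIES) {
    for (const claim of entity.claims?.[prop] ?? []) {
      const value = claim.mainsnak?.datavalue?.value;
      if (value?.id) {
        records.push({ from: entity.id, property: prop, to: value.id });
      }
    }
  }
  return records;
}
```

The concordance for the `linker` topic would then be the stream of these records across all consumed entities.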
## Docker
The image fetches data from a wikidata topic and streams the result back into Kafka. The container is based on Alpine Linux.

We have a build pipeline in GitLab, so manually building the image is no longer necessary.

We execute a job on Kubernetes to stream the dump into Kafka:

```bash
kubectl create -f wikidata-normalizer-deployment.yaml
```
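For orientation, `wikidata-normalizer-deployment.yaml` could have roughly the following shape. This is only a sketch: the image name, registry, and broker address are placeholders, not the project's actual values.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: wikidata-normalizer
spec:
  replicas: 1
  selector:
    matchLabels:
      app: wikidata-normalizer
  template:
    metadata:
      labels:
        app: wikidata-normalizer
    spec:
      containers:
        - name: wikidata-normalizer
          # Placeholder image reference; replace with the real registry path.
          image: registry.example.com/wikidata-normalizer:latest
          env:
            # Assumed environment variable; the app's actual config may differ.
            - name: KAFKA_BROKER
              value: "kafka:9092"
```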
## Todo
* Refactor the kafka-producer into a common module so it can be reused elsewhere
* Refactor the stream transformer into a common module so it can be reused elsewhere