Commit 846f5f5c authored by tobinski

Update readme

parent b92ae832
Pipeline #3971 passed with stage
in 1 minute and 36 seconds
......@@ -2,13 +2,13 @@
The API provides a gateway to the streaming platform. Through an HTTP interface, users can get geoconcordances from different streaming apps.
## How it works
Unlike a classic REST API, this API never communicates with the other components directly, so this code never sends a query to a DB or another store. It forwards the HTTP request to a Kafka topic (`*-request`) and waits on another topic (`*-response`) for possible responses (see [request-response](https://www.confluent.io/blog/apache-kafka-for-service-architectures/)). Different resolver apps listen on the topic to fulfil the request. If no resolver answers within a given timeframe (1000 ms), the API responds with a 404.
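The request-response flow described above can be sketched roughly as follows. This is only an illustration in Python, not the actual implementation: the `Gateway` class, the in-memory queue standing in for the Kafka topics, and the correlation id used to match responses to requests are all assumptions made for the example.

```python
import asyncio
import uuid

TIMEOUT_SECONDS = 1.0  # the README states a 1000 ms window


class Gateway:
    """In-memory stand-in for the API gateway (hypothetical, for illustration)."""

    def __init__(self):
        self.request_topic = asyncio.Queue()  # plays the role of the *-request topic
        self.pending = {}                     # correlation id -> Future awaiting a response

    async def handle_http(self, node, timeout=TIMEOUT_SECONDS):
        """Forward a request and wait for a response; return (status, body)."""
        correlation_id = str(uuid.uuid4())
        future = asyncio.get_running_loop().create_future()
        self.pending[correlation_id] = future
        await self.request_topic.put({"id": correlation_id, "node": node})
        try:
            body = await asyncio.wait_for(future, timeout)
            return 200, body
        except asyncio.TimeoutError:
            # No resolver answered in time.
            return 404, None
        finally:
            self.pending.pop(correlation_id, None)

    def on_response(self, message):
        """Called for every message seen on the *-response topic."""
        future = self.pending.get(message["id"])
        if future is not None and not future.done():
            future.set_result(message["links"])


async def resolver(gateway):
    """A toy resolver: consumes one request and publishes one response."""
    request = await gateway.request_topic.get()
    gateway.on_response({"id": request["id"], "links": ["https://example.org/linked"]})


async def demo():
    gateway = Gateway()
    # With a resolver running, the request is answered.
    task = asyncio.create_task(resolver(gateway))
    answered = await gateway.handle_http("https://dodis.ch/G8")
    await task
    # Without a resolver, the gateway gives up after the timeout.
    unanswered = await gateway.handle_http("https://dodis.ch/G8", timeout=0.05)
    return answered, unanswered
```

Running `asyncio.run(demo())` returns one answered and one timed-out request. The point of the sketch is that the gateway only knows about topics and a deadline, never about the stores behind the resolvers.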
## Too complicated
This is a complicated way to query a DB or an Elasticsearch index, but it gives us flexibility and loose coupling. If we want to add a new algorithm, we can do so without changing the API code. We can add a new subsystem without changing anything. It also scales more independently: if one subsystem is too slow, we can scale just that subsystem. Future requirements will show whether the complexity is worth the effort.
## Idempotent
This implementation conflicts with the [HTTP idempotence](https://en.wikipedia.org/wiki/Idempotence) specification, so there is no guarantee of getting the same answer for the same request. Under several circumstances we will get different answers for the same request:
* If a processor app has crashed, the answer will be different, mostly a 404.
* If there is high load on the server the answer may be different.
......@@ -20,9 +20,9 @@ In general the api tries to deliver at least one valid answer.
So far we offer three endpoints. More will follow.
#### Config processors
You can configure the different resolvers to get differentiated results. Under the hood these arguments are forwarded to the resolver apps, which use them to calculate the result.
### /v1/sameas/:node
To get manually or semi-manually linked entities out of the histHub-geolinker, you can use the `sameas` endpoint. Under the hood, the [resolver-neo4j](https://source.dodis.ch/histhub/resolver-neo4j) processes these queries. This endpoint offers high-quality links, mostly created by humans and stored in the geolinker. The dataset includes links from our partners and links that we generated via other services such as Geonames and Wikidata.
In the underlying datastore (neo4j) each link is represented as an arrow between two nodes, e.g. URI1 (Project A) <-> URI2 (Project B). The API lets you fetch just those first-level links, so you get the links from and to your URI. You can also query the second and third levels of the network to get more links.
```bash
......@@ -39,11 +39,11 @@ Over the parameter `depth` you can define how many layers of hops you like to tr
curl -X GET "https://api.geolinker.histhub.ch/v1/sameas/https://dodis.ch/G8?depth=2"
```
The idea behind this network approach is simple. Researchers connect their entities with the same entities in the network. The statement about the connection between two resources depends on the research question: in one context a statement is true, in another the same statement is 100% wrong. In a database about peace treaties, the castle of Versailles is the same as the city of Versailles, because the treaty is connected with the geographical point of the city of Versailles. For a research project about castles this statement is wrong: the city and the castle are not of the same type.
The geolinker ignores those conflicting statements. It gives you the possibility to link your resource with every other resource in the network. If it makes sense for your research, that's fine. So you can query all your own links with `depth=1`. Normally you know the projects you link to well and trust them, so you can also get all their links with `depth=2`. The further you traverse the network, the more unrelated links you will get and the fuzzier the result becomes.
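To make the effect of `depth` more concrete, here is a small sketch of a depth-limited traversal over undirected links. It is a plain-Python stand-in, not the query that resolver-neo4j actually runs; the function name `linked_uris` and the example URIs are made up for illustration.

```python
from collections import deque


def linked_uris(links, start, depth=1):
    """Return the URIs reachable from `start` within `depth` hops.

    `links` is a list of (uri_a, uri_b) pairs, treated as undirected.
    """
    neighbours = {}
    for a, b in links:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    seen = {start}
    queue = deque([(start, 0)])
    found = []
    while queue:
        uri, hops = queue.popleft()
        if hops == depth:
            continue  # do not expand beyond the requested depth
        for nxt in sorted(neighbours.get(uri, ())):
            if nxt not in seen:
                seen.add(nxt)
                found.append(nxt)
                queue.append((nxt, hops + 1))
    return found


# Hypothetical chain of links: A <-> B <-> C <-> D
links = [
    ("https://a.example/1", "https://b.example/1"),
    ("https://b.example/1", "https://c.example/1"),
    ("https://c.example/1", "https://d.example/1"),
]
```

With these links, `depth=1` from project A yields only B, while `depth=2` also yields C, a resource that A never linked to itself. This is why results get fuzzier with each additional hop.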
### /v1/similarto/:node
Most often you will want to query the `sameas` endpoint. If you can't find the resource there, you can try the `similarto` endpoint. While the `sameas` API returns stable connections that are generated manually or semi-automatically, the `similarto` service returns resources that are connected automatically based on various similarity criteria. Depending on the configuration of your request you will get different results. For example, you can get resources with a similar name within a specified area around the queried node.
```bash
# this will give us a list of automatically matched links for the resource https://dodis.ch/G300
curl -X GET https://api.geolinker.histhub.ch/v1/similarto/https://dodis.ch/G300
```
You can specify the similarity criteria with two parameters. With `distance` you define the maximum distance (in meters) between the queried node and a possible match. Normally, the closer a similar resource is to yours, the higher the chance of a match.
Names are often written slightly differently in two projects. With the parameter `fuzziness` you can define a [Levenshtein distance](https://de.wikipedia.org/wiki/Levenshtein-Distanz) applied to the name of the resource. For example, `Hord` matches `Nord` with a Levenshtein distance of one.
```bash
# this will give us a list of automatically matched links for the resource https://dodis.ch/G300 with a maximal distance of 20km and a fuzziness of 2 to the queried node
curl -X GET "https://api.geolinker.histhub.ch/v1/similarto/https://dodis.ch/G300?distance=20000&fuzziness=2"
```
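The two criteria can be illustrated with a short sketch. How the resolver actually computes distances and name similarity is not documented here, so the haversine formula, the helper names (`levenshtein`, `distance_m`, `similar`) and the candidate records are assumptions for illustration only.

```python
import math


def levenshtein(a, b):
    """Edit distance between two strings (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution (free if equal)
            ))
        previous = current
    return previous[-1]


def distance_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters (haversine, mean Earth radius)."""
    r = 6371000.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(h))


def similar(query, candidates, distance=20000, fuzziness=2):
    """Keep candidates that are both close enough and similarly named."""
    return [
        c["uri"]
        for c in candidates
        if distance_m(query["lat"], query["lon"], c["lat"], c["lon"]) <= distance
        and levenshtein(query["name"], c["name"]) <= fuzziness
    ]


# Hypothetical records, not real geolinker data
query = {"name": "Nord", "lat": 46.948, "lon": 7.447}
candidates = [
    {"uri": "https://example.org/near", "name": "Hord", "lat": 46.950, "lon": 7.450},
    {"uri": "https://example.org/far", "name": "Nord", "lat": 47.377, "lon": 8.540},
    {"uri": "https://example.org/other", "name": "Southgate", "lat": 46.948, "lon": 7.447},
]
```

With the defaults above, only the nearby `Hord` record survives: the second candidate has the right name but is too far away, and the third is at the same location but has a very different name.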
## Docker
To build the image, use the following command. The image provides an API interface for Kafka topics. The container is based on [histhub/node-kafka-docker-base](https://source.dodis.ch/histhub/node-kafka-docker-base).
```bash
docker build -t source.dodis.ch:4577/histhub/api .
# Upload to the registry
docker push source.dodis.ch:4577/histhub/api
```
## CI/CD
We have a build pipeline in GitLab, so building the image manually is no longer necessary.
## Deploy to k8
In the deployment repository you can find the configuration to start a k8 pod:
```bash
kubectl create -f api-deployment.yaml
```
This streaming app is part of the [geolinker Helm chart](https://source.dodis.ch/histhub/deploy-geolinker/blob/master/vars/api-values.yaml).
## Troubleshooting
* Please [urlencode](https://de.wikipedia.org/wiki/URL-Encoding) the URL you query for. Otherwise you may get a 400 error.
* If you receive a 404 all the time, one of the workers may have crashed. Please contact us and we will restart it. We are currently working on the stability of the cluster.
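
As a hint for the first point, the node URL can be percent-encoded before it is placed in the path. The helper `endpoint_url` below is a hypothetical convenience function, and the sketch assumes the API accepts a fully encoded node (including encoded slashes).

```python
from urllib.parse import quote, urlencode

BASE = "https://api.geolinker.histhub.ch/v1"


def endpoint_url(endpoint, node, **params):
    """Build a request URL with the node percent-encoded as one path segment."""
    # safe="" also encodes "/" and ":", which quote() would otherwise keep or skip.
    url = f"{BASE}/{endpoint}/{quote(node, safe='')}"
    if params:
        url += "?" + urlencode(params)
    return url
```

For example, `endpoint_url("sameas", "https://dodis.ch/G8", depth=2)` produces a URL whose node part reads `https%3A%2F%2Fdodis.ch%2FG8`, followed by `?depth=2`.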
# Future
* We will provide a metadata endpoint to fetch structured data about the resource
* We will create an endpoint where you can reconcile data
* We will provide the name of the provider with the url