# API
The API provides a gateway to the streaming platform of the histHub-Net [Geolinker](https://histhub.ch/doku-linker/). With an HTTP interface, users can get geoconcordances from the different streaming apps.
## How it works
Unlike a classic REST API, this API never communicates with the different components itself. So this code will never send a query to a DB or another store. It forwards the HTTP request to a Kafka topic (`*-request`) and waits in another topic (`*-response`) for possible responses ([request-response](https://www.confluent.io/blog/apache-kafka-for-service-architectures/)). Different resolver apps listen on the request topic to fulfill the request. If no resolver answers within a given timeframe (1000ms), we respond with a 404.
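A minimal sketch of this flow, assuming an Express front end and the kafkajs client (topic names, ports, the message shape and the correlation mechanism are illustrative assumptions, not the actual implementation):

```typescript
// Sketch only: an HTTP gateway that forwards requests to Kafka and waits for a resolver.
// Topic names, ports and the message format are assumptions for illustration.
import express from "express";
import { Kafka } from "kafkajs";
import { randomUUID } from "crypto";

const kafka = new Kafka({ clientId: "api-gateway", brokers: ["kafka:9092"] });
const producer = kafka.producer();
const consumer = kafka.consumer({ groupId: "api-gateway" });

// Requests still waiting for an answer on the *-response topic, keyed by correlation id.
const pending = new Map<string, (value: string | null) => void>();

async function start() {
  await producer.connect();
  await consumer.connect();
  await consumer.subscribe({ topic: "sameas-response" });
  await consumer.run({
    eachMessage: async ({ message }) => {
      // A resolver answered: hand the payload back to the waiting HTTP request.
      const id = message.key?.toString();
      const resolve = id ? pending.get(id) : undefined;
      if (resolve && message.value) resolve(message.value.toString());
    },
  });

  const app = express();
  app.get("/v1/sameas/*", async (req, res) => {
    const correlationId = randomUUID();
    // Forward the HTTP request to the request topic; a resolver app picks it up from there.
    await producer.send({
      topic: "sameas-request",
      messages: [
        { key: correlationId, value: JSON.stringify({ uri: req.params[0], query: req.query }) },
      ],
    });
    // Wait up to 1000ms for an answer on the response topic, otherwise reply with 404.
    const answer = await new Promise<string | null>((resolve) => {
      pending.set(correlationId, resolve);
      setTimeout(() => resolve(null), 1000);
    });
    pending.delete(correlationId);
    if (answer) res.json(JSON.parse(answer));
    else res.sendStatus(404);
  });
  app.listen(8080);
}

start().catch(console.error);
```

The important property is that the gateway never touches a datastore itself: it only produces to the request topic and races the response topic against the 1000ms timeout.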
## Too complicated
This is a complicated way to query a DB or an Elasticsearch index, but it gives us flexibility and loose coupling. If we want to add a new algorithm, we can do this without changing the API code. We can add a new subsystem without changing anything. It also scales more independently: if one subsystem is too slow, we can scale just that subsystem. Future requirements will show us if the complexity is worth the effort.
## Idempotent
This implementation conflicts with the [http idempotence](https://en.wikipedia.org/wiki/Idempotence) specification. So there is no guarantee of getting the same answer for the same request. Under several circumstances we will get different answers for the same request:
* If a processor app has crashed, the answer will be different, mostly a 404.
* If there is high load on the server, the answer may be different.
In general, the API tries to deliver at least one valid answer.
## Endpoints
So far we offer three endpoints. More will follow.
### Config processors
You can configure the different resolvers to get differentiated results. Under the hood those arguments are forwarded to the resolver app, which then uses them to calculate the result.
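For illustration, the message forwarded to a resolver might look roughly like this (field names are hypothetical, not the actual wire format):

```typescript
// Hypothetical shape of a message on a *-request topic; field names are illustrative only.
interface ResolverRequest {
  correlationId: string;          // used to match the answer on the *-response topic
  uri: string;                    // the resource the client asked about
  params: Record<string, string>; // resolver configuration taken from the query string, e.g. depth
}

const example: ResolverRequest = {
  correlationId: "7f9c2d1e-0c1a-4b2e-9c3d-accc0a4ad001",
  uri: "https://dodis.ch/G8",
  params: { depth: "2" },
};
```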
### /v1/sameas/:node
To get manually or semi-manually linked entities out of the histHub-geolinker you can use the `sameas` endpoint. Under the hood the [resolver-neo4j](https://source.dodis.ch/histhub/resolver-neo4j) will process those queries. This endpoint offers high-quality links, mostly made by humans and stored in the geolinker. This dataset includes links from our partners and links that we generated via other services like Geonames and Wikidata.
In the underlying datastore (Neo4j) each link is represented as an arrow between two nodes, e.g. URI1 (Project A) <-> URI2 (Project B). The API allows you to fetch just those first-level links, so you will get the links from and to your URI. You can also query the second and the third level of the network to get more links.
```bash
# this will give us a list of concordances for the resource https://dodis.ch/G8
curl -X GET https://api.geolinker.histhub.ch/v1/sameas/https://dodis.ch/G8
```
Over the parameter `depth` you can define how many layers of hops you would like to traverse.
```bash
# this will give us a list of concordances for the resource https://dodis.ch/G8, traversing two levels of the network
curl -X GET https://api.geolinker.histhub.ch/v1/sameas/https://dodis.ch/G8?depth=2
```
The idea behind this network approach is simple. Researchers connect their entities with the same entities in the network. The statement about the connection between two resources depends on the research question. In one context a statement is true, in another the same statement is 100% wrong. In a database about peace treaties, the castle of Versailles is the same as the city of Versailles: they connect the treaty with the geographical point of the city of Versailles. For a research project about castles this statement is wrong, because the city and the castle are not of the same type.
The geolinker ignores those conflicting statements. It gives you the possibility to link your resource with every other resource in the network. If it makes sense for your research, that's fine. So you can query all your own links with `depth=1`. Normally you know the projects you link to well and you trust them, so you can also get all their links with `depth=2`. The further you traverse the network, the more unrelated links you will get and the fuzzier the result.
### /v1/similarto/:node
Most often you will want to query the `sameas` endpoint. If you can't find the resource in the `sameas` endpoint you can try the `similarto` endpoint. While the sameas API returns stable connections that are manually or semi-automatically generated, the similarto service returns resources that are connected automatically based on various criteria of similarity. Depending on the configuration of your request you will get different results. If you query the API you can get resources with a similar name in the specified area around the queried node.
```bash
# this will give us a list of automatically matched links for the resource https://dodis.ch/G300
curl -X GET https://api.geolinker.histhub.ch/v1/similarto/https://dodis.ch/G300
```
You can specify the criteria for similarity via two parameters. With `distance` you can define a maximal distance (in meters) between the queried node and a possible match. Normally, the closer a similar resource is to yours, the higher the chance of similarity.
Often names are written slightly differently in two projects. With the parameter `fuzziness` you can define a [Levenshtein distance](https://de.wikipedia.org/wiki/Levenshtein-Distanz) applied to the name of the resource. For example, you can find `Hord` and `Nord` with a Levenshtein distance of one.
```bash
# this will give us a list of automatically matched links for the resource https://dodis.ch/G300 with a maximal distance of 20km to the queried node and a fuzziness of 2
curl -X GET "https://api.geolinker.histhub.ch/v1/similarto/https://dodis.ch/G300?distance=20000&fuzziness=2"
```
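For intuition, this is what the Levenshtein distance behind `fuzziness` computes (a textbook sketch for illustration; the actual fuzzy matching happens inside the resolver, not in the API):

```typescript
// Classic dynamic-programming Levenshtein distance, for illustration only.
function levenshtein(a: string, b: string): number {
  // dp[i][j] = edit distance between the first i chars of a and the first j chars of b
  const dp: number[][] = Array.from({ length: a.length + 1 }, (_, i) =>
    Array.from({ length: b.length + 1 }, (_, j) => (i === 0 ? j : j === 0 ? i : 0))
  );
  for (let i = 1; i <= a.length; i++) {
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      dp[i][j] = Math.min(dp[i - 1][j] + 1, dp[i][j - 1] + 1, dp[i - 1][j - 1] + cost);
    }
  }
  return dp[a.length][b.length];
}

console.log(levenshtein("Hord", "Nord")); // 1 -> matched with fuzziness=1 or higher
```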
## Docker
To build the image use the following command. The image will provide an API interface for Kafka topics. The container is based on [histhub/node-kafka-docker-base](https://source.dodis.ch/histhub/node-kafka-docker-base).
```bash
docker build -t source.dodis.ch:4577/histhub/api .
# Upload to the registry
docker push source.dodis.ch:4577/histhub/api
```
## CI/CD
We have a build pipeline in GitLab, so manually building the image is no longer necessary.
## Deploy to k8s
This streaming app is part of the [geolinker Helm chart](https://source.dodis.ch/histhub/deploy-geolinker/blob/master/vars/api-values.yaml).
## Troubleshooting
* Please [urlencode](https://de.wikipedia.org/wiki/URL-Encoding) the URL you query for. Otherwise you may get a 400 error.
* If you receive a 404 all the time, one of the workers may have crashed. Please contact us and we will restart it. We are currently working on the stability of the cluster.
## Future
* We will provide a metadata endpoint to fetch structured data about the resource.