Fuseki SPARQL server

From artserver wiki
Latest revision as of 13:53, 25 August 2022


For this tutorial I will use the dataset name test.

Requirements

  • Java Development Kit:
    • apt-get install default-jdk

Run Fuseki as a systemd service

As root go to

cd /usr/local/src

Download & untar

wget https://apache.redkiwi.nl/jena/binaries/apache-jena-fuseki-3.15.0.tar.gz
tar xfvz apache-jena-fuseki-3.15.0.tar.gz
cd apache-jena-fuseki-3.15.0

Try running fuseki-server:

./fuseki-server
  • it will create the run/ directory, with config files, dataset and backups directories
  • it will not run if the JDK is not installed
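
Since fuseki-server will not start without Java, a quick pre-flight check can save a confusing error message. A minimal sketch, assuming a POSIX shell; the function name have_java is made up for this example and is not part of Fuseki:

```shell
#!/bin/sh
# have_java: succeed if a `java` executable is on the PATH.
# (Hypothetical helper name, not part of Fuseki.)
have_java() {
    command -v java >/dev/null 2>&1
}

if have_java; then
    echo "java found at $(command -v java)"
else
    echo "java not found: install a JDK (e.g. apt-get install default-jdk) first"
fi
```

This only checks that some java binary is reachable; it does not check the Java version.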

Fuseki File Layout

I will follow the filesystem layout suggested by the official documentation [1] for running Fuseki as a service:

Environment Variable 	Default Setting
FUSEKI_HOME 	        /usr/share/fuseki
FUSEKI_BASE 	        /etc/fuseki
  • FUSEKI_HOME (distribution area) – essentially the fuseki-server binary and a few helper scripts
  • FUSEKI_BASE (runtime area) – a directory that contains the configuration, datasets and logs, which should be backed up and left untouched by updates of the Fuseki binaries.

So let's go ahead and create those directories and copy the corresponding files into them:

mkdir /usr/share/fuseki
mkdir /etc/fuseki
cp -r {fuseki,fuseki-server,fuseki-server.bat,fuseki-server.jar,fuseki.war,bin,webapp} /usr/share/fuseki/
cp -r run/* /etc/fuseki/

cp log4j2.properties /etc/fuseki/
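
The copy steps above can be wrapped in a small function so that they can be re-run safely (mkdir -p does not fail when the directory already exists). This is only a sketch: install_layout and its three arguments are names made up here, not something shipped with Fuseki.

```shell
#!/bin/sh
# install_layout SRC HOME BASE
#   SRC  - the unpacked apache-jena-fuseki directory
#   HOME - target for FUSEKI_HOME (e.g. /usr/share/fuseki)
#   BASE - target for FUSEKI_BASE (e.g. /etc/fuseki)
# Hypothetical helper, not part of Fuseki.
install_layout() {
    src=$1 home=$2 base=$3
    mkdir -p "$home" "$base" || return 1
    # Distribution files go to FUSEKI_HOME; files missing from SRC are skipped.
    for f in fuseki fuseki-server fuseki-server.bat fuseki-server.jar fuseki.war bin webapp; do
        if [ -e "$src/$f" ]; then
            cp -r "$src/$f" "$home/" || return 1
        fi
    done
    # Runtime files created by the first ./fuseki-server run go to FUSEKI_BASE.
    if [ -d "$src/run" ]; then
        cp -r "$src/run/." "$base/" || return 1
    fi
}
```

With the paths used in this tutorial it would be called as install_layout /usr/local/src/apache-jena-fuseki-3.15.0 /usr/share/fuseki /etc/fuseki.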

And we can make a test run by running:

/usr/share/fuseki/fuseki-server

Then check that the server is up by visiting http://localhost:3030/index.html

Service file

Inside the untarred directory /usr/local/src/apache-jena-fuseki-3.15.0 you can find the file fuseki.service

This file should be copied to /etc/systemd/system and edited in order to run Fuseki as a service. The file itself is quite self-explanatory. I have made a few changes, mainly in relation to logging.


cp fuseki.service /etc/systemd/system
vi /etc/systemd/system/fuseki.service
[Unit]
Description=Fuseki

[Service]
# Edit environment variables to match your installation
Environment=FUSEKI_HOME=/usr/share/fuseki
Environment=FUSEKI_BASE=/etc/fuseki
# Edit the line below to adjust the amount of memory allocated to Fuseki
Environment=JVM_ARGS=-Xmx4G
# Edit to match your installation
ExecStart=/usr/share/fuseki/fuseki-server
# The comment shipped with this file says: run as user "fuseki"; this setup runs as root
User=root
Restart=on-abort
# Java processes exit with status 143 when terminated by SIGTERM, this
# should be considered a successful shutdown
SuccessExitStatus=143
### By default, the service logs to journalctl only.
StandardOutput=file:/var/log/fuseki/access.log
StandardError=file:/var/log/fuseki/stderrout.log
#StandardOutput=syslog
#StandardError=syslog
#SyslogIdentifier=fuseki
### This logs to syslog. If, e.g., rsyslogd is used, you can provide a file
### /etc/rsyslog.d/fuseki.conf, consisting of the following two lines (uncommented)
#if $programname == 'fuseki' then /var/log/fuseki/stderrout.log
#if $programname == 'fuseki' then stop


[Install]
WantedBy=multi-user.target


Create logs dir:

mkdir /var/log/fuseki/

Enable and run the service:

systemctl enable fuseki
systemctl start fuseki

Check its status

systemctl status fuseki

And again check its web UI at http://localhost:3030

You can try to create a dataset and perform an INSERT statement; the data will be stored in /etc/fuseki/databases/databasename/


Security

Fuseki security settings are defined in file /etc/fuseki/shiro.ini [2]

Here I can change the default admin password

Note:

  • ensure the admin password is something strong

The same file also controls the rights to the Fuseki administrative HTTP protocol [3] and access to the existing datasets.

I will allow the dataset test to be queried by anyone but updated only from localhost, so that SMW (Semantic MediaWiki) can write to it, while requests coming from outside can query it but not update it:

/test/query  = anon
/test/update  = localhostFilter


Full shiro.ini


[main]
ssl.enabled = false 

plainMatcher=org.apache.shiro.authc.credential.SimpleCredentialsMatcher
#iniRealm=org.apache.shiro.realm.text.IniRealm 
iniRealm.credentialsMatcher = $plainMatcher

localhostFilter=org.apache.jena.fuseki.authz.LocalhostFilter

[users]
# Implicitly adds "iniRealm =  org.apache.shiro.realm.text.IniRealm"
admin=pw123

[roles]

[urls]

# All admin operations have URL paths starting /$/ to avoid clashes with dataset names and this prefix is reserved for the Fuseki control functions.
/$/status = anon
/$/ping   = anon
/$/stats/**  = anon

# test dataset
/test/query  = anon
/test/update  = localhostFilter

# everything else is accessible to admin
/** = authcBasic,user[admin]
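
The [users] section above stores the admin password in plain text. Shiro can also compare against a hashed value. One option, untested with this Fuseki version and given here only as an assumption, is to replace plainMatcher with org.apache.shiro.authc.credential.Sha256CredentialsMatcher and store the hex-encoded SHA-256 digest of the password in [users]. The helper below only computes that hex string; hash_password is a made-up name, and it assumes sha256sum from GNU coreutils is installed.

```shell
#!/bin/sh
# hash_password PASSWORD: print the hex-encoded SHA-256 digest of PASSWORD.
# printf '%s' is used so that no trailing newline ends up in the hash.
# (Hypothetical helper, not part of Fuseki or Shiro.)
hash_password() {
    printf '%s' "$1" | sha256sum | cut -d' ' -f1
}

# Example with a throwaway value:
hash_password 'abc'
```

For the input abc this prints ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad. Keep in mind that a password passed as a command-line argument may end up in the shell history.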


To test, restart Fuseki with systemctl restart fuseki and run a few requests from both localhost and an external host:

curl http://localhost:3030/$/status -X POST -H 'Accept: application/sparql-results+json,*/*;q=0.9'

A query coming from both local and remote hosts, where both should succeed:

curl http://localhost:3030/test/query -X POST --data 'query=PREFIX+foaf%3A+%3Chttp%3A%2F%2Fxmlns.com%2Ffoaf%2F0.1%2F%3E%0APREFIX+country%3A+%3Chttp%3A%2F%2Feulersharp.sourceforge.net%2F2003%2F03swap%2Fcountries%23%3E%0ASELECT+*%0A%7B%0A+%3Fs+foaf%3Aname+%22Test%22%40en+.%0A%7D' -H 'Accept: application/sparql-results+json,*/*;q=0.9'
curl http://10.0.0.20:3030/test/query -X POST --data 'query=PREFIX+foaf%3A+%3Chttp%3A%2F%2Fxmlns.com%2Ffoaf%2F0.1%2F%3E%0APREFIX+country%3A+%3Chttp%3A%2F%2Feulersharp.sourceforge.net%2F2003%2F03swap%2Fcountries%23%3E%0ASELECT+*%0A%7B%0A+%3Fs+foaf%3Aname+%22Test%22%40en+.%0A%7D' -H 'Accept: application/sparql-results+json,*/*;q=0.9'


An update, which should succeed only when coming from localhost and fail when coming from a remote host:

curl http://localhost:3030/test/update -X POST --data 'update=PREFIX+foaf%3A+%3Chttp%3A%2F%2Fxmlns.com%2Ffoaf%2F0.1%2F%3E%0APREFIX+country%3A+%3Chttp%3A%2F%2Feulersharp.sourceforge.net%2F2003%2F03swap%2Fcountries%23%3E%0AINSERT+DATA%0A%7B%0A+country%3Aooo+foaf%3Aname+%22OOOOO%22%40en+.%0A%7D' -H 'Accept: text/plain,*/*;q=0.9'
curl http://10.0.20.2:3030/test/update -X POST --data 'update=PREFIX+foaf%3A+%3Chttp%3A%2F%2Fxmlns.com%2Ffoaf%2F0.1%2F%3E%0APREFIX+country%3A+%3Chttp%3A%2F%2Feulersharp.sourceforge.net%2F2003%2F03swap%2Fcountries%23%3E%0AINSERT+DATA%0A%7B%0A+country%3Aooo+foaf%3Aname+%22OOOOO%22%40en+.%0A%7D' -H 'Accept: text/plain,*/*;q=0.9'
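
When running these checks from two hosts it is easier to compare HTTP status codes than to read response bodies. A small sketch, assuming curl is installed; post_status and is_success are names invented for this example, and the exact status code Fuseki returns for a refused update is not assumed here, only that it is outside the 2xx range.

```shell
#!/bin/sh
# post_status URL [extra curl args...]
# POST to URL and print only the HTTP status code
# (curl prints 000 when it cannot connect at all).
post_status() {
    url=$1
    shift
    curl -s -o /dev/null -w '%{http_code}' -X POST "$@" "$url"
}

# is_success CODE: succeed for any 2xx status code.
is_success() {
    case $1 in
        2??) return 0 ;;
        *)   return 1 ;;
    esac
}

# Example (needs a running server, so it is left commented out):
#   code=$(post_status http://localhost:3030/test/update \
#          --data-urlencode 'update=INSERT DATA { <urn:ex:s> <urn:ex:p> "o" }')
#   if is_success "$code"; then echo "update accepted"; else echo "update refused ($code)"; fi
```

The commented example uses curl's --data-urlencode so that the SPARQL text can be written unencoded; the urn:ex: names in it are placeholders.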

Webserver proxy

Instead of calling localhost:3030 to access Fuseki, I have added a new virtual host to my domain's Apache config, so that Fuseki can be accessed via the subdomain sparql.oooooooooo.io:

<VirtualHost *:80>
    ServerName sparql.oooooooooo.io
    ProxyRequests Off
 
    ProxyPass / http://your.server.IP:3030/
    ProxyPassReverse / http://your.server.IP:3030/
</VirtualHost>

Note: Although it would be possible to point ProxyPass at localhost (http://127.0.0.1:3030), this would make Fuseki extremely vulnerable: the webserver would turn every request to Fuseki into a request coming from localhost, making the localhostFilter in shiro.ini useless.

Reload the Apache site configuration

And test:

curl http://sparql.oooooooooo.io/$/status -X POST -H 'Accept: application/sparql-results+json,*/*;q=0.9'

Logging

Apache Fuseki Logging is performed via SLF4J over Apache Log4J2.

Fuseki looks for the log4j2 configuration in several locations, but I will stick with:

  • file log4j2.properties in the directory defined by FUSEKI_BASE: /etc/fuseki

log4j2.properties was copied to /etc/fuseki/ in a previous step and includes the default logging settings.


Backups

It is possible to send a POST request asking Fuseki to create a backup of a dataset:

curl http://localhost:3030/$/backup/test -X POST -H 'Accept: application/sparql-results+json,*/*;q=0.9' 

Here the test dataset will be backed up and stored in /etc/fuseki/backups as a gzip-compressed N-Quads file, test_2020-05-23_14-42-41.nq.gz

When decompressed, it shows the list of triples that make up the dataset:

gzip -d test_2020-05-23_14-42-41.nq.gz
cat test_2020-05-23_14-42-41.nq
<http://eulersharp.sourceforge.net/2003/03swap/countries#zx> <http://xmlns.com/foaf/0.1/name> "Zexix"@en .
<http://eulersharp.sourceforge.net/2003/03swap/countries#zy> <http://xmlns.com/foaf/0.1/name> "Zyz"@en .
<http://eulersharp.sourceforge.net/2003/03swap/countries#ou> <http://xmlns.com/foaf/0.1/name> "Ouoaoo"@en .
...
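
To sanity-check a backup without unpacking it on disk, the statements can be counted straight from the compressed file. A minimal sketch: count_statements is a name made up here, and it assumes one statement per line and no comment lines, as in the listing above.

```shell
#!/bin/sh
# count_statements FILE.nq.gz: print the number of non-empty lines in a
# gzip-compressed N-Quads backup, leaving the file itself untouched.
# (Hypothetical helper, not part of Fuseki.)
count_statements() {
    gzip -dc "$1" | grep -c '[^[:space:]]'
}
```

For example, count_statements /etc/fuseki/backups/test_2020-05-23_14-42-41.nq.gz prints a single number, which can be compared between two backups.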

/$/backups-list will show the list of existing backups:

curl http://localhost:3030/$/backups-list -X POST -H 'Accept: application/sparql-results+json,*/*;q=0.9' 
{ 
  "backups" : [ 
      "test_2020-05-23_14-42-41.nq" ,
      "test_2020-05-23_14-42-46.nq.gz"
    ]
}
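
Since every backup request adds a new file, the backups directory grows over time. One way to keep it in check is sketched below; prune_backups is a made-up name, and the sketch assumes a find that supports -maxdepth and -delete (GNU and BSD find do). It only touches files ending in .nq.gz.

```shell
#!/bin/sh
# prune_backups DIR DAYS: delete *.nq.gz files directly inside DIR that were
# last modified more than DAYS days ago, printing each file that is removed.
# (Hypothetical helper, not part of Fuseki.)
prune_backups() {
    find "$1" -maxdepth 1 -type f -name '*.nq.gz' -mtime +"$2" -print -delete
}
```

It could be called as prune_backups /etc/fuseki/backups 30, for instance from a cron job that first triggers the backup request shown earlier.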


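Notes

[2] https://jena.apache.org/documentation/fuseki2/fuseki-security.html
[3] https://jena.apache.org/documentation/fuseki2/fuseki-server-protocol.html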

Section: Code Notes
Date: 2020