In this post, I’ll walk through how you can take a backup of an existing database and use it to seed an instance of Neo4j inside a Docker container. This could be useful if you are looking to fire up a development server using real data. I’ll show you how to launch an instance of Neo4j using docker-compose and then extend the official Docker image by creating a custom Dockerfile.
Neo4j on Docker
You can find Neo4j images going back to 3.4 on Docker Hub, all tagged as `x.y.z` for Community Edition, with `-enterprise` appended for Enterprise Edition. You can get up and running quickly with the `docker run` command.
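As a minimal sketch, a Community Edition instance can be started like this (the version tag is just an example; pick whichever `x.y.z` you need):

```shell
# Start Neo4j Community Edition, publishing the default
# HTTP (7474) and Bolt (7687) ports to the host.
docker run -p 7474:7474 -p 7687:7687 neo4j:4.0.0
```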
In order to run Enterprise Edition, you need to accept the Neo4j Licensing Agreement. You do this with Docker by setting the `NEO4J_ACCEPT_LICENSE_AGREEMENT` environment variable to `yes`:

```shell
docker run --env=NEO4J_ACCEPT_LICENSE_AGREEMENT=yes neo4j:4.0.0-enterprise
```
Taking a Backup
There are two types of exports. The `dump` command creates an archive that can be easily shared and is great for smaller databases. For larger databases, the `backup` command allows you to take an incremental backup: if you run a backup against a directory that already contains one, it will take the difference and append it to the store files rather than starting from transaction id 0. This is great for larger databases.
Backups are enabled by default, but the backup service will only listen for requests from localhost on the backup port. Because we’ll be taking the backup from another machine, we’ll need to enable remote backups by setting `dbms.backup.listen_address` in `neo4j.conf`:
```
# Enable online backups to be taken from this database.
dbms.backup.enabled=true

# By default the backup service will only listen on localhost.
# To enable remote backups you will have to bind to an external
# network interface (e.g. 0.0.0.0 for all interfaces).
# The protocol running varies depending on deployment. In a Causal
# Clustering environment this is the same protocol that runs on
# causal_clustering.transaction_listen_address.
dbms.backup.listen_address=0.0.0.0:6362
```
To take the backup, run the following command:

```shell
# --from        hostname or IP and backup port of the Neo4j server
# --backup-dir  directory to store the backup in
# --database    the name of the database to back up
bin/neo4j-admin backup \
  --from=neo4jurl:6362 \
  --backup-dir=/path/to/backups \
  --database=neo4j
```
A consistency check runs as part of the backup to make sure the backup files are OK, but this can take a while on a large database. You can disable it by adding `--check-consistency=false` and check the consistency at a later time.
Automating the Backup using Docker
One of the nice things about Docker is that you can build or extend images by creating a Dockerfile. These can also be published to Docker Hub, but I won’t cover that here. The `FROM` keyword allows you to choose an image to build on top of; in this case we want the latest version of Neo4j Enterprise.
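The first line of the Dockerfile therefore looks like this (using the same version tag as the earlier `docker run` example):

```dockerfile
# Build on top of the official Neo4j Enterprise image
FROM neo4j:4.0.0-enterprise
```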
The Neo4j images all automatically start up the Neo4j instance. In this case, we want to run the backup against the production server and restore it before the Neo4j server starts. We can do this by replacing the `ENTRYPOINT` with one of our own. There’s a lot of complicated stuff going on in the `docker-entrypoint.sh` file that I don’t really want to replicate and maintain, so instead we can create a new shell script which performs the backup and restore before calling the original entrypoint:
```shell
#!/bin/bash
echo "Running Backup & Restore"

neo4j-admin backup --from=$PRODUCTION --backup-dir=/backup
neo4j-admin restore --from=/backup/neo4j --database=neo4j --force

# Hand control back to the original entrypoint so Neo4j starts as normal
exec /docker-entrypoint.sh "$@"
```
The script runs the `neo4j-admin backup` command and places the backup in the `/backup` directory before `restore`ing it into the default `neo4j` database. The `$PRODUCTION` environment variable means that the address of the Neo4j server can be set as an `--env` flag when the container is created. The `--force` flag will overwrite any files if they already exist, perfect for when we’re mounting a volume for the data.
File ownership caused me a few issues when developing this script: the Neo4j process runs as a user called `neo4j`, whereas this entrypoint script is run by `root`. Originally, this caused a `Neo.TransientError.Database.DatabaseUnavailable` error complaining that `Database 'neo4j' is unavailable`. This was because the `neo4j` user couldn’t write to the directory. `chown`ing the `/data` directory to `neo4j:neo4j` fixes the issue:

```shell
chown -R neo4j:neo4j /data
```
After that, the original `docker-entrypoint.sh` script can be run to work its magic and bring the database up.
Back in the Dockerfile, a few commands are needed to clean things up. Firstly, we’ll need to accept the license agreement.
```dockerfile
ENV NEO4J_ACCEPT_LICENSE_AGREEMENT yes
```
Next, setting the `dbms.directories.data` directory to a folder in the root will make it easier to mount a volume:

```dockerfile
ENV NEO4J_dbms_directories_data /data
```
Then `my-entrypoint.sh` needs to be copied into the Docker image. By default the file will not have execute permissions, so the `RUN` command allows us to run `chmod` to add execute permission on the file:

```dockerfile
WORKDIR /
COPY my-entrypoint.sh /my-entrypoint.sh
RUN chmod +x /my-entrypoint.sh
```
Finally, we can override the `ENTRYPOINT` to run `my-entrypoint.sh` (and subsequently the original `docker-entrypoint.sh`) before running the `neo4j` command to start Neo4j:
```dockerfile
ENTRYPOINT ["/sbin/tini", "-g", "--", "/my-entrypoint.sh"]
CMD ["neo4j"]
```
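Putting the snippets above together, the whole Dockerfile is short (a sketch, assuming `my-entrypoint.sh` sits next to the Dockerfile):

```dockerfile
FROM neo4j:4.0.0-enterprise

# Accept the Enterprise license and move the data directory to /data
ENV NEO4J_ACCEPT_LICENSE_AGREEMENT yes
ENV NEO4J_dbms_directories_data /data

# Copy in the custom entrypoint and make it executable
WORKDIR /
COPY my-entrypoint.sh /my-entrypoint.sh
RUN chmod +x /my-entrypoint.sh

# Run the backup & restore before handing over to Neo4j
ENTRYPOINT ["/sbin/tini", "-g", "--", "/my-entrypoint.sh"]
CMD ["neo4j"]
```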
Building the image
The `docker build` command creates an image that can be used when creating containers. To make life easier, I have tagged the new image as `dev` using the `-t dev` flag; otherwise Docker would generate a random hash and the whole thing couldn’t be automated.
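Run from the directory containing the Dockerfile, the build command looks like this:

```shell
# Build an image from the Dockerfile in the current directory
# and tag it as "dev" so later commands can refer to it by name.
docker build -t dev .
```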
Creating a Container
Containers using the newly created `dev` image can be created with the `docker run` command. I have mapped the HTTP and Bolt ports using `-p` so I can access the Neo4j Browser and query the data via Bolt. As mentioned before, running a backup against a directory with an existing backup will trigger an incremental backup, so I mount the backup directory as a volume on the Docker container. The same goes for the data directory. The local paths to the volumes need to be absolute, so I have created a `$HERE` environment variable to make things a bit easier:
```shell
# Publish HTTP as 17474 and Bolt as 17687 on the host, point the
# entrypoint script at the production server, and mount the backup
# and data directories as volumes.
docker run --name=dev \
  -p 17474:7474 \
  -p 17687:7687 \
  --env="PRODUCTION=prod.databases.adamcowley.co.uk:6463" \
  --volume="$HERE/backup:/backup" \
  --volume="$HERE/data:/data" \
  dev
```
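Since the intro mentioned docker-compose, here is a rough sketch of the same container expressed as a `docker-compose.yml` (the service name and relative volume paths are illustrative assumptions):

```yaml
version: "3"
services:
  dev:
    # Build the image from the Dockerfile in this directory
    build: .
    ports:
      - "17474:7474"   # HTTP
      - "17687:7687"   # Bolt
    environment:
      - PRODUCTION=prod.databases.adamcowley.co.uk:6463
    volumes:
      - ./backup:/backup
      - ./data:/data
```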
Being fairly inexperienced with Docker, this took me a while to figure out. But once I realised that I can just extend an existing image, my life became a lot easier. This process works well for a single instance, but could also be used to automate the seeding and deployment of Read Replicas. Downloading a copy of a previous backup and mounting it as a volume will speed up the startup process on larger databases.
I’ve put the code up on GitHub - feel free to pull, clone or submit a PR.
This is a companion discussion topic for the original entry at https://adamcowley.co.uk/neo4j/neo4j-docker-seed-backup/