Cannot connect to cluster using k8s Ingress

I'm trying to set up a cluster for Neo4j version 4.0.1. I started with the official stable Helm chart (more specifically, I used an open PR, https://github.com/helm/charts/pull/20942, because I'm on the latest version of K8s).

The first issue I experienced (which is not present in version 3.x) is the one described in "Unable to run neo4j v4 causal cluster with docker-compose or docker swarm".

I managed to fix that by replacing all occurrences of $(hostname -f) with $(hostname -I | awk '{print $1}').
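
For context, that substitution lives in the core StatefulSet's startup command in the chart. It looks roughly like this (the exact variable being set depends on the chart version, so treat the name below as illustrative):

    # Before: advertises the pod's FQDN, which the 4.0 startup trips over here
    export NEO4J_dbms_default__advertised__address=$(hostname -f)

    # After: advertise the pod's first IP address instead
    export NEO4J_dbms_default__advertised__address=$(hostname -I | awk '{print $1}')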

Once that was fixed, I then tried to connect to the bolt server using the neo4j:// scheme. However, I was getting:

{ Neo4jError: Could not perform discovery. No routing servers available. Known routing table: RoutingTable[database=default database, expirationTime=0, currentTime=1577183669630, routers=, readers=, writers=]

I tried the suggestion from this post, "Neo4jError: Could not perform discovery. No routing servers available", but it didn't fix the issue.

To give more details about the setup, I'm using an Ingress LB that forwards requests to a new service I created:

apiVersion: v1
kind: Service
metadata:
  name: neo4j-external-access
  labels:
    app.kubernetes.io/managed-by: {{ .Release.Service | quote }}
    app.kubernetes.io/instance: {{ .Release.Name | quote }}
    helm.sh/chart: "{{ .Chart.Name }}-{{ .Chart.Version }}"
    app.kubernetes.io/name: {{ template "neo4j.name" . }}
    app.kubernetes.io/component: core
spec:
  type: ClusterIP
  ports:
    - name: http
      port: 7474
      targetPort: 7474
    - name: bolt
      port: 7687
      targetPort: 7687
  selector:
    app.kubernetes.io/name: {{ template "neo4j.name" . }}
    app.kubernetes.io/instance: {{ .Release.Name | quote }}
    app.kubernetes.io/component: core

and the Ingress rules are as follows:

  - host: browser.domain.app
    http:
      paths:
      - backend:
          serviceName: neo4j-external-access
          servicePort: 7474
  - host: bolt.domain.app
    http:
      paths:
      - backend:
          serviceName: neo4j-external-access
          servicePort: 7687

Finally, I'm using the following JS code (I use neo4j-driver v4.0.1) to connect to the bolt server:

    const driver = neo4j.driver(
      'neo4j://bolt.domain.app:443',
      neo4j.auth.basic('neo4j', 'password'),
      config
    );
 

I tested the connection using the browser explorer and I still got the same error. The only way I could connect was using bolt:// instead of neo4j://.

I've searched around for this issue and read this article as well: https://medium.com/neo4j/neo4j-considerations-in-orchestration-environments-584db747dca5, but I can't find a solution.

Update

I've made some progress which I want to share with you.

I ended up setting a unique bolt_advertised_address for each of the pods. I did that by updating the StatefulSet manifest in the Neo4j Helm chart like this:

SET_INDEX=${HOSTNAME##*-}
export NEO4J_dbms_connector_bolt_advertised__address="bolt-$SET_INDEX.domain.app:443"

I also updated the Ingress settings so they include the above addresses.
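
Sketched out, the extra rules look like this (only bolt-0 shown; bolt-1 and bolt-2 follow the same pattern, and the per-pod backend service name is illustrative, since each hostname has to route to its own pod):

      - host: bolt-0.domain.app
        http:
          paths:
          - backend:
              serviceName: neo4j-core-0   # hypothetical per-pod service
              servicePort: 7687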

I finally managed to connect via the browser explorer using neo4j://bolt-0.domain.app:443.

However, I still get the same error when I'm trying to connect via the neo4j-driver.

Cheers,
Pavlos

The "considerations in orchestration environments" article talks a lot about what I think you're running into here.

When you use a client to connect to Neo4j 4 using the neo4j:// scheme, it will try to "cluster route" queries by default. The way this actually works is that the client gets a "routing table" from the server. The server responds with which cluster members are there, and what their roles in the cluster are (leader, follower, etc.). You can check this manually by using Browser and calling CALL dbms.cluster.overview();. (This isn't exactly what the driver does, but for our purposes here it's very close and shows you similar information.)
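
In 4.0 you can also ask for the routing table itself, which is closer to what the driver actually consumes (the second argument is the database name):

    // Which members exist and what their roles are
    CALL dbms.cluster.overview();

    // The routing table a driver would receive for the "neo4j" database
    CALL dbms.routing.getRoutingTable({}, "neo4j");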

A key thing here is those bolt advertised addresses and what comes back in the response to that cluster overview call. The driver figures out who exists in the cluster, and then routes queries. When you're inside of kubernetes, it's very easy to advertise an address that isn't externally routable. This is generally the cause of the errors you're seeing.

Without seeing the specifics of your neo4j pod configuration, it's hard to tell. But that's where I'd look. Use Browser, CALL dbms.cluster.overview(), and pay particular attention to the addresses it reports back for the cluster members. If those addresses aren't routable from your client outside of Kubernetes, then the client (using the neo4j:// connection scheme) has no chance to do the right thing.

HTH

Hi David,

Thank you for the prompt response.

There are a few things mentioned that are not very obvious to someone who doesn't know the inner workings of the cluster setup.

I still can't find answers to very simple questions:

  1. If I don't use NEO4J_dbms_connector_bolt_advertised__address, then I can't connect through the Neo4j browser using the neo4j:// scheme.

  2. If I use a unique NEO4J_dbms_connector_bolt_advertised__address for each of the three pods I deploy within the StatefulSet, then I can connect through the Neo4j browser using the neo4j:// scheme, but I can't connect via the JS driver.

  3. "The way this actually works is that the client gets a 'routing table' from the server."

This is not quite clear to me. What do you mean by "server"? There are three nodes (1 leader, 2 followers), and I use a valid public domain name to access the service through a k8s Ingress setup. Who is sending that routing table? The leader, or any node within the cluster?

  4. "When you're inside of kubernetes, it's very easy to advertise an address that isn't externally routable"

I'm not really inside Kubernetes. As I showed above, I have created a new service (neo4j-external-access), and there is an Ingress that allows external access to the pods of the StatefulSet.

  5. "Without seeing the specifics of your neo4j pod configuration, it's hard to tell"

As I said, I'm using the official Helm chart, and more specifically a PR that hasn't been merged yet: https://github.com/helm/charts/pull/20942.

  6. CALL dbms.cluster.overview() returns:

╒═════╤═════════════╤═══════════════════════════════════════════════════════════════════════════╕
│"ttl"│"server.role"│"server.addresses"                                                           │
╞═════╪═════════════╪═══════════════════════════════════════════════════════════════════════════╡
│300  │"WRITE"      │["bolt-2.domain.app:443"]                                                    │
├─────┼─────────────┼─────────────────────────────────────────────────────────────────────────────┤
│300  │"READ"       │["bolt-0.domain.app:443","bolt-1.domain.app:443"]                            │
├─────┼─────────────┼─────────────────────────────────────────────────────────────────────────────┤
│300  │"ROUTE"      │["bolt-0.domain.app:443","bolt-1.domain.app:443","bolt-2.domain.app:443"]    │
└─────┴─────────────┴─────────────────────────────────────────────────────────────────────────────┘

If you ignore the exact domain names (I'm hiding the ones that I'm actually using), this is the routing table I get back.

Let's take that routing table as an example, and let's take as an example client a Python program running outside of Kubernetes. You give it neo4j://whatever as a connection point. It is able to make a connection through your Ingress, and it gets this routing table. It then attempts to set up connections to the addresses "bolt-0.domain.app:443", "bolt-1.domain.app:443", and "bolt-2.domain.app:443".

Notice that for the Python program external to Kubernetes, those addresses aren't routable, as those DNS names are only valid inside Kubernetes. Because of this, your Python program using the neo4j:// scheme will probably fail to connect: while it contacted the initial member through the Kubernetes Ingress, it can't contact any other members, because they don't have valid routable addresses (from the perspective of outside Kubernetes).
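
You can see the difference from outside the cluster with the two schemes side by side. Roughly (hostname and credentials as in your snippet above):

    // bolt:// = direct driver: no routing table fetch, no attempt to reach
    // other members, so only the Ingress-facing address has to be routable
    const direct = neo4j.driver(
      'bolt://bolt.domain.app:443',
      neo4j.auth.basic('neo4j', 'password')
    );

    // neo4j:// = routing driver: fetches the routing table and then dials
    // every advertised member address, all of which must be reachable
    const routed = neo4j.driver(
      'neo4j://bolt.domain.app:443',
      neo4j.auth.basic('neo4j', 'password')
    );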

Hi David,

Thank you again for taking the time to respond to my questions.

I will try to visually show what my setup looks like.

As you can see, there are A records in the external DNS for bolt-0.domain.app, bolt-1.domain.app and bolt-2.domain.app that point to the Ingress. So from a network point of view, those addresses are routable to the internal pods via the Ingress and the neo4j-external-access service.

Basically, those addresses aren't only valid inside the Kubernetes cluster; they are also resolvable from the outside world. I think a good indication of that is that I manage to connect via :7474/browser, which successfully creates a wss connection to any of those three URLs.

OK, I finally managed to make it work. The setup was correct; the only issue was with the HTTP port 443. I had to expose TCP port 7687 instead to allow incoming connections.

FYI, I followed these instructions: https://kubernetes.github.io/ingress-nginx/user-guide/exposing-tcp-udp-services/
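
For anyone finding this later: the guide boils down to passing --tcp-services-configmap to the ingress-nginx controller and mapping the port in a ConfigMap, something like this (the service name is mine from above; the default namespace is an assumption):

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: tcp-services
      namespace: ingress-nginx
    data:
      # <external port>: "<namespace>/<service>:<service port>"
      7687: "default/neo4j-external-access:7687"

Port 7687 also has to be opened on the ingress controller's own Service / load balancer.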

I also updated the advertised bolt address:

SET_INDEX=${HOSTNAME##*-}
export NEO4J_dbms_connector_bolt_advertised__address="bolt-$SET_INDEX.app:7687"

Now I can successfully connect to neo4j://bolt-0.app:7687 via the JS driver.
