Who is BusinessDesk?
BusinessDesk has been covering the New Zealand political economy and the fortunes of its listed and unlisted businesses since 2008.
BusinessDesk has thousands of individual subscribers and more than 150 corporate and government subscribers.
For more information, see https://nextdeveloper.com/casestudy/businessdesk
Public Sector Case Study
BusinessDesk wanted to connect people, companies and even ministries through thousands of news links, drawing on the knowledge it has accumulated since it was founded. The project was carried out in partnership with the state, and we started to look at how we could make it happen.
The main purpose was to show where and with whom a person works, and to reveal through the news who else they are in contact with beyond their workplace relationships.
Problem
The main problem was to define what the relationships would look like, where and how the data would be stored, and how this information would be derived from thousands of news stories.
We started to think about what we could do with thousands of news stories, hundreds of people and so many companies.
At first glance, it seemed that we could easily solve this with the MySQL database we already used in the project: match people to companies, and from there to other people. So far so good, but what about the news? At this point the team argued that we should move to a separate structure, since an ordinary search would not be enough, and even if it were, it would cause performance problems for relationships beyond the 2nd level.
Solution
First, we had to decide how the relationships between people and companies would work. Our first goal was to find the business relationships people have with each other as well as with the companies they work for.
With the information we had, we could find the places where people currently work and have worked before. This was the first and easiest connection for us. We recorded it as a level 1 connection, one that we can reach directly.
After building this information for every person with the same logic, we also established the connection from every company to the people working there.
After these 1st level relationships, it was time to go into a little more detail and find colleagues through the company a person works for.
Since MySQL would not be enough for us here, we decided to continue with Neo4j, a graph database. Once we had confirmed that Neo4j was built for this kind of work, and seen how convenient it is for using and displaying the generated data, we moved forward with it.
First, we created all the people as “People” nodes, companies as “Entity” nodes and news as “Story” nodes.
Each of these nodes needed information that we could use to relate it to the others.
For each “People” node, we stored the companies the person works for now or has worked for in the past in an array called “EntityIds”. With this relationship, our 1st level connections were complete.
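As a rough illustration of how the “EntityIds” array yields both directions of the 1st level connections, here is a minimal Python sketch. The sample records, the `id` key and the function name `build_employment_index` are assumptions made for illustration; only the “EntityIds” field comes from the description above.

```python
from collections import defaultdict

# Hypothetical sample data: each person carries the ids of the entities
# they work for now or worked for in the past ("EntityIds").
people = [
    {"id": 1, "EntityIds": [10, 11]},
    {"id": 2, "EntityIds": [10]},
    {"id": 3, "EntityIds": [12]},
]


def build_employment_index(people):
    """Return (person -> entities, entity -> people) lookups."""
    person_to_entities = {}
    entity_to_people = defaultdict(set)
    for person in people:
        entity_ids = set(person["EntityIds"])
        person_to_entities[person["id"]] = entity_ids
        for entity_id in entity_ids:
            entity_to_people[entity_id].add(person["id"])
    return person_to_entities, dict(entity_to_people)


person_to_entities, entity_to_people = build_employment_index(people)
```

A single pass over the people is enough to answer both "which companies has this person worked for" and "who has worked at this company", which is why one array per person can cover both 1st level relations.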
With our first connections created and stored in Neo4j, it was time for something a little more complicated: the news. Here, we searched the full text of every news story for the names of all people and companies, and found which person and company names were mentioned in which story.
We created 2 arrays named “PeopleIds” and “EntityIds” in the “Story” node and filled them with the matches we found in each story.
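The name matching step could look roughly like the following Python sketch. It is not the project's actual search, which runs as a MySQL search job; the sample names, the word-boundary matching rule and the function name `find_mentions` are assumptions.

```python
import re


def find_mentions(body, names_by_id):
    """Return the sorted ids whose name appears in the story body.

    Matching is case-insensitive and on word boundaries, so that
    "Ann Lee" does not match inside "Joann Leeson".
    """
    matched = []
    for item_id, name in names_by_id.items():
        pattern = r"\b" + re.escape(name) + r"\b"
        if re.search(pattern, body, flags=re.IGNORECASE):
            matched.append(item_id)
    return sorted(matched)


# Hypothetical sample data.
people_names = {1: "Ann Lee", 2: "Bob Stone"}
entity_names = {10: "Acme Holdings", 11: "Kiwi Freight"}

story = {
    "id": 100,
    "body": "Acme Holdings has appointed Ann Lee as chief executive.",
}
story["PeopleIds"] = find_mentions(story["body"], people_names)
story["EntityIds"] = find_mentions(story["body"], entity_names)
```

Plain substring matching would be simpler, but it tends to produce false matches for short names, which is why this sketch anchors on word boundaries.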
If you want to learn more about the technical side, continue with the section “How we use Neo4j, some more technical information”.
Result
When we returned to Neo4j, we had nodes for people, companies and news, with the relations between them ready.
Thanks to the “EntityIds” on the “People” node, we have a direct connection for the following relations.
- People and all the companies they have worked for
- Companies and all the people who have worked there
Up to this point, we have only talked about level 1 relationships like these. We reach our first level 2 connection through colleagues.
That is, we first reach the company where a person works, and then the other employees of that company, which gives us our first 2nd level connection.
With the same logic, we can easily reach 3rd and 4th level connections in Neo4j through the other places where a colleague has worked.
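To make the levels concrete, here is a small in-memory Python sketch of the same idea: starting from one person, every hop between a person and a company adds one level, so colleagues come out at level 2. In the project this traversal is done by Neo4j; the sample data and the function name `connection_levels` are assumptions.

```python
from collections import deque

# Hypothetical employment edges: (person id, entity id).
employment = [(1, 10), (2, 10), (2, 11), (3, 11)]


def connection_levels(start_person, employment, max_level=4):
    """Breadth-first search over the person/entity graph.

    Returns {node: level}, where a node is ("People", id) or
    ("Entity", id) and the level is the number of hops from the
    starting person.
    """
    neighbours = {}
    for person_id, entity_id in employment:
        p, e = ("People", person_id), ("Entity", entity_id)
        neighbours.setdefault(p, set()).add(e)
        neighbours.setdefault(e, set()).add(p)

    start = ("People", start_person)
    levels = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if levels[node] == max_level:
            continue  # depth limit keeps the traversal bounded
        for nxt in neighbours.get(node, ()):
            if nxt not in levels:  # visited check prevents cycles
                levels[nxt] = levels[node] + 1
                queue.append(nxt)
    del levels[start]
    return levels


levels = connection_levels(1, employment)
```

With this sample data, person 1 reaches company 10 at level 1, colleague 2 at level 2, that colleague's other company 11 at level 3, and person 3 at level 4.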
For news, we applied the same colleague logic: using the information in the “PeopleIds” and “EntityIds” arrays, we first established the 2nd level connections between all the people and companies mentioned in the same stories.
Then, just as with colleagues, we followed these 2nd level connections further to build deeper relationships with the people and companies they lead to.
Some things to consider
- After the initial import, we built a job that is triggered for each new, updated or deleted news story, person or company, so that the system keeps working correctly as the data changes and only the changed data is processed.
- We checked the accuracy of the data to avoid performance problems and infinite loops in Neo4j, and we wrote optimized queries for the relations in Neo4j.
- Within our project and in Neo4j, we kept the information displayed on the website as simple as possible. We then followed sound software principles in the code, so that anyone reading it can get to the result comfortably without drowning in it.
How we use Neo4j, some more technical information
There are four different models in our system: people, entities, ministers and stories. All of these models can be added and updated from the admin panel. Special types of relations (edges) between these models are used to gather the connections between model nodes.
First, a story can have relations with every other type of model, but not with other stories. If a story’s content (body text) includes the name of a person, entity or minister, the story is connected to that model. This type of connection is called REL_STORY.
As we can see in the image below, the story with id 24553 has 3 connections. We represent entities with red nodes and people with blue nodes. So, we can say that the person with id 25 is connected to two entities through that story.
Entities are the companies where people currently work or have worked in the past. This work relationship is our second type of relation and is called REL_EMPLOYMENT. It can only exist between people and entities.
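The rules for the two relation types can be summarised in a small lookup table. The Python sketch below only illustrates those rules; the `Minister` label spelling and the function name `is_allowed` are assumptions, while `REL_STORY` and `REL_EMPLOYMENT` come from the text.

```python
# Which pairs of node labels each relation type may connect.
ALLOWED_RELATIONS = {
    "REL_STORY": {
        frozenset({"Story", "People"}),
        frozenset({"Story", "Entity"}),
        frozenset({"Story", "Minister"}),  # label name assumed
    },
    "REL_EMPLOYMENT": {
        frozenset({"People", "Entity"}),
    },
}


def is_allowed(relation_type, label_a, label_b):
    """Return True if the relation type may connect the two labels."""
    pairs = ALLOWED_RELATIONS.get(relation_type, set())
    return frozenset({label_a, label_b}) in pairs
```

Using unordered pairs means the check does not depend on which side of the relation is given first, and a story-to-story pair is rejected because it is not in the table.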
Neo4j Service
Our main web application uses a MySQL database and runs on a different server from Neo4j. So, we implemented a second Laravel project on the Neo4j server. This application accepts requests from the main web application and forwards them to Neo4j. When it gets the results back from Neo4j, it returns them to the main web application.
The main application sends an HTTP request containing the node’s type and id, together with the ids and type of the connected nodes. The Neo4j Laravel app refreshes the node’s relations using the data received from the main application. Neo4j only knows the ids of the records, which are originally stored on the MySQL server.
Here is an example HTTP request body that connects the people with ids 1, 2 and 3 to the story with id 1. The relation type must also be specified.
{
  "nodeClass": "Story",
  "relatedNodeClass": "People",
  "nodeID": 1,
  "relatedNodeIDs": [1, 2, 3],
  "relationType": "REL_STORY"
}
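One way to read "refreshes the node's relations" is: remove the node's existing relations of that type, then recreate them from the received ids. The Python sketch below builds Cypher statements for that interpretation. It is a guess at the behaviour, not the project's Laravel code; the function name, the `Minister` label and the delete-then-merge approach are assumptions, while `source_id` is the property used in the project's queries.

```python
import json

ALLOWED_LABELS = {"People", "Entity", "Minister", "Story"}  # "Minister" assumed
ALLOWED_RELATION_TYPES = {"REL_STORY", "REL_EMPLOYMENT"}


def build_refresh_statements(payload):
    """Turn a request body into (cypher, params) pairs that replace the
    node's relations of the given type with the received ones."""
    node, related = payload["nodeClass"], payload["relatedNodeClass"]
    rel = payload["relationType"]
    # Labels and relation types cannot be passed as Cypher parameters,
    # so they are checked against a fixed list before being interpolated.
    if node not in ALLOWED_LABELS or related not in ALLOWED_LABELS:
        raise ValueError("unknown node class")
    if rel not in ALLOWED_RELATION_TYPES:
        raise ValueError("unknown relation type")

    delete = (
        f"MATCH (n:{node} {{source_id: $nodeID}})-[r:{rel}]-(:{related}) "
        "DELETE r"
    )
    create = (
        f"MERGE (n:{node} {{source_id: $nodeID}}) "
        "WITH n UNWIND $relatedNodeIDs AS relatedID "
        f"MERGE (m:{related} {{source_id: relatedID}}) "
        f"MERGE (n)-[:{rel}]-(m)"
    )
    params = {
        "nodeID": payload["nodeID"],
        "relatedNodeIDs": payload["relatedNodeIDs"],
    }
    return [(delete, params), (create, params)]


payload = json.loads("""
{
  "nodeClass": "Story",
  "relatedNodeClass": "People",
  "nodeID": 1,
  "relatedNodeIDs": [1, 2, 3],
  "relationType": "REL_STORY"
}
""")
statements = build_refresh_statements(payload)
```

The ids travel as query parameters, while the label and relation type are validated against a whitelist, since those parts of a Cypher query have to be written into the query text.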
Observer Functions & Jobs
We keep the Neo4j data synchronized using observer classes and jobs. When something changes in these models, the Neo4j server has to be updated. We use jobs for some operations because the MySQL search takes too long to run during a web page load. The observers and their tasks are listed below:
Story observer: When a new story is added or a story’s body is changed, create a MySQL search job that looks for person, entity and minister names in that story’s body. Once the search result is obtained, send it to Neo4j.
People observer: When a new person is added or a person’s name is changed, create a MySQL search job that searches story bodies for that person’s name. Once the search result is obtained, send it to Neo4j. If the person’s employment history changes, get the full employment history from MySQL and send it to Neo4j. The person’s organization is also added to the employment history.
Entity observer: When a new entity is added or an entity’s name is changed, create a MySQL search job that searches story bodies for that entity’s name. Once the search result is obtained, send it to Neo4j.
Minister observer: When a new minister is added or a minister’s name is changed, create a MySQL search job that searches story bodies for that minister’s name. Once the search result is obtained, send it to Neo4j.
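The observer logic above can be sketched as follows. The real project uses Laravel observers and queued jobs; this Python version only illustrates the decision of when to queue work, and the class, field and job names are hypothetical.

```python
from collections import deque

job_queue = deque()  # stand-in for a real background job queue


class StoryObserver:
    """Queues a name search only when the story body is new or changed."""

    def saved(self, old, new):
        # `old` is None for a newly created story.
        if old is None or old["body"] != new["body"]:
            job_queue.append(("search_names_in_story", new["id"]))


class PeopleObserver:
    """Queues a story search on name changes and an employment sync
    when the employment history changes."""

    def saved(self, old, new):
        if old is None or old["name"] != new["name"]:
            job_queue.append(("search_stories_for_person", new["id"]))
        if old is None or old["entity_ids"] != new["entity_ids"]:
            job_queue.append(("sync_employment", new["id"]))


# A new story queues a search; saving it again unchanged queues nothing.
StoryObserver().saved(None, {"id": 1, "body": "first draft"})
StoryObserver().saved(
    {"id": 1, "body": "first draft"}, {"id": 1, "body": "first draft"}
)
# Only the employment history changed, so only the sync job is queued.
PeopleObserver().saved(
    {"id": 25, "name": "Ann Lee", "entity_ids": [10]},
    {"id": 25, "name": "Ann Lee", "entity_ids": [10, 11]},
)
```

Comparing the old and new values before queueing is what keeps the expensive search off the page load and limits it to records that actually changed.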
Cypher Queries
Once all the relation data is stored in Neo4j, we can retrieve the connections using MATCH queries, which work like SELECT queries in SQL. The functions that use them are listed and explained below:
getConnectedThroughStories()
This function takes a person id, entity id or minister id as input and returns all other people, entities and ministers that are connected to that model via stories. The purpose of this connection is to find individuals mentioned in the same stories. For example, the query below gets all entities that are connected to the person with id 25 through stories. The same query works for any combination of models; for instance, we can also select all ministers connected to a given entity.
MATCH (n:People)-[r1:REL_STORY]-(s:Story)-[r2:REL_STORY]-(m:Entity)
WHERE n.source_id = 25
RETURN DISTINCT m
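Since the same pattern works for any combination of labels, the query text can be generated from the two labels. The Python sketch below only builds the query string and its parameters; the function name, the `Minister` label and the use of a `$source_id` parameter instead of a literal id are assumptions, not the project's code.

```python
ALLOWED_LABELS = {"People", "Entity", "Minister"}  # "Minister" label assumed


def connected_through_stories_query(from_label, to_label, source_id):
    """Build the story-connection query for a pair of labels."""
    if from_label not in ALLOWED_LABELS or to_label not in ALLOWED_LABELS:
        raise ValueError("unsupported label")
    query = (
        f"MATCH (n:{from_label})-[:REL_STORY]-(s:Story)"
        f"-[:REL_STORY]-(m:{to_label})\n"
        "WHERE n.source_id = $source_id\n"
        "RETURN DISTINCT m"
    )
    return query, {"source_id": source_id}


query, params = connected_through_stories_query("People", "Entity", 25)
```

Calling it with `("Entity", "Minister", 12)` would produce the "all ministers connected to a given entity" variant with the same structure.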
getConnectedThroughEmployment()
This function takes a person id as input and returns all other people who are connected to that person via entities. The purpose of this connection is to find people with a shared work history. For example, the query below gets all other people who are connected to the given person through employment history.
MATCH (n:People)-[r1:REL_EMPLOYMENT]-(s:Entity)-[r2:REL_EMPLOYMENT]-(m:People)
WHERE n.source_id = 25
RETURN DISTINCT m
getConnectedEmployers()
This function takes a person id as input and returns all directly connected entities. The purpose of this connection is to find the person’s current and past employers. For example, the query below gets all entities where the given person works now or has worked before.
MATCH (n:People)-[r:REL_EMPLOYMENT]-(m:Entity)
WHERE n.source_id = 25
RETURN DISTINCT m
getConnectedEmployees()
This function takes an entity id as input and returns all directly connected people. The purpose of this connection is to find the entity’s current and past employees (people). For example, the query below gets all people whom the given entity employs now or has employed before.
MATCH (n:Entity)-[r:REL_EMPLOYMENT]-(m:People)
WHERE n.source_id = 12
RETURN DISTINCT m