Why Semantic Databases are so Important – Discovering Shared Meaning

This is an excerpt from my upcoming article that I wanted to share separately, because it is an excellent example of one reason semantic databases are so important: the ability to discover shared meaning.

Being able to discover implicit associations is one of the unique features of a semantic database. Consider three semantic types:

  1. Person
  2. Date
  3. MonthLookup

implemented with the following semantic schema:

[Figure: semantic schema for Person, Date and MonthLookup (threefoldSchema.png)]

Our “discovery” method determines the following associations (left-to-right is bottom-most to top-most):

month <- monthLookup
month <- date
  
name <- monthName <- monthLookup
name <- firstName <- personName <- person
  
name <- monthName <- monthLookup
name <- lastName <- personName <- person
  
name <- monthAbbr <- monthLookup
name <- firstName <- personName <- person
  
name <- monthAbbr <- monthLookup
name <- lastName <- personName <- person
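The discovery method itself isn't shown here, so the following is only a minimal sketch of one way it could work. It assumes the schema is a "has a" graph in which each semantic type lists the types it is composed of (inferred from the association list above), and that two root types share meaning wherever their paths bottom out in the same primitive type. `schema`, `pathsToLeaves` and `discoverAssociations` are hypothetical names.

```javascript
// Hypothetical "has a" schema inferred from the association list above:
// each semantic type maps to the semantic types it is composed of.
const schema = {
  monthLookup: ["month", "monthName", "monthAbbr"],
  monthName: ["name"],
  monthAbbr: ["name"],
  date: ["month", "day", "year"],
  person: ["personName"],
  personName: ["firstName", "lastName"],
  firstName: ["name"],
  lastName: ["name"],
};

// Enumerate every path from `type` down to a primitive (a type with no children).
// Paths are listed bottom-most first, e.g. ["name", "monthName", "monthLookup"].
function pathsToLeaves(type) {
  const children = schema[type] || [];
  if (children.length === 0) return [[type]];
  return children.flatMap((child) =>
    pathsToLeaves(child).map((path) => [...path, type])
  );
}

// Pair up paths from two root types that end in the same primitive type.
function discoverAssociations(rootA, rootB) {
  const associations = [];
  for (const a of pathsToLeaves(rootA)) {
    for (const b of pathsToLeaves(rootB)) {
      if (a[0] === b[0]) {
        associations.push([a.join(" <- "), b.join(" <- ")]);
      }
    }
  }
  return associations;
}

// Check every pair of root types and print each discovered association.
const roots = ["monthLookup", "date", "person"];
for (let i = 0; i < roots.length; i++) {
  for (let j = i + 1; j < roots.length; j++) {
    for (const [a, b] of discoverAssociations(roots[i], roots[j])) {
      console.log(a + "\n" + b + "\n");
    }
  }
}
```

Run as-is, this prints the same five association pairs as the list above, in the same order; Date and Person share no primitive type, so that pair contributes nothing.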

From this, we could construct a query that says “give me all the dates whose month’s name is also a person’s first name.” This query follows the association chain like this:

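[Figure: association chain from firstName through name, monthName, monthLookup and month to date (associations1)]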

If we populate the month lookup semantic records with the obvious 12 months of the year, and the date and person semantic records with (flattened view here):

{month: 8, day: 19, year: 1962}
{month: 4, day: 1, year: 2016}

and (again, flattened view here):

{firstName: 'Marc', lastName: 'Clifton'}
{firstName: 'April', lastName: 'Jones'}

We can then write a MongoDB query, based on the discovered shared meaning:

db.firstName.aggregate([
// firstName -> name
{ $lookup: {from: 'name', localField: 'nameId', foreignField: '_id', as: 'fname'} },
{ $unwind: '$fname' },
// name -> monthName -> monthLookup
{ $lookup: {from: 'monthName', localField: 'fname._id', foreignField: 'nameId', as: 'monthName'} },
{ $unwind: '$monthName' },
{ $lookup: {from: 'monthLookup', localField: 'monthName._id', foreignField: 'monthNameId', as: 'monthLookup'} },
{ $unwind: '$monthLookup' },
// monthLookup -> month
{ $lookup: {from: 'month', localField: 'monthLookup.monthId', foreignField: '_id', as: 'month'} },
{ $unwind: '$month' },
// month -> date
{ $lookup: {from: 'date', localField: 'month._id', foreignField: 'monthId', as: 'date'} },
{ $unwind: '$date' },
// date.day -> day.value
{ $lookup: {from: 'day', localField: 'date.dayId', foreignField: '_id', as: 'day'} },
{ $unwind: '$day' },
// date.year -> year.value
{ $lookup: {from: 'year', localField: 'date.yearId', foreignField: '_id', as: 'year'} },
{ $unwind: '$year' },
{ $project: {'monthName': '$fname.name', 'month': '$month.value', 'day': '$day.value', 'year': '$year.value', '_id': 0} }
])

This gives us the one matching record: April 1, 2016, whose month name matches the first name April.

[Figure: query result (fnamemonthdate)]
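To make the join chain concrete without a running MongoDB server, here is a minimal in-memory sketch in plain JavaScript (Node.js). It assumes, as the query implies, that each distinct value is stored once, so the first name "April" and the month name "April" point to the same `name` record. Collection and field names come from the query above; `intern` and `lookupUnwind` are hypothetical helpers, and `lookupUnwind` only approximates `$lookup` followed by `$unwind` (simple equality match, unmatched documents dropped). Only the collections the query touches are populated.

```javascript
// Hypothetical in-memory stand-ins for the MongoDB collections used by the query.
const db = { name: [], monthName: [], monthLookup: [], month: [],
  firstName: [], date: [], day: [], year: [] };

// Return the _id of the document in `collection` whose `field` equals `value`,
// creating it first if needed, so each distinct value is stored exactly once.
function intern(collection, field, value) {
  let doc = db[collection].find((d) => d[field] === value);
  if (!doc) {
    doc = { _id: db[collection].length + 1, [field]: value };
    db[collection].push(doc);
  }
  return doc._id;
}

// The 12 month lookup records (month abbreviations omitted for brevity).
const monthNames = ["January", "February", "March", "April", "May", "June",
  "July", "August", "September", "October", "November", "December"];
monthNames.forEach((monthName, i) => {
  db.monthLookup.push({
    _id: i + 1,
    monthId: intern("month", "value", i + 1),
    monthNameId: intern("monthName", "nameId", intern("name", "name", monthName)),
  });
});

// The sample first names and dates from the text.
for (const fn of ["Marc", "April"]) intern("firstName", "nameId", intern("name", "name", fn));
for (const [m, d, y] of [[8, 19, 1962], [4, 1, 2016]]) {
  db.date.push({ _id: db.date.length + 1, monthId: intern("month", "value", m),
    dayId: intern("day", "value", d), yearId: intern("year", "value", y) });
}

// Approximates { $lookup } + { $unwind }: one output document per match;
// documents with no match are dropped, which is what filters out "Marc".
function lookupUnwind(docs, from, localValue, foreignField, as) {
  return docs.flatMap((doc) =>
    db[from].filter((f) => f[foreignField] === localValue(doc)).map((f) => ({ ...doc, [as]: f }))
  );
}

let rows = db.firstName.map((d) => ({ ...d }));
rows = lookupUnwind(rows, "name", (r) => r.nameId, "_id", "fname");
rows = lookupUnwind(rows, "monthName", (r) => r.fname._id, "nameId", "monthName");
rows = lookupUnwind(rows, "monthLookup", (r) => r.monthName._id, "monthNameId", "monthLookup");
rows = lookupUnwind(rows, "month", (r) => r.monthLookup.monthId, "_id", "month");
rows = lookupUnwind(rows, "date", (r) => r.month._id, "monthId", "date");
rows = lookupUnwind(rows, "day", (r) => r.date.dayId, "_id", "day");
rows = lookupUnwind(rows, "year", (r) => r.date.yearId, "_id", "year");

// Equivalent of the final $project stage.
const result = rows.map((r) => ({ monthName: r.fname.name, month: r.month.value,
  day: r.day.value, year: r.year.value }));
console.log(result);
```

The match works only because the value "April" is shared: the first name and the month name resolve to the same `name` document, so the equality join links them without any string comparison.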

What we’ve achieved here is quite interesting! Because our database is semantic, the system knows that types like “month” and “name” carry shared meaning, so we can ask the schema which entities share meaning and then weave through the hierarchies of the semantic types to combine them into new and interesting queries. This kind of query could, of course, be expressed in SQL (perhaps more simply), but there we would be comparing non-semantic field values that the programmer decided had shared meaning, rather than letting the system discover the shared meaning. By letting the system discover the shared meaning, the user, not the programmer, can make new and interesting associations.


Using MongoDB to Implement a Semantic Database – Part I

Newly published article on Code Project.

Semantic Database Technology (from InformationWeek):

Semantic technology has created a disruptive opportunity for businesses to obtain more value from their data. The concepts surrounding the semantic Web, such as linked data cloud and data mashups, are powered by a set of emerging standards and products that, for now, are mainly used for consumer services. However, these technologies are equally compelling as part of an enterprise data platform behind the firewall.
At a high level, there are five main benefits of semantic technology:
    > It works in tandem with your existing database investments;
    > It aligns with Web technologies;
    > It speeds the integration of multiple databases;
    > It’s based on data structures that are flexible by design; and
    > It can help enterprises tackle big data challenges.

Excerpts from my article:

Fundamentally, a semantic database captures relationships.  There are two primary kinds of relationships:

  1. Static, implicit relationships that define the structure of (give meaning to) a semantic term (a symbol). These are typically expressed with the same terms used in object-oriented programming: “has a” and “is a kind of.”
  2. Static or dynamic explicit relationships, where the relationship itself has a meaning expressed in a semantic term, and where dynamic relationships can change over time. In programming, these relationships are usually expressed implicitly in the code, for example, as a dictionary or another key-value collection. Dynamic relationships often have a time frame: a beginning and an ending.
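As a minimal sketch of the second kind of relationship, here is one way an explicit, dynamic relationship could be stored and queried in plain JavaScript, assuming each relationship record carries its meaning as a semantic term (`type`) plus `begin` and `end` dates. The `livesAt` term, the field names and the `activeOn` helper are hypothetical; they only illustrate a relationship whose meaning is explicit and whose validity has a time frame.

```javascript
// Hypothetical explicit, dynamic relationship records. Each names its meaning
// with a semantic term ("livesAt") and carries a time frame; end: null = ongoing.
const relationships = [
  { type: "livesAt", personId: 1, addressId: 10, begin: "2001-06-01", end: "2015-03-31" },
  { type: "livesAt", personId: 1, addressId: 11, begin: "2015-04-01", end: null },
];

// Return the relationships of a given meaning that are in effect on a date.
// ISO-8601 date strings (YYYY-MM-DD) compare correctly as plain strings.
function activeOn(rels, type, isoDate) {
  return rels.filter(
    (r) => r.type === type && r.begin <= isoDate && (r.end === null || isoDate <= r.end)
  );
}

console.log(activeOn(relationships, "livesAt", "2010-01-01").map((r) => r.addressId)); // [ 10 ]
console.log(activeOn(relationships, "livesAt", "2016-01-01").map((r) => r.addressId)); // [ 11 ]
```

By contrast, an implicit “has a” relationship (a person name having a first name and a last name) needs no record of its own; it is part of the structure of the semantic type.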

The advantage of a NoSQL database is that the schema itself is dynamic:

  1. The structure of implicit symbols changes (think of how names and addresses vary among cultures).
  2. New symbols can be easily added (simply add a new collection).
  3. New relationships between symbols can be easily added (simply add a collection with two fields associating the IDs of two collections).

Read the whole article here!

MongoDB Grows Up – the $lookup aggregator

So far, I’ve been avoiding anything having to do with NoSQL because of its inability to do table joins, as in the classical RDBMS world. The idea of having to pull all the results of one table into memory (well, my application’s memory) and then join them (with more code) to another table was appalling to me, particularly since I want my queries to be generated at runtime from a metadata schema.

However, as of the release of MongoDB 3.2, that has changed! I can now do multiple-table (collection) joins, which, in my opinion, is critical for working with semantic data. As Semag points out:

“Data is organized based on binary models of objects, usually in groups of three parts: two objects and their relationship.”

So now, in MongoDB, I can do something very simple (and very non-semantic in this example):

db.createCollection("Person")
db.createCollection("Phone")
db.createCollection("PersonPhone")

db.Person.insert({ID: 1, LastName: "Clifton", FirstName: "Marc"})
db.Person.insert({ID: 2, LastName: "Wagers", FirstName: "Kelli"})

db.Phone.insert({ID: 1, Number: "518-555-1212"})
db.Phone.insert({ID: 2, Number: "518-123-4567"})

db.PersonPhone.insert({ID: 1, PersonID: 1, PhoneID: 1})
db.PersonPhone.insert({ID: 2, PersonID: 2, PhoneID: 1})
db.PersonPhone.insert({ID: 3, PersonID: 2, PhoneID: 2})

Notice I have two tables (collections), Person and Phone, plus a PersonPhone collection that associates them (please ignore my case style; I come from a different world, some may say planet). I can now query the data using the $lookup aggregation stage (plus a few other pieces):

db.PersonPhone.aggregate([
{ $lookup: { from: "Person", localField: "PersonID", foreignField: "ID", as: "PersonName" } }, 
{ $lookup: { from: "Phone", localField: "PhoneID", foreignField: "ID", as: "PersonPhone" } }, 
{ $match: {PersonID: 2} }, 
{$project: {"PersonName.LastName":1, "PersonName.FirstName":1, "PersonPhone.Number": 1, _id:0}} ])

and I get a lovely resulting dataset:

{
  "PersonName": [
    {
      "LastName": "Wagers",
      "FirstName": "Kelli"
    }
  ],
  "PersonPhone": [
    {
      "Number": "518-555-1212"
    }
  ]
}
{
  "PersonName": [
    {
      "LastName": "Wagers",
      "FirstName": "Kelli"
    }
  ],
  "PersonPhone": [
    {
      "Number": "518-123-4567"
    }
  ]
}
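To see why each field in the result is an array, here is a minimal plain-JavaScript sketch of what this pipeline computes over the same data: `$lookup` attaches an array of every matching document from the other collection (even when there is only one match), then `$match` filters and `$project` keeps the chosen fields. The `lookup` helper is a hypothetical stand-in written for illustration, not a MongoDB API, and it handles only simple equality matches.

```javascript
// The same sample data as the inserts above, held in plain arrays.
const Person = [
  { ID: 1, LastName: "Clifton", FirstName: "Marc" },
  { ID: 2, LastName: "Wagers", FirstName: "Kelli" },
];
const Phone = [
  { ID: 1, Number: "518-555-1212" },
  { ID: 2, Number: "518-123-4567" },
];
const PersonPhone = [
  { ID: 1, PersonID: 1, PhoneID: 1 },
  { ID: 2, PersonID: 2, PhoneID: 1 },
  { ID: 3, PersonID: 2, PhoneID: 2 },
];

// Like $lookup: attach an array of every document in `from` whose
// foreignField equals this document's localField.
function lookup(docs, from, localField, foreignField, as) {
  return docs.map((doc) => ({
    ...doc,
    [as]: from.filter((f) => f[foreignField] === doc[localField]),
  }));
}

const result = lookup(
  lookup(PersonPhone, Person, "PersonID", "ID", "PersonName"),
  Phone, "PhoneID", "ID", "PersonPhone")
  .filter((doc) => doc.PersonID === 2)   // $match
  .map((doc) => ({                       // $project
    PersonName: doc.PersonName.map(({ LastName, FirstName }) => ({ LastName, FirstName })),
    PersonPhone: doc.PersonPhone.map(({ Number }) => ({ Number })),
  }));

console.log(JSON.stringify(result, null, 2));
```

The association collection drives the join, so adding a new kind of relationship means adding another collection like PersonPhone rather than altering Person or Phone.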

So now I can finally use MongoDB to satisfy the first foundational tenet of a semantic database: a relationship between two objects.

And the nice thing about using a NoSQL database is that the schema is not set in concrete, as it is with a SQL database, which would normally require additional layers of manipulation to deal with the second foundational tenet of a semantic database: that your concept of, say, a person’s name might be semantically different from (but still compatible with) mine, especially when we consider culture.