Facebook Linked Data via the Graph API

Paper Title: 
Facebook Linked Data via the Graph API
Authors: 
Jesse Weaver, Paul Tarjan
Abstract: 
Facebook’s Graph API is an API for accessing objects and connections in Facebook’s social graph. To give some idea of the enormity of the social graph underlying Facebook, it was recently announced that Facebook has 901 million users, and the social graph consists of many types beyond just users. Until recently, the Graph API provided data to applications in only a JSON format. In 2011, an effort was undertaken to provide the same data in a semantically-enriched, RDF format containing Linked Data URIs. This was achieved by implementing a flexible and robust translation of the JSON output to a Turtle output. This paper describes the associated design decisions, the resulting Linked Data for objects in the social graph, and known issues.
Submission type: 
Dataset Description
Responsible editor: 
Pascal Hitzler
Decision/Status: 
Accept
Reviews: 

Submission in response to http://www.semantic-web-journal.net/blog/semantic-web-journal-special-ca...

Revised submission, now accepted, after an accept pending minor revisions. Reviews of the original submission are beneath the second round reviews.

Solicited review by Michael Hausenblas:

The authors have addressed all the issues I raised in the first round (and where this was not directly possible, such as with metrics, they provided a sensible justification). I'm happy to accept the article in its current form for publication.

Solicited review by Ivan Herman:

This is a re-submission of a previously reviewed paper. I am happy with the answers to my previous comments.

Solicited review by Amit Joshi:

In this revised version, the authors have included a section discussing the dataset's lack of linkage to external datasets. They have also explained why they disregarded JSON-LD. The authors could have explained further why there is no integration with GeoNames or DBpedia for entities such as places. It is not yet clear how to access a user's pictures in RDF format.

First round reviews:

Solicited review by Michael Hausenblas:

This article is well written and the importance of the dataset at hand (Facebook Graph API) is without any doubt very high. My main issue with the work is that it is not Linked Data in the strict sense, as the 4th principle is not followed (links to other datasets are missing, as pointed out by the authors themselves in Section 1). I think, however, that it is worth accepting the article if the authors include a discussion of why this is currently not the case and how this can change.

In the paper,

## Core DSD
The core questions concerning the DSD are answered in Section 1, except for the license.

## Publishing and metrics
The authors made clear what has been done and how, but it's unfortunate that no metrics are available as they are proprietary. Although I think I understand the reasons behind it, some estimates (with FB permission) should be included.

## Examples, modeling patterns and shortcomings
The paper contains a sufficient level of detail concerning the modeling and examples and also discusses shortcomings in a convincing way.

## What is missing
In Section 4 I would have expected a discussion of why existing vocabularies such as FOAF or DC have not been used and whether mappings exist (or are planned). In addition, a discussion of the relation to Schema.org terms (w.r.t. indexing by search engines) would be interesting.

The potential usefulness of the dataset is not described. What does the RDF version provide that the Facebook Graph API does not provide directly? Are there applications that benefit from your dataset?

## Editorial comments

* The use of English in the paper is very good, no changes needed AFAICT.
* Section 1: "In 2011, an effort was undertaken to provide the same data ..." - by whom? why?
* Section 1: "The Linked Data represents only the underlying graph and does not connect to other Linked Data on the web, and as such, it is considered only four-star Linked Data." - please explain why this is the case and how this can change
* Section 1: "api: is the prefix for tag:graph.facebook.com,2011:/" - that 'tag:' prefix is used without intro - maybe insert forward reference to Section 6.2
* Section 2: I wonder why JSON-LD (http://json-ld.org/) was not used - this could be discussed

Solicited review by Ivan Herman:

My main request would be to give some examples of how this dataset is, or is planned to be, used in a larger Linked Data context. The importance of this dataset is clear to me but, nevertheless, usage examples would be important.

Another question: what about the information stored in Web pages via the Facebook vocabularies in RDFa? I expect that this ends up in the Linked Data set at some point but it may not be clear for the general public how the flow of data works from the RDFa statements (in Web pages) to the dataset in RDF. I realize that this is only tangential to the main bulk of your work, but having some hints at that would be interesting to the audience of the journal in my view.

I also have a bunch of comments/questions about future plans. I hope you can add some words and hints in the final version of the paper in case such plans exist. If not then, well... maybe it gives you ideas for future work!

- You yourself acknowledge that, at this moment, the dataset does not link significantly to outside datasets. I was wondering whether there are plans to do that, at least on the level of the vocabularies. For example, the data on persons may be expressed either with FOAF or with schema.org's relevant terms; any plans to go in this direction?

- Any hope to instantiate this dataset, either live or via regular dumps, behind a SPARQL endpoint? It would be a great way to explore the data which is, after all, huge...

- Have you considered JSON-LD as an alternative serialization format? It may well be that for programmers who are used to JSON, this may be an easy(er) path to get to RDF and, through that, to more general Linked Data. I must admit I do not know how closely the native JSON output of Facebook compares to JSON-LD, but I would expect that via a judicious usage of the @context part of JSON-LD, it may not be very different, which may therefore be an interesting avenue to explore after all the great work you have done.
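
Editorial illustration of the @context idea raised in the last point: the following is a minimal sketch, not the authors' implementation and not actual Graph API output. It assumes a hypothetical JSON object with made-up field names, a hypothetical base URI (`http://example.org/`), and uses the `tag:graph.facebook.com,2011:/` namespace quoted in the first-round comments as the JSON-LD `@vocab`. The expansion step is deliberately naive and only handles flat, string-valued keys.

```python
import json

# Namespace quoted in the review for the api: prefix (assumed usable as @vocab).
API_NS = "tag:graph.facebook.com,2011:/"

# Hypothetical base for subject URIs; not taken from the paper.
BASE = "http://example.org/"

# Hypothetical Graph API-style JSON object; field names and values are made up.
plain = {"id": "12345", "name": "Example User", "locale": "en_US"}


def with_context(obj, ns=API_NS, base=BASE):
    """Return a JSON-LD view of obj: an @context, an @id, and the other keys."""
    doc = {"@context": {"@vocab": ns}, "@id": base + obj["id"]}
    doc.update({k: v for k, v in obj.items() if k != "id"})
    return doc


def to_triples(doc):
    """Naively expand non-keyword keys against @vocab into (s, p, o) triples."""
    vocab = doc["@context"]["@vocab"]
    subject = doc["@id"]
    return [
        (subject, vocab + key, value)
        for key, value in doc.items()
        if not key.startswith("@")
    ]


doc = with_context(plain)
triples = to_triples(doc)

# Show the JSON-LD document, then one Turtle-style line per triple.
print(json.dumps(doc, indent=2))
for s, p, o in triples:
    print(f"<{s}> <{p}> {json.dumps(o)} .")
```

The point of the sketch is that the original keys are left untouched: only an `@context` (and an `@id`) is added, which is why the reviewer expects JSON-LD to stay close to the native JSON. A real JSON-LD processor would also handle nested objects, typed values, and links between objects, none of which are covered here.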

Solicited review by Amit Joshi:

This paper presents an RDF extension of Facebook's Graph API by providing graph data in RDF format, in addition to the widely used JSON format. Given the sheer number of Facebook users, publishing Facebook's social graph data in Linked Data format would prove highly useful to the Semantic Web community. The paper is well written and easy to follow. However, it has the following weaknesses:

1. The paper seems to be incomplete to a large extent since it does not provide details of how it handles different kinds of user data, e.g., movies, locations, events [1]. It provides a very simple example of a user instance and a photo instance and does not give any pointers to other, more complex object types.

2. It looks like the representation of graph data in Linked Data format is still experimental. There is no link to access the RDF data, nor is any methodology described for getting graph data in RDF.

3. It would be good to know why the authors have chosen not to use foaf:Person to describe some user properties. In addition, the authors could use popular datasets like DBpedia and GeoNames if the data contains relevant information such as location or country. One of the principles of Linked Data is to improve discovery through links to other datasets. In this paper, no such connections have been described.

[1] http://developers.facebook.com/docs/reference/api/
