Ontologies and Languages for Representing Mathematical Knowledge on the Semantic Web

Paper Title: 
Ontologies and Languages for Representing Mathematical Knowledge on the Semantic Web
Authors: 
Christoph Lange
Abstract: 
Mathematics is a ubiquitous foundation of science, technology, and engineering. Specific areas, such as numeric and symbolic computation or logics, enjoy considerable software support. Working mathematicians have recently started to adopt Web 2.0 environments, such as blogs and wikis, but these systems lack machine support for knowledge organization and reuse, and they are disconnected from tools such as computer algebra systems or interactive proof assistants. We argue that such scenarios will benefit from Semantic Web technology. Conversely, mathematics is still underrepresented on the Web of [Linked] Data. There are mathematics-related Linked Data, for example statistical government data or scientific publication databases, but their mathematical semantics has not yet been modeled. We argue that the services for the Web of Data will benefit from a deeper representation of mathematical knowledge. Mathematical knowledge comprises logical and functional structures – formulæ, statements, and theories –, a mixture of rigorous natural language and symbolic notation in documents, application-specific metadata, and discussions about conceptualizations, formalizations, proofs, and (counter-)examples. Our review of approaches to representing these structures covers ontologies for mathematical problems, proofs, interlinked scientific publications, scientific discourse, as well as mathematical metadata vocabularies and domain knowledge from pure and applied mathematics. Many fields of mathematics have not yet been implemented as proper Semantic Web ontologies; however, we show that MathML and OpenMath, the standard XML-based exchange languages for mathematical knowledge, can be fully integrated with RDF representations in order to contribute existing mathematical knowledge to the Web of Data. We conclude with a roadmap for getting the mathematical Web of Data started: what datasets to publish, how to interlink them, and how to take advantage of these new connections.
Submission type: 
Survey Article
Responsible editor: 
Aldo Gangemi
Decision/Status: 
Accept
Reviews: 

This is the second revision of a manuscript which was accepted (twice) with minor revisions, and has now been accepted for publication. The reviews of the second round are below, followed by the reviews of the first round.

Solicited review by Claudio Sacerdoti:

This is the review for the second major revision of the paper.

In the previous reviews, all reviewers raised major issues with the paper.
The author has addressed some of them within the paper and has rebutted
the remaining ones. In particular, he has explained why he does not intend to
consider Semantic Web reasoning in the paper. Having worked myself on the topic
of indexing and search of mathematical formulae in mathematical libraries,
I am sympathetic to the motivations given by the author.

My major request was the following:
"To summarise, I invite the author to provide a more critical analysis of the
past failure and future promises of integration of MKM with the Semantic Web.
The analysis needs to be at the level of actual, concrete services to be
provided, not assuming that describing everything in RDF will automatically
trigger new revolutionary services."

The author has clearly made an effort to describe the future promises.
None of them seems sufficiently convincing to me to envision a new
successful adoption of Semantic Web techniques in MKM. Moreover, most
promises suggest contributions of MKM techniques to the Semantic Web, and
not the other way around. Coming from the MKM community, I am not in the
position to judge the degree of realism and interest of the proposed MKM
contributions to the Semantic Web.

The author also claimed not to have found any available material that
critically analyses the past failure and, indeed, I am also not aware of
any gray literature on the topic. Therefore I understand that recovering the
missing information by means of interviews is a time-consuming activity that
cannot be forced on the author with the sole aim of improving the current
submission.

The remaining changes have sufficiently raised the quality of the submission,
and I now think that the paper is in shape to be published with minor changes.
Nevertheless, I am not changing significantly the overall considerations on the
paper that I had expressed in my previous review and that were summarised by
the following sentence:
"To conclude, I think that the paper is valuable since it provides: 1)
a retrospective on part of the history of MKM; 2) valuable pointers to
techniques and tools to be known to merge MKM with the Semantic Web; 3) a not
very deep but wide categorisation of languages for the representation of
mathematical knowledge in a broad sense. Moreover, it serves as a position paper
to renew the efforts on the integration of MKM with the Semantic Web, with the
hope that this second try will not be another false start. Even if I am not
convinced yet that this will be the case, I also believe that the Semantic Web
technology is now mature enough to make it worth to finally settle the question
about its effective exploitability for mathematical knowledge."

The following is a list of minor pending corrections that do not require a
further round of reviews:

- abstract: I still believe that "functional and logical structures" should
be avoided in the abstract or made more explicit. The terminology does not
come from the author, but it is neither standard enough nor self-explanatory.
A simple explanation is that the functional structure gives the structure
of mathematical formulae, while the logical one ... (complete).
- page 11: "we require the ontologies ... to _faithfully_ capture the logical
and functional structures". The term faithfully seems too strong to me when
applied to the functional structure.
- requirement L→: there is no "SHOULD/MUST" keyword in this
requirement.
- page 19: "h:MainHypothesis ... (depth 3)" the depth here is 0. The depth of
a MainHypothesis is computed inside the hypothesis itself. To be precise,
there is also an h:InConclusion, which is the empty set.
- page 19: "idiosyncratic mechanism for identifying theory items" ??
As far as I know, URIs were employed. Are you referring to something deeper?
- table 1: now that there is a legend, I can disagree with some of the
classifications.
MathML3/F.C: why ++? Is it because of parallel markup? The
only simple co-existence level is between content and presentation, but
this is not really a co-existence of different levels of formality (that,
in my opinion, are all at the content level). I suggest lowering this at
least to +.
HELM/Notation, MoWGLI/Notation: I suggest raising them respectively to + and
++. Indeed, in HELM/MoWGLI we had XML markup languages to describe
notations. In MoWGLI, we had a chain of three XML languages, the first one
being at a very high, human friendly level and capable only of simple
things (e.g. infix binary operator, precedence level=x; binder, precedence
level=y; etc.); the intermediate one was a complex pattern language, that
hosted XSLT inside, being used for complex notations that required general
recursion; the third language being made of an XSLT library. In HELM we only
had a simpler XSLT library. Meta-XSLT stylesheets compiled each language in
the next one. The notational files were part of the "library", but they were
not linked with the objects they referred to, nor were they put (only used)
inside theories.
- legend for table 1: W occurs twice with different meanings

Solicited review by Aldo Gangemi:

The article has been improved in the revised version, substantially addressing most of the reviewers' requests. An extensive response to reviewers has also been provided, which proves the high level of commitment of the author.
I agree to accept the paper with minor revisions; my comments are in line with those of the other reviewers:
- concerning the reuse of current MKM data as linked data, a point should be made about the difficulties of porting those data, and about the practical availability of the MKM community for that
- the section 4.3 about ontologies should contain some kind of wrap-up: as it is, it's good as a review, but it should also highlight the pros and cons, either in text or in a table similarly to the languages table
- the response document contains discussions that, when not already incorporated in the new version, might find room in footnotes: with most critical points, this is very well received by readers of the overview.

Solicited review by Alexandre Passant:

Most of my comments have been addressed by the author - as well as further notable improvements that make the paper imo ready to publish.

I am still not completely convinced by the section on rhetorical structures but I see the point of the author. Yet, I think that what was mentioned in the reviewer comment (the idea of putting mathematics into context using these vocabularies) should be enhanced to better explain why this section is useful. An example would be nice, ideally combining rhetorics with mathematical formulae - showcasing the usefulness of the combination as a graph model (i.e. on the Semantic Web).

Regarding the author's response on VoiD and SPARQL Service Description - I still think that it makes sense to include them from a Linked Data perspective (that the authors mention in the intro as a possible use-case for maths on the SemWeb). If you have a way to describe (even at a high level) stats (i.e. numbers) about your data, then you may be able to use other models to do advanced things with it. Hence, I think they should be mentioned in this regard.

Regarding "I did not intend to make this sentence suggest that RDF is an XML-based language and have therefore revised it." I guess that was not the author's intention, but please avoid any sentence that would suggest that RDF = XML. This is probably just a wording problem.

First round reviews:

Solicited review by Claudio Sacerdoti Coen:

The topic of the paper is the integration of mathematical content in the
Semantic Web or, more appropriately, in the Web of (Linked) Data. The author
comes from the Mathematical Knowledge Management (MKM) community, now ten years
old. Immediately at the beginning of MKM several authors have proposed or tried
integration with the Semantic Web. In practice, the most serious efforts were
all abandoned about five years ago, for several reasons. Quite a number of
papers are left nowadays in the literature as a reminiscence of that period.
Indeed, most of the paper can be understood as a review of the existing
literature. As far as I know, this is the most comprehensive review so far and
it seems to me to have significant coverage. The coverage is balanced by
a general lack of depth; more depth, however, would yield an unreasonably
long manuscript. Hence sections 1-3 are better understood as an extended
bibliography enriched with a quick, yet still useful, analysis of the
languages proposed in the literature. The analysis is summarised in tables
1 and 2 that categorise all languages according to a list of requirements
provided in the paper. The tables are supposed to match the text, but it
was difficult for me to do the matching. The use of many symbols (+, ++, -, o)
without any legend is also a significant problem.

The major problem of the paper, however, is not the level of depth, but the
lack of convincing support for its main thesis. The thesis is that the times
are now mature for trying again an integration between MKM, the Semantic Web
and the Web 2.0. To support the thesis, the author proceeds in this way.
First of all he blames the immature state in 2000-2005 of Semantic Web tools
for the failure of the original integration attempts. Then he notices how
Web 2.0 technologies are more and more embraced by working scientists and, to
a lesser degree, by working mathematicians. The fact that this embracing of
the Web 2.0 can be propaedeutic to the embracing of Semantic Web technologies
is implicitly hinted at without any evidence. Finally, he lists several benefits
to the Semantic Web obtained by embracing MKM technologies.

Personally, I think that the initial immaturity of the Semantic Web technologies
was an important ingredient, but maybe not the most important one, to the
failure of the program. In order to convince working scientists and working
mathematicians to make a considerable day-by-day effort for embracing the
Semantic Web, you need to provide innovative services that really make a
difference at the end of the working day. This is clearly the case for the
working mathematician and the Web 2.0: there is an established tradition in
mathematics of only presenting ideas that are mature, hiding the long phase
when alternative solutions are considered and discarded, thus hindering
collaboration outside very closed, small and established groups. Web 2.0
revolutionises this scenario with immediate and significant benefits. At the same
time, it inherits from the standard practice the production of short reviews
of published material, also creating a soothing continuity with the usual modus
operandi. Also for the working scientist the Web 2.0 provides clear benefits:
it allows one to collect and spread mathematical ideas that can be relevant to a
particular scientific topic and that are usually scattered around or not easily
accessible in the standard mathematical literature. Finally, Web 2.0 technologies
could help to gather together the critical mass required to do formal
mathematics in the large. This last promise, however, remains just a promise for the moment.

Considering now Semantic Web technologies in place of (or combined with) Web 2.0
technologies, the actual benefits seem to be less revolutionary. Indeed, the
greatest benefits and promises of the Semantic Web come from the kind of
automated reasoning that can be performed on the data using advanced query
languages and inference engines. This is actually the case for most domains
that allow effective querying once equipped with an ontology. Querying
mathematical libraries is much more complex. Indeed, the kind of
reasoning required even for simple mathematical queries --- like generalisation
or instantiation of statements, or inhabitation of mathematical structures and
theories --- is already too hard to be coded as a simple query or to be
inferred automatically by an RDF reasoner. Unless the author is able to provide
evidence that this will not be the case any longer, I will remain convinced that
ad-hoc mathematical reasoners will be required. Thus the problem becomes how
to integrate these ad-hoc reasoners within the Web of (linked) Data, which remains
the best approach to represent all the extra-logical/mathematical structure (e.g. the rhetoric structure, the theory level functional structure, etc.). And, at
a higher level, it remains to be seen what kind of revolutionary service can be
provided. Here I am not considering simple queries --- even ad-hoc ones ---
as revolutionary, at least for the working mathematician. For the working
scientists I see more hope, since even simple querying can be valuable.

To summarise, I invite the author to provide a more critical analysis of the
past failure and future promises of integration of MKM with the Semantic Web.
The analysis needs to be at the level of actual, concrete services to be
provided, not assuming that describing everything in RDF will automatically
trigger new revolutionary services.

The final part of the paper is ``a road map for getting the mathematical Web
of Data started: what deserves to publish, how to interlink them, and how to
take advantage of these new connections''. Here again the road map is just
briefly sketched, but it describes a sound approach and it contains interesting
pointers to previous works (some by the author) that will be useful to the
reader.

To conclude, I think that the paper is valuable since it provides: 1) a
retrospective on part of the history of MKM; 2) valuable pointers to techniques
and tools to be known to merge MKM with the Semantic Web; 3) a not very deep
but wide categorisation of languages for the representation of mathematical
knowledge in a broad sense. Moreover, it serves as a position paper to renew
the efforts on the integration of MKM with the Semantic Web, with the hope that
this second try will not be another false start. Even if I am not convinced yet
that this will be the case, I also believe that the Semantic Web technology is
now mature enough to make it worth to finally settle the question about its
effective exploitability for mathematical knowledge. Hence I recommend the paper
for publication after a resubmission to integrate the discussion on the failures
and hopes that I have already suggested.

Other comments:
- abstract: "logical and functional structures - formula, statements and theories". Make it clear what is logical and what is functional for you or avoid the
distinction in the abstract. I think that the distinction is not clear even in
the rest of the paper.
- page 1: you say that Web 2.0 has addressed problem (i). Here it is difficult
for me to let you avoid the usual problem about accuracy of Web 2.0 data. Since
mathematics is a particular domain that favours accuracy above all, the issue
here is particularly critical. You should spend some words on it if you have an
opinion and/or look for references in the literature, if any, or point out
the lack of references on the topic.
- page 3: "They do support internal..." Rephrase.
- all over: footnotes are very confusing. Some are given at the end of the
paper, some as standard footnotes, and there is no way to distinguish them.
- section 1.2.3: the early works described here pre-date the wave of libraries
and tools (e.g. AJAX, Flash) that nowadays allow the design of consistent Web
interfaces for editing documents. Also for this reason, the services provided
at the time provided little interaction with the user and almost no lightweight
editing. In my opinion, more interactivity is the key to provide innovative
services.
- section 1.3.2: about the criticism of existing ontologies for mathematics, I
think that there is another point that deserves attention. Two important
transitions in mathematics are the ones from abstract to concrete (and back)
and the one from informal to formal (and back). All the hand-made ontologies I
am aware of (not the ones generated e.g. from formal theories) handle these
transitions very weakly. Nevertheless, the abstract to concrete transition is
probably the most important one to effectively categorise mathematical
knowledge according to the modern practice.
- section 1.3.2: reading this section, the following criticism comes to my mind:
ad-hoc queries on mathematical data (logical/functional structures) --- even
simple queries --- are often implemented using non trivial data structures and
algorithms (e.g. context trees, higher order unification, querying up to
isomorphisms, etc.). It seems unlikely to me that the implementation of these
algorithms can be simplified by working on top of an RDF representation or by
reusing standard reasoners on RDF libraries. Do you have any kind of evidence
against my claim?

- page 7: "Mathematical formulae employ...". Here you are somewhat
conflating the presentation level --- which is essentially non
extensible --- and the content level, which can be extended.
- page 7: "a one-to-many mapping" I strongly disagree. The notation is
many-to-many. For instance, just consider typical ambiguity that is used in
a paper to omit information but only in contexts where it can be retrieved.
- beginning of page 8: I am confused here. You are considering "which example
for the same thing" (and other examples) as a purely notational choice.
I consider notation and presentation to be distinct. They are both context
sensitive and sensitive to the user model, but in different ways.
- page 9: "is a vector is defined"
- page 10: The description of requirement F (in particular requirement F.C)
seems to implicitly suggest a linear order on the "degrees of formality".
When foundational issues are taken into account, this is not at all the case.
More generally, it is not even clear to me if supporting multiple foundations
is meant to be part of requirement F (or F.C) or if it is an additional
requirement (or sub-requirement of F). Indeed, some languages (like OpenMath)
are equipped with a natural dichotomy formal/informal (commented), but they do
so without any reference to the foundation. Hence a foundation must be singled
out from the very beginning. Hence they respect requirement F.C, but not its
stronger version.
- page 10: requirements "L→" and "L←". From my experience,
the main problem of linking to external mathematical resources is a problem
of context (and notation). Indeed the external resource, if it is very fine
grained (e.g. a formula or a theorem) and if it is embedded in the current
document, it is moved from its context of definition to the one of the current
document. Hence it can make no sense to the reader. Moreover, even if left in
its original context, we can have notational problems. Should it be rendered
consistently with the current document or with the target one? Finally, the
issue becomes more critical when we link from a formal text to an informal
one. Indeed, in this case there is usually a transformation from the formal
or content representation to the presentation level, while the target is
already given at a higher level. In this case, getting a uniform and non
confusing notation is often impossible, as well as linking back the informal
level to the formal one when the latter has not generated the former.
Even in this case, it could be interesting to differentiate between weak and
strong requirements L, with the latter addressing these issues. I expect
that many languages would just provide support for the weak requirements.

- page 11, end of section 3.2.2. The dichotomy between CDs as a tool to give
a semantics and CDs as a specification for phrasebooks has never been
resolved. Or, better, only the latter interpretation was the original one,
and people have been stretching them more and more towards the former.
My question here is the following: when developing ontologies out of CDs or,
more generally, when developing ontologies in general (not, e.g. generating
them from OMDoc documents), how is the above dichotomy to be resolved?
- page 12: the example about the exponential function is actually interesting
since a definition@type=implicit makes sense only in certain foundations and
not in others. For instance, in constructive mathematics it defines a
predicate over a function, but not a function itself.
- page 13: in all the page you often talk about MathML without specifying if
only MathML Presentation is supported. Make it explicit.
- page 14, section 3.3: "Mizar, Isabelle, Coq ... but less so on the theory
level". This is debatable as written. For instance,
Isabelle Modules and Locales, Coq Modules/Functors and Coq Dependent Type
Classes, are very complex mechanisms at the theory level. It is largely
debatable if they get it right or not, but the same applies to the mechanisms
used in OMDoc, that do not support easily first class mathematical structures.
- page 14: about antiquotations. Antiquotations were a major problem in HELM
because of the lack of context. Should an anti-quotation be associated to a
given context and to which one?
- page 14: "These languages are usually committed to...". I agree, but this is
counter-intuitive to the casual reader. Formalised mathematical objects have
more semantics in them than non formalised objects. A priori, it can be
expected that this should make communication to other systems easier and not
more difficult. Indeed, in a sense, exportation of the information is easy,
but not importation.
- page 14: "an author about ten times as long" The actual ratio is very
sensitive to the domain and the variance is very high. Relax the sentence
somehow.
- page 16: the criticism of Marchiori's work is too concise, to the point that
it becomes totally uninformative.

- page 17, but also in many other places: "ontologies do not sufficiently
abstract" "are flawed", etc. Are flawed or not sufficiently abstract
with respect to what? Is there any explicit connection to one or many
requirements? Or are you using here an implicit super-requirement that ALL
information must be made explicit in RDF? In this case: 1) what does ALL mean?
how to determine it? 2) why? why is this requirement actually useful from the
user point of view? Your previous requirements were somehow more user
oriented.
- page 20: "however, the latter do not use URIs". Here and in other places there
seems to me to be an implicit and very strong assumption that all languages
should be written (syntactically) in a machine-friendly way (e.g. like in XSLT)
compared to a user friendly way. This is debatable. For instance, in a Coq
script all identifiers can be assigned in a unique way a long identifier that
is in fact an URI (with a different syntax). This association is purely
syntactical: no type-checking, disambiguation or complex algorithms are
employed. Nevertheless, the algorithm needs to parse the included files (or
their compiled counterparts), which is hard when done by an external tool.
Mizar is even more explicit, requiring the user to import each
notation/identifier in the preamble, practically associating an URI to each
identifier. Hence the complaint by the author seems to be that these languages
do not have a machine readable syntax that can be parsed without any
computation at all and without knowing the semantics at all. Is that
reasonable? Or would it be better to differentiate between the concrete
syntax and the lack of explicit information? How difficult must the
explicitation procedure be to consider the information implicit?
- page 21: I appreciate the equilibrium and honesty in the conclusions (3.5).
- page 21, end of first paragraph. "one would always have to consider...".
I do not get your point here. When part of the information is in XML (not in
RDF), to write a query one needs to know the XML Schema. Thus it is not a
choice made by guessing something and one does not need to consider two
possibilities: the Schema tells you where is the data. You can devise a Schema
that allows one to put the same data in XML OR in RDF, but this is not worse than
having a Schema that allows you to write the same data in two different
positions.
- page 21: "Translating RDF to XML". You meant the other way around.
- page 21: "handle abstraction and links less well". This should be motivated.

- page 21: "it would require a considerable effort of declaring"... You cannot
avoid this work. You are doing it just once during the translation to
RDF. Is the sentence worth rephrasing?
- page 22: legends for the tables are mandatory!
- page 26, final sentence: "this would ultimately enable powerful machine
support..." You state this matter-of-factly but, so far, I have never seen
any evidence and I really doubt it. The amount and depth of inference required
to understand and progress in the proof of the Kepler conjecture is so much
that the entailment relation provided by Semantic Web queries will probably
provide no practical help. The overall amount of effort required is such that
the one spent in finding related documents and theories or similar activities
will remain negligible. I understand that you would like to end the paper with
some advertisement, but this seems definitely taken too far.

Solicited review by Alexandre Passant:

This paper presents a literature review on languages and ontologies for representing mathematical knowledge on the Web. This is a very dense paper, but it is well structured and reads well. As a review paper, it is worth accepting (even if the scope is very narrow), but there are some sections that should be revisited IMO, as discussed in the detailed review. I also believe that a few sections should be dropped. Especially, I think that there should be a longer discussion on reasoning, and that the section on discourse representation is not needed.

Detailed review:

- The motivation for identifying similar formulas written differently (1.1.2) is very relevant and should probably be a stronger statement. Do you have an idea of how often this happens, for instance in one of the wikis that you mention in the rest of the SOTA?

- In 1.2. you mention the common functions of XML for Semantic Web technologies. Keep in mind that RDF is not an XML serialisation but an abstract model. This sentence is imo confusing and should be revised.

- In 1.3.2, vocabularies like SCOVO should be mentioned. I think it would also be relevant to relate to vocabularies like voiD for representing mathematical information (stats) about datasets, as well as the Service Description vocabulary in SPARQL 1.1.

- I am not convinced by the need for representing information about project organisation and management in the intro of section 2. Or more precisely, I am convinced, but that is too generic to be discussed here. The same applies to the discussion about biomed, semweb or any scientific (or non-scientific) topic: the ontologies suited for this are not from the mathematical domain (FOAF for instance). I think that the authors should emphasise this less. The same applies to the metadata discussed in 2.3 (DC is generic) and 2.4 (these models can be used in many contexts). This is probably a matter of rewriting for the first ones, but I am not sure that section 2.4 is relevant (see also comments about 3.4.3 and 3.4.4 later).

- It would be worth mentioning the uptake of the models presented in 3.4. For instance you mention several ontologies for OpenMath CD, but are they used or are they just academic exercises? That would help to know why / if new models are needed, or if it is more a matter of using existing ones.

- What about DBpedia ? Even though it does not represent formulas per se, you could still reuse structured information about a theorem for instance (e.g. date, related topics, industrial usage, etc.) that could be useful in this context. I think that should be discussed

- I am not convinced about sections 3.4.3 and 3.4.4. These topics have a much broader range and would need a full section / chapter (paper?) to be completely covered (I saw there are related submissions to SWJ on these topics - disclosure: I'm involved in one). Hence, I am not sure they bring much to the topic. This refers to my previous comment regarding the "models for meta-data and discussion". Also, if you want to dig into this area, you should probably mention CITO, SchoolOnto, etc.

- I would like to see discussions on how these mathematical models (esp. domain specific ontologies, section 3.4.6) relate to upper-level ontologies such as DOLCE. For instance, the relation with DOLCE regions / qualities and mathematical properties.

- In tables 1 and 2, the explanations of ++ versus + should be clearer. It is nonsense to add RDF(a); you should just mention RDF (as I believe you refer to the abstract model, not to the serialisation). In table 2, based on my previous comments, I also think that SALT, SIOC Arg, DC, DILIGENT are not needed here.

- I was expecting more discussion of the reasoning capabilities of mathematical knowledge once it is translated into RDF. This is IMO one of the most exciting things that could be done, e.g. validation using formulas, automatic computation etc. N3 seems to provide a framework for this, and that could probably be the case with the mapping that you suggest. But so far, there is no discussion and this is imo something needed (at least as discussion) in the final paper.

Minor comments:

- Introduction "Web 1.0 websites" -> "traditional websites"

- Footnotes and endnotes both have the same numbering and this is confusing. Either transform endnotes as footnotes, or use letters / numbers to differentiate

- Page 23: "If the XML language hosts RDFa": not clear what you mean here?

Solicited review by Aldo Gangemi:

This paper presents a broad overview of approaches to representing mathematical knowledge over the semantic web.
The author covers metadata, annotations, discussion frameworks, ontologies, languages, inference engines, notational systems, etc. As such, the paper has an impressive coverage, which qualifies it as an overview paper.
It also contributes a proposal for renewing interest in bringing math knowledge to the semantic web.
The main drawbacks of the paper include a partial lack of tight integration/comparison between the approaches analyzed, and the varied depth at which the analysis is carried out.

On the other hand, the paper needs some rewriting in order to be even more useful. Some explicit proposals for improvement include:

- the main problem is the paper narrative: at first impression, it seems well articulated; however, at a second glance, the sections follow from one another and cohere only partly. Each section has a broad coverage (although several overview entries are too quick), but the relation between them, with respect to a general assessment of the state of the art, is not straightforward for the reader
- the author provides two tables to reinforce the relation between the requirements devised, and the languages/vocabularies etc. presented, but the tables are difficult to reconcile with the text (my impression is also that some approaches are not summarized in the tables). Additionally, the tables contain unexplained (though partly intuitive) symbols. It would be good to extend the tables, or add new ones.
- if someone would like to find a place to make the state of art advance, there are too many places in the paper to figure out. Conclusions help only a little bit.
- the relation between reasoning capabilities brought by semantic approaches to mathematics, and annotation or querying support over stated knowledge is not trivial. The abovementioned tables do not contain any hint, while the text does only sparsely
- I like the inclusion of discourse and rhetoric structure representation in the overview, since the dynamics of a scientific field should be as important as its established results for any knowledge engineering effort on that field. However, also here I would like to see specific requirements singled out and compared in some table
- as a final suggestion for improvement, the depth level when analysing the different approaches greatly varies, from a half-page to just a reference to bibliography. More justification should be given when providing finer or coarser detail than average, e.g. putting together related approaches (and possibly spotting the main differences), or explaining why some approach is of minor interest.
