Exploring Rank Aggregation for Cross-Lingual Ontology Alignments

Tracking #: 2240-3453

Authors: 
Juliana Medeiros Destro
Javier Alvaro Vargas Muñoz
Julio Cesar dos Reis
Ricardo da Silva Torres

Responsible editor: 
Philipp Cimiano

Submission type: 
Full Paper
Abstract: 
Cross-language ontology alignments are of paramount importance in several applications. A common approach to define proper alignments relies on identifying the relationships among concepts from different ontologies by performing multiple entity-based searches. In this strategy, the most suitable matching is defined by the top-ranked concept found. Often, multiple similarity rankers, defined in terms of different similarity criteria, are considered to define candidate entities. In this case, their complementary view could be exploited in the definition of the best possible matching. In this paper, we explore the use of rank aggregation functions, under both unsupervised and supervised settings, in the task of defining suitable matches among entities belonging to ontologies encoded in distinct languages. We conducted a comprehensive set of experiments with standard datasets from the OAEI competition, using ontologies in the Conference domain, and mappings among 36 language pairs. Experimental results show that the use of rank aggregation approaches leads to better f-measure results when compared with state-of-the-art techniques in cross-language ontology matching.

Decision/Status: 
Reject

Solicited Reviews:
Review #1
By Ondřej Zamazal submitted on 14/Aug/2019
Suggestion:
Major Revision
Review Comment:

The article deals with different rank aggregation techniques applied to cross-lingual ontology matching. The main goal of the article is to explore the performance of several rank aggregation techniques. The authors claim that the results show a superiority of rank aggregation techniques over state-of-the-art techniques in cross-lingual ontology matching with regard to F-measure. In my opinion, this conclusion is not supported by the provided experiments, for the following reasons:
a) For the experiments, only the test of type ii from OAEI 2018 has been used, i.e., cross-lingual matching of the same ontology translated into different languages. However, there is also a test of type i in OAEI 2018, i.e., cross-lingual matching of different ontologies in different languages. This is more interesting from the perspective of cross-lingual ontology matching.
b) The article uses a different part of the dataset for measuring the performance of the proposed rank-aggregation-based cross-lingual ontology matching approach than the one used within the OAEI 2018 MultiFarm track. Therefore, a straightforward comparison of the performance measures cannot be made. On the one hand, it is true that "the official results reported on the competition refer to a blind dataset", as the authors wrote, but on the other hand, the full MultiFarm dataset [2], including reference alignments, is available online from the web page [1]. Moreover, the state-of-the-art tools from OAEI 2018 (the links should be available in the corresponding OAEI 2018 papers of the matchers) can be run using the SEALS client [3, 4]. Therefore, it is possible to run a comparison with the state of the art.
c) The experiments lack details about how well each ranker performs separately. In my opinion, the experiments should be extended with an analysis of the performance of each involved ranker (similarity measure). Using the full MultiFarm dataset enables a better comparison not only with the state-of-the-art matchers but also with the individual rankers (similarity measures).

Ideally, the ontology matching approaches from Related Work should be compared to the proposed approach. This has not been done, probably because those approaches do not provide a ready-to-use implementation, e.g., Spohr et al. 2011. The proposed approach is thus compared to the matchers that participated in OAEI 2018. These matchers should also be described within the article. Do they include any specific treatment of cross-linguality?

The above-mentioned new approach to the experiments will produce much more detailed results, which need not all be in the article; however, they should be available online, and the article should merely show a summary of the important results. Based on the new extended experiments, it is also necessary to provide a new discussion and conclusions stemming from the new findings.

The Related Work section does not provide details about the ontology matching approaches, e.g., Fu et al. 2009 and Spohr et al. 2011, or about the supervised rank aggregation methods, e.g., Pujari and Kanawati 2012, Subbian and Melville 2011, and Wu 2013.

The rank aggregation approach is applied to cross-lingual ontology matching. I miss a discussion in the article of why the authors think this approach should work well for this type of task: is this rank aggregation approach more suitable for cross-lingual ontology matching than for monolingual ontology matching? The expectations should also be supported by experiments; in this case, they could be based on the original Conference track on which MultiFarm is based.

While the involved rank aggregation techniques are very important for the article, they are just listed (with references) in Table 1. They should be briefly described in the text or in the table.

For the experiments, the authors selected two syntactic similarity measures based on the 2006 Christen study dealing with personal name matching. While this is certainly an important paper, it still deals with personal names. I would expect this selection to be based on an ontology-matching-oriented paper, e.g., [5]. Moreover, [5] also provides some baselines of rankers (string-based measures) for MultiFarm. Two further semantic similarity measures were selected, but no explanation is given for why those two measures were chosen.

In my opinion, along with the new online supplementary detailed results of the extended experiments suggested above, the source code of the approach and, if possible, an online tool should also be made available. This would increase the practical contribution to the community.

The approach is based on comparing each entity to all entities of the same type found in the target ontology. Based on this, it seems that the approach is not scalable. What are the expected limitations of the approach? These should also be detailed in the article.

Did the authors try different language translators? Since three out of four similarity measures depend on natural language translation performance, it is worth trying different natural language translators. The paper [6] deals with this topic.

Further remarks:
* p. 1, "Ontology concepts explore strings written in natural language to denote labels [2]." - this sentence is not clear to me, mainly because of the use of "explore" in this context.
* p. 2, The abbreviations such as OAEI should be explained.
* p. 4, the article should explicitly distinguish multilingual and cross-lingual ontology matching.
* p. 4, "The authors demonstrate that CRF has superior effectiveness when compared to some supervised alternatives." - Please be more specific about "some supervised alternatives".
* p. 4, "This work investigated four rank aggregation algorithms." - It would be interesting to be more specific which rank aggregation algorithms where considered.
* p. 4, "...[21]...Then it enhances the evaluation with other matching strategies." - Please be more specific about other matching strategies.
* p. 5, there is an example using two ontologies shown in Figure 8. It seems that there is a subsumption relation between linked concepts, but there are also linked concepts that do not have a subsumption relation between them, e.g. "paper" and "author of contribution". Please clarify this in the article.
* p. 5, "Each one of the four generated rankings correspond..." - It was not mentioned before that there are four rankers.
* p. 5, There is no reference to WordNet.
* p. 8, Although Section 4.1 states that there are 10 languages in the MultiFarm dataset, there are nine languages in Table 3. Moreover, it is not very clear what the purpose of Table 3 is, since its nine rows are almost identical except for the language.
* p. 10, the GP-FFP1-p500g50B4 method from Table 4 is not in Table 1.
* p. 10, the MART method from Table 4 is not in Table 1.
* p. 11, "The rank aggregation CombANZ method was able to find the correct mapping candidate." - This needs some explanation.
* p. 11, "Although reporting a lower f-measure than the best tool in OAEI 2018 competition, the rank aggregation technique was able to improve the results of conference-conference-pt-ru (cf. Figure 12) alignment. Table 6 describes this example." - This note is not clear to me. I found argument showed in Table 6 as rather weak since it shows that it could help in some case but in all other approaches were better.
* p. 12, "The multilingual semantic networks enables..." - enable
* p. 12, "UMLS" - The abbreviation should be explained.

[1] https://www.irit.fr/recherches/MELODI/multifarm/
[2] https://www.irit.fr/recherches/MELODI/multifarm/dataset-2015-open.zip
[3] http://oaei.ontologymatching.org/2015/seals-eval.html
[4] http://oaei.ontologymatching.org/2018/seals.html
[5] M. Cheatham, P. Hitzler: String Similarity Metrics for Ontology Alignment In Proceedings of ISWC, 2013.
[6] M. Abu Helou, M. Palmonari, M. Jarrar: Effectiveness of Automatic Translations for Cross-Lingual Ontology Mapping Journal of Artificial Intelligence Research, 2016.

Review #2
Anonymous submitted on 13/Sep/2019
Suggestion:
Reject
Review Comment:

The paper tests different supervised and unsupervised rank aggregation methods on a cross-lingual ontology matching (OM) task. Results show that rank aggregation methods are promising for improving the quality of the results returned by the system.

The paper provides some contributions that are potentially interesting for supporting OM, and, maybe, cross-lingual OM in particular. I have appreciated in particular the following things:

• A systematic evaluation of several state-of-the-art supervised learning-to-rank methods to support OM; many of these approaches have not been tested enough in the field of ontology matching, and it is good to bring in some techniques that may have been overlooked.
• The idea of using learning to rank is interesting because it may be used also with rankings provided by similarity measures that do not fall in a common [0,1] interval (e.g., Lucene Conceptual Scoring, which is very handy in several practical problems).
• In the experiments, only a small part of the alignment has been used for training the supervised approaches (differently from what I have seen in some other approaches that have tested machine learning methods); this makes learning to rank – applied to rank aggregation – a good candidate to support interactive matching, which is a very important task and much under the attention of the community today [].
• Experimental results provide hints that the proposed techniques may bring benefits in terms of performance. In particular, it should be noticed that the performance figures discussed in the paper are compared to those of much more complex OM systems.

All the above observations make me think that there are some nice ideas in this paper. However, I think that these ideas must be much more developed. In its current state, the paper has too many weaknesses, which make its contributions quite far from the kind of clear-cut and robust contributions that are expected in a Semantic Web Journal paper. The paper as such fails to convince that its contributions deliver significant results. Also, the presentation should be much improved. Finally, the lack of comparison with related work makes it difficult to assess its actual novelty.

For these reasons, I have to suggest that the submission be rejected; at the same time, I encourage the authors to further develop their ideas and re-submit the paper in the future.

1) Claims of the paper, overall approach, and significance of results

The paper discusses a problem framed as “ranking aggregation” and evaluates it on cross-lingual ontology matching. However, this ranking aggregation problem (see more comments on the terminology here below) is general enough to be perfectly applied to any OM task, not only to cross-lingual OM tasks. So it is not clear at all why the authors have focused on cross-lingual OM. What is so peculiar in cross-lingual OM that requires this special attention to rank aggregation? The question is relevant because:
• the same problem has been addressed in OM as the problem of combining different matchers, which opens up the problem of a stronger comparison with related work (see Related Work section);
• a much smaller number of resources is available to evaluate cross-lingual OM: one track of the OAEI with few ontologies in a specific domain; blind evaluation; few systems that have participated in the track; other OM tasks have been used to evaluate cross-lingual OM approaches, but in quite different settings (e.g., linguistic ontologies).

The authors should take one of these options: further motivate and substantiate the peculiarity of cross-lingual OM that makes it the best application field for the proposed ranking aggregation techniques; or evaluate their work in a broader experimental setting, which also considers monolingual OM tasks, and compare their work with approaches proposed for similar (but I would rather say "equivalent") problems.

Here I would like to proactively share some insights with the authors, hoping they could be useful for a future submission.

• There are some peculiar issues emerging in the cross-lingual domain, which IMHO are relevant for your work and not adequately considered:

--- Machine translations introduce a branching factor when more translations are available for one word; multiple translations are particularly useful to handle polysemous words; ambiguous words are translated differently depending on the context in which they appear, but for concept labels you frequently do not have the context; it is not clear how this issue is considered in the proposed approach. If you query a machine translation service, you can trust the top result (but this is very naïve) or get more possible translations; when concepts from distinct ontologies are translated, different translations can be returned also for similar words; so usually it is required to deal with multiple translations. This problem may not emerge with NASARI-based similarity, but it definitely emerges with the syntactic and lexical similarity methods that are run after the translations are collected. In the paper it is not explained at all how you use the results of machine translation, and it does not seem that you collect and use multiple translations (and, if you do, you do not explain how).

--- If on top of that you use a reference lexical knowledge base like WordNet, a second branching factor may occur (each translation hitting more than one concept). The paper does not explain how WordNet is used to compute the similarity.

• There are some known aspects of OM that are relevant for your work and I think have not been adequately considered.

--- Some OM systems, for a similarity function, compute the similarity between any pair of elements; this is equivalent to storing a matrix for each similarity function; some other systems, e.g., AML, and, I think, LogMap, do not do so in order to scale to very large ontologies. If the system uses similarity matrices, rank aggregation is IMHO equivalent to the combination of these matrices, that is, the combination of different similarity scores (supervised and unsupervised approaches exist).

--- The methods proposed so far for combining similarity scores may have problems when full similarity matrices are not stored (because of data sparsity); in this case, using an approach specifically designed to merge rankings instead of similarity scores may be useful, e.g., to be applied to very large ontologies.

--- The latter may be a nice motivation scenario for your method.

--- Approaches that combine similarity scores, by known functions (avg, max, min, etc.) or more sophisticated methods (e.g., weighted linear combinations), have trouble when scores different from similarity measures are used (e.g., Lucene Conceptual Scoring, PageRank, etc.); your method could fill this gap, but in the definition of the problem you use the assumption that all the scores are normalized in a [0,1] range. I suggest considering other measures and discussing this as an advantage of ranking-based aggregation vs. score-based aggregation.

--- Matching each source concept with its most similar concept is a very naïve mapping selection strategy. Most systems use at least a threshold; some others may learn when to match using machine learning or optimization methods (also applied to cross-lingual matching). Otherwise, each source concept would always be matched, which is unrealistic because some concepts of the source ontology may not be covered by the target ontology. This problem is not addressed in your method because in this specific task the same ontology has been translated into different languages; but OM systems are not defined to work with benchmarks, they work with real problems. Observe that this adds a bias to the significance of your results, because the hidden assumption is that a "best match always exists among the candidate concepts", which is only true in the particular evaluation setting.

In summary, I think that there is space for finding a scenario, a problem definition, and an evaluation setting that better support the valuable idea of your paper (using ranking-based aggregation instead of score-based aggregation). Suggestions are: combining matchers that use scores different from [0,1]-constrained similarity; dealing with large ontologies where partial rankings are generated for each concept (e.g., the top-15 most similar concepts) without computing entire similarity matrices; considering the scenario where user inputs are used to customize the rank aggregation function.
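
To make the contrast between score-based and rank-based aggregation concrete, here is a minimal Python sketch (not the authors' implementation; the ranker names, candidate labels, and scores are invented, and the Lucene-style scores are deliberately left unbounded):

    # One source concept, three rankers scoring the same candidate targets.
    rankers = {
        "levenshtein": {"Conference": 0.91, "Meeting": 0.45, "Paper": 0.10},
        "wordnet":     {"Conference": 0.60, "Meeting": 0.80, "Paper": 0.20},
        "lucene":      {"Conference": 7.3,  "Meeting": 2.1,  "Paper": 9.8},  # not in [0, 1]
    }
    candidates = sorted({c for scores in rankers.values() for c in scores})

    # Score-based aggregation: a plain average silently mixes incomparable scales.
    score_avg = {c: sum(s[c] for s in rankers.values()) / len(rankers) for c in candidates}

    # Rank-based aggregation (Borda count): each ranker contributes positions, not raw
    # scores, so unbounded or differently scaled measures are handled uniformly.
    borda = {c: 0 for c in candidates}
    for scores in rankers.values():
        for position, c in enumerate(sorted(candidates, key=scores.get, reverse=True)):
            borda[c] += len(candidates) - position

    print(max(score_avg, key=score_avg.get))  # "Paper": dominated by the unbounded score
    print(max(borda, key=borda.get))          # "Conference": preferred by two of three rankers

In this toy example the unbounded score drags the plain average toward a candidate preferred by only one ranker, while the rank-based view is unaffected; positioning the paper around exactly this gap would strengthen its motivation.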

2) State-of-the-art

The authors cover the state-of-the-art in Information Retrieval-based methods. However, there are important papers in two subfields relevant for your work that are missing:

A - Cross-lingual ontology matching

Chen, J., Xue, X., Huang, Y., & Zhang, X. (2019). Interactive Cross-Lingual Ontology Matching. IEEE Access.

Bella, G., Giunchiglia, F., & McNeill, F. (2017). Language and domain aware lightweight ontology matching. Journal of Web Semantics, 43, 1-17.

Helou, M. A., Palmonari, M., & Jarrar, M. (2016). Effectiveness of automatic translations for cross-lingual ontology mapping. Journal of Artificial Intelligence Research, 55, 165-208.

Helou, M. A., & Palmonari, M. (2015, September). Cross-lingual lexical matching with word translation and local similarity optimization. In Proceedings of the 11th International Conference on Semantic Systems (pp. 97-104). ACM.

(I also suggest checking publications on problems related to cross-lingual OM, like cross-lingual ontology enrichment:)

Ercan, G., & Haziyev, F. (2019). Synset expansion on translation graph for automatic wordnet construction. Information Processing & Management, 56(1), 130-150.

Ali, M., Fathalla, S., Ibrahim, S., Kholief, M., & Hassan, Y. (2018). Cross-Lingual Ontology Enrichment Based on Multi-Agent Architecture. Procedia Computer Science, 137, 127-138.

(Matchers’ combination)

Isabel F. Cruz, Flavio Palandri Antonelli, Cosmin Stroe: Efficient Selection of Mappings and Automatic Quality-driven Combination of Matching Methods. OM 2009

Eckert, K., Meilicke, C., & Stuckenschmidt, H. (2009, May). Improving ontology matching using meta-level learning. In European Semantic Web Conference (pp. 158-172). Springer, Berlin, Heidelberg.

Xue, X., Wang, Y., & Hao, W. (2015). Optimizing Ontology Alignments by using NSGA-II. International Arab Journal of Information Technology (IAJIT), 12(2).

(Related to combination and worth a look:)

Duan, S., Fokoue, A., & Srinivas, K. (2010, November). One size does not fit all: Customizing ontology alignment using user feedback. In International Semantic Web Conference (pp. 177-192). Springer, Berlin, Heidelberg.

3) Experimental evaluation

See point 1) for the main arguments against the experimental evaluation. I am aware of the small number of resources for cross-lingual ontology matching. However, the proposed evaluation is insufficient because 1) it does not consider relevant related work and 2) it exploits assumptions that are true in the evaluation dataset but unlikely to hold in a real-world scenario (for every source concept there exists exactly one correct match).

In addition, the proposed evaluation, which compares with the average results of other systems, is not very convincing, in particular if we see the effect of the language pair on the performance. I suggest downloading the tools (I think that the set of three selected OM systems is ok) and conducting experiments with these systems. It is also possible to get in touch with the developers to make sure the correct settings are used. If this process is not successful, the authors may report on this attempt and go for a second-best option.

See detailed comments for more remarks.

4) Presentation

Overall, the presentation needs to be improved significantly. The paper is written in good English but has a number of problems:

• Some intuitive and well-understood concepts are stressed too much (with some repetitions), while not enough details are given about the most important sections; see detailed comments for examples; here I mention the lack of details about how translations are managed and about cryptic similarity measures (e.g., the one based on WordNet); also, as a major remark, CombANZ is not defined; I think that it is important to define at least the best combination methods (much more relevant than defining Levenshtein or Jaro-Winkler, which are well known and not the object of the present work).

• The formalization of the problem is not helpful; there is no need to define the problem in the most generic way and then use a very restricted version of it (only equivalence mappings are considered, a naïve mapping selection method is applied, and thesauri are mentioned as possible ontologies to map, while no comparison with approaches to match lexical ontologies is given). A general introduction is ok, but, in the problem formalization, I suggest focusing on the specific setting that is supported by your work. See detailed comments.

5) Detailed comments

Page 1.

“We use ontology in a general sense, including taxonomies and thesauri [1].” → Then I expect you to compare with work in this field, including cross-lingual ontology enrichment, which requires matching against a thesaurus / semantic net.

Page 2

“While single ranking techniques are used in ontology matching [7], rank aggregation is yet under explored. “ → This is really not true; see suggested related work.

“with a set of attributes” → I do not understand what an attribute is; metadata? Synonyms? Again, I think there is a mismatch between the definitions here and the particular case considered in the evaluation.

“Each relation r(c_1,c_2)\in R […]” → I find this definition extremely confusing. First r(c_1,c_2) is defined as a member of the set R, then it is defined as a function. Wouldn't it be simpler to just say that there is a set of relations r_1,...,r_n and that a mapping is a triple with r \in R?

Page 3

The definition becomes overcomplicated from "Ontology Matching" on. I suggest using a definition from previous work and/or modifying it only to the extent that the change is significant for your work (for example, you consider only one kind of relation, equivalence; different relations often require very different methods).

When considering only equivalence relations, do you also assume that the cardinality of the mapping is 1:1? Or do you consider the case where you return an alignment of cardinality m:n? I think that by using a greedy approach to mapping selection you can possibly output an alignment with cardinality different from 1:1. This may somehow be ok, but it is a relevant aspect of the problem formalization to clarify.
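
To illustrate why this matters, a tiny sketch (with invented labels and scores, not taken from the paper): with a greedy per-source selection, two source concepts can end up matched to the same target concept, so the resulting alignment is m:1 rather than 1:1.

    # Hypothetical similarity scores of two source concepts against a target ontology.
    sim = {
        "Paper":   {"Artigo": 0.90, "Autor": 0.10},
        "Article": {"Artigo": 0.85, "Autor": 0.15},
    }

    # Greedy selection: every source concept independently takes its top-ranked target.
    alignment = {src: max(targets, key=targets.get) for src, targets in sim.items()}
    print(alignment)  # {'Paper': 'Artigo', 'Article': 'Artigo'} -> not a 1:1 alignment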

Page 4

I was quite surprised that RankSVM was not considered among the possible approaches. It seems quite natural to use it over vectors that represent the different similarity scores. There are also quite efficient implementations of the algorithm, in particular for short vectors.

Page 5

“For similarity […]” → Do you only consider the top result returned by the service? What about ambiguous words? What if the service returns different words in English for two equivalent concepts labeled in different languages?

“First, the similarity values […]” → I was really confused to find pessoa here among the concepts. Figure 8 is confusing, as Author and Author of contributions seem to be properties. So what does this graph represent?

Considering the figures referred to here: I suggest using real examples from the ontologies in Figure 4, otherwise it is not very informative (the idea described in this figure is very trivial for those familiar with OM); Figures 5, 6, and 7 are not very useful as they are now. I suggest making one example with data from the ontologies used in the experiments and showing the whole data processing flow in one figure.

“Levenshtein and Jaro, chosen by their performance reported by Christen study [39]” → This study was about instance matching. Jaro-Winkler, for example, should be ok for concepts too, but it is particularly good for names. For concepts, a measure that better handles multi-token labels could be useful (e.g., Jaccard, n-grams, etc.). I do not object much to the choice of similarity measures, because your goal is to improve the results by rank aggregation rather than to find the best measures to combine, but be careful with the reference.
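
As a small illustration of this point (invented labels; the token-based measure below is just one possible alternative, not a recommendation of a specific tool): character-level similarity is penalized by word-order changes in multi-token labels, while a token-level Jaccard measure is not.

    from difflib import SequenceMatcher

    def char_similarity(a, b):
        # Character-level similarity (a stand-in for edit-distance-based measures).
        return SequenceMatcher(None, a.lower(), b.lower()).ratio()

    def token_jaccard(a, b):
        # Token-level Jaccard: insensitive to the order of words in a label.
        ta, tb = set(a.lower().split()), set(b.lower().split())
        return len(ta & tb) / len(ta | tb)

    a, b = "Author of Contribution", "Contribution Author"
    print(char_similarity(a, b))  # noticeably lowered by the different word order
    print(token_jaccard(a, b))    # 2 shared tokens out of 3 distinct ones, i.e. about 0.67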

Page 9

Table 3 is not informative; I suggest deleting it.

“An evaluation protocol […]” → I appreciate the small set used for training, but more details are needed. It is not clear whether you build a different model for each language pair (selecting 15% from each language pair) or one model for all.

“Although the unsupervised method “ → It is ok to report on results with this setting, but you can also add a new table for unsupervised methods, which reports on the whole dataset.

Table 4. Explain CombANZ in the paper. Also, I think that the evaluation should include a more detailed analysis of the best-performing learning methods and compare their performance on a different task. The main question is: what conclusion should a user draw from these results? Which method should be used? Remember that in a real-world setting the gold standard is not available.
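
For reference while revising: CombANZ in the data fusion literature (Fox and Shaw's combination methods) is usually defined as the sum of a candidate's scores divided by the number of rankers that assign it a non-zero score, i.e., the average over the non-zero scores only. A minimal sketch, not the authors' implementation:

    def comb_anz(scores):
        # CombANZ: sum of the scores divided by the number of rankers that
        # returned a non-zero score for this candidate (average of non-zero scores).
        nonzero = [s for s in scores if s != 0.0]
        return sum(nonzero) / len(nonzero) if nonzero else 0.0

    # A candidate scored by four rankers, two of which did not retrieve it at all.
    print(comb_anz([0.8, 0.0, 0.6, 0.0]))  # 0.7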

Review #3
Anonymous submitted on 12/Oct/2019
Suggestion:
Major Revision
Review Comment:

The paper presents an approach that relies on rank aggregation for improving the effectiveness of cross-lingual ontology alignment systems.
The topic discussed in the paper is relevant for the Semantic Web Journal and it is also relevant for the research community.
Indeed, more and more artifacts, especially in the Linked Open Data realm, are becoming available in a multilingual fashion; hence, novel matching algorithms are required.
Overall the paper is well written and the presentation is satisfactory.
The research work performed by the authors is fair, but there are some issues, described below, that should be addressed by the authors for improving the quality of their work.

First of all, when we work with ontologies and multilinguality, it is important to distinguish two different scenarios:
- multilingual matching: ontologies to map are based on a variety of languages and labels that have been translated by domain experts. Here, mappings may be computed by using, exclusively, the information contained in the ontologies without the support of external services;
- cross-lingual matching: in this case, the two ontologies that have to be mapped are monolingual, but based on labels in two different languages. In this case, matching systems have to be supported by machine translation services in order to align their labels before defining mappings.
My guess is that this paper is focused on cross-lingual matching since ontologies are aligned per language-pair.
However, this fact is not completely clear in the paper, and the authors should specify it in the introduction and then recall it in the evaluation section.

Second, the first part of Section 2.1 can be expanded a bit.
The authors should assume that not all readers would be completely familiar with the ontology matching topic and that the paper should be self-contained.
Hence, a more detailed description of the ontology components should be provided.

Third, the evaluation part seems incomplete.
A table showing statistics about how many times each method outperformed all the others should be reported.
Then, a significance test between the use of the aggregation algorithms and the state-of-the-art systems should be reported (an illustrative sketch of such a test is given at the end of this point).
This way, it would be possible to verify the actual effectiveness of using aggregations.
Concerning the dataset used, the Conference dataset of the OAEI campaign is quite small.
I suggest that the authors test their strategy using bigger datasets related to both the medicine and agricultural domains.
Examples are:
- Medicine: MDR, SNOMED, MESH
- Agricultural: Gemet, Eurovoc, Agrovoc
This way, it would be possible to have an idea about how general the aggregation strategy is.
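
As a concrete example of the significance test suggested above, a small sketch follows (the per-language-pair F-measures are invented for illustration; a paired, non-parametric test such as the Wilcoxon signed-rank test is one reasonable choice given the small number of pairs):

    from scipy.stats import wilcoxon  # requires scipy

    # Hypothetical F-measures over the same language pairs for the two systems.
    f_aggregation  = [0.42, 0.55, 0.38, 0.61, 0.47, 0.50]  # proposed rank aggregation
    f_state_of_art = [0.40, 0.52, 0.39, 0.58, 0.45, 0.49]  # best state-of-the-art matcher

    stat, p_value = wilcoxon(f_aggregation, f_state_of_art)
    print(f"Wilcoxon statistic = {stat}, p-value = {p_value:.3f}")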

Finally, some minor changes.
At the end of the first paragraph of Page 2, Column 1, the authors should mention that one of the main goals of working with multilingual ontologies is also to break language barriers in accessing content.
Indeed, one of the motivations for which cross-lingual ontology matching strategies are needed is that, sometimes, ontologies are created only in local languages by requirement, and they cannot be provided in languages other than the native one.
Then, also on Page 2, Column 1, when reference [6] is cited, the authors should mention that aggregations of information can also be performed by using multirelevance strategies as described in:
- Célia da Costa Pereira, Mauro Dragoni, Gabriella Pasi: Multidimensional relevance: Prioritized aggregation in a personalized Information Retrieval setting. Inf. Process. Manage. 48(2): 340-357 (2012)
where each rank is seen as the result of one possible dimension for evaluating the relevance of an entity.
Hence, these ranks can be aggregated by also adopting a priority-based strategy.

Minor fixes:
- page 1, column 2, row 4: "We use ontology" -> "Here, we refer to 'ontology'"
- page 5, column 1, row 33: "We present an use case" -> "We present a use case"

Review #4
Anonymous submitted on 04/Nov/2019
Suggestion:
Reject
Review Comment:

This manuscript was submitted as 'full paper' and should be reviewed along the usual dimensions for research contributions which include (1) originality, (2) significance of the results, and (3) quality of writing.

This paper proposes a cross-lingual matching approach based on translation, lexical similarities, and the use of background knowledge (WordNet and BabelNet). The rankings produced by these different strategies are combined using rank aggregation, in both supervised and unsupervised settings. Experiments have been carried out using a subset of the OAEI MultiFarm dataset.

While the idea of using multilingual resources such as BabelNet is interesting in the field, the paper has many weak points, in particular:

- using queries, in the information retrieval sense, in the task of ontology matching is addressed superficially; it seems to be reduced to transforming each ontology concept into a "query". This aspect is really unclear in the paper, yet it is the basis for motivating the use of 'ranks'.

- apart from the weighted overlap strategy, which is based on BabelNet, the other matching strategies are quite classical in the field. Like most cross-lingual matching approaches in the literature, the proposed approach also relies on a translation step, before the matching itself, using English as a pivot. Furthermore, ranking different similarities is not at all new, and the approach does not bring any novelty to the cross-lingual matching problem.

- another big problem is the evaluation. First, only a very small subset of ontologies from the MultiFarm dataset has been used (the Conference ontology and its versions in different languages). This choice, however, was not justified in the paper. Second, the comparison made to other matchers does not make sense, as the blind results of MultiFarm are compared to the results obtained on the open subset of the dataset. This also opens the question of why the system did not participate in the evaluation.

- related work is very superficial (there is no comprehensive comparison of the proposed approach with the cross-lingual ones in the literature). Works on matching optimisation are also missing.

- finally, many passages of the paper should be revised, as detailed below.

Major comments:

- "A common approach to define proper alignments relies on identifying the relationships among concepts from different ontologies by performing multiple entity-based searches" => only concepts ? entities and concepts ?

- "Ontology concepts explore strings written in natural language to denote labels [2]". => not only concepts. This should be rephrased.

- "Generation of correspondences between concepts from two different ontologies [3] is known as matching, whereas the result of this process is known as a mapping set or alignment". Again, not only between concepts. It is preferred to refer to the result of a matching process as an alignment.

- The examples in the paper are from the biomedical domain, but the evaluation is run on the conference organisation domain. It would be nice to have homogeneous examples.

- "Ontologies are usually created for diversified purposes, even in the same domain, thus making it difficult to find correspondences between ontology elements". => This sentence should be revised. The difficulty is in the different levels/kinds of heterogeneity, not only in the fact that ontologies are created by diversified purposes.

- "The expected benefit of leveraging rank aggregation in cross-language ontology matching is the combination of different similarity measures, each one offering a particular and potentially complementary view of the similarity between elements". => This is a very classical discourse in the ontology matching field and it is expected do not be different in a cross-lingual scenario. What is the novelty here ?

- It could be better to split the Background and Related Work sections.

- "This technique ensures that all entities of the source ontology are matched to a corresponding entity of the target ontology" => using only equivalences it would not be possible.

- "The subsets are 10% of queries for training set" ? Not clear what is the training set here. How examples are constructed.

- Three out of the 5 best-performing mappings include the English language => using English as a pivot, this is somewhat expected.

- "Although reporting a lower f-measure than the best tool in OAEI 2018 competition, the rank aggregation technique was able to improve the results of conference-conference-pt-ru" => why the approach performs better to this pair ? what about the behaviours for the other pairs ? A deeper discussion is missing here.

Minor comments

- languages => natural languages
- multilingual ontology alignments? Not introduced before (cross-lingual and multilingual ontology matching are distinct tasks)
- ontology mapping => ontology matching
- The figures on page 7 should appear near their references in the text
- Table 3 does not make sense as all lines refer to the same ontology.