Linking discourse relations and induction of bilingual discourse connective lexicons

Tracking #: 2687-3901

Authors: 
Murathan Kurfali
Sibel Özer
Deniz Zeyrek
Amalia Mendes
Giedrė Valūnaitė Oleškevičienė

Responsible editor: 
Guest Editors Advancements in Linguistics Linked Data 2021

Submission type: 
Full Paper
Abstract: 
The single biggest obstacle to performing comprehensive cross-lingual discourse analysis is the scarcity of multilingual resources. The existing resources are overwhelmingly monolingual, compelling researchers to infer discourse information in the target languages through error-prone automatic means. The current paper aims to provide more direct insight into cross-lingual variations in discourse structures by offering an aligned version of a multilingual resource, namely the TED-Multilingual Discourse Bank, which consists of six TED talks independently annotated in seven languages. It is shown that discourse relations in these languages can be automatically aligned with high accuracy, as verified by experiments on manual alignments of three diverse languages. The resulting alignments have great potential to reveal the divergences the target languages exhibit in local discourse relations with respect to the source text, as well as to lead to new resources, as exemplified by the induction of bilingual discourse connective lexicons.
Tags: 
Reviewed

Decision/Status: 
Major Revision

Solicited Reviews:
Review #1
Anonymous submitted on 04/Jun/2021
Suggestion:
Accept
Review Comment:

(1) Originality:

The idea of parallel linking is known in many other fields, but has not yet been applied in this one.
From this perspective, the paper brings novelty.

It offers insight regarding cross-lingual variations in discourse structures by providing an
aligned version of a multilingual resource.

New elements/features in the theme, e.g. a new top-level sense called Hypophora.

The generated lexicons.

(2) Significance of the results:

The evaluation (cf. Tables 6, 7, 9 and 10) shows good results.
They are encouraging results.

(3) Quality of writing:

The paper is well written.

The data files (i.e. lexicons) are provided by the authors under “Long-term stable URL
for resources” at http://metu-db.info/mdb/ted/resources.jsf.

(4) The provided data artifacts:

Not complete.

The data files of the linked texts are not available.

Review #2
Anonymous submitted on 06/Jun/2021
Suggestion:
Minor Revision
Review Comment:

This paper proposes two methods, word alignment techniques and cross-lingual sentence embeddings, for aligning discourse relations in a multilingual parallel corpus, the TED Multilingual Discourse Bank (TED-MDB), which comprises TED talks in English and their translations into six other languages. The corpus is annotated following the PDTB framework, which provides analyses of discourse relations and discourse connectives at the local discourse level. The paper compares the two methods and finds the second to be preferable, based on its better performance and relatively simpler architecture. The main product of the experiments is the updated TED-MDB corpus, in which the discourse relations in one language are aligned with those in the other languages. The discourse-aligned TED-MDB corpus is also used to create bilingual lexicons of discourse connectives for the language pairs represented in the corpus.

This is an interesting paper which contributes to developing linguistic resources necessary for investigating discourse structure cross-linguistically. The methods used in this study are built upon existing architectures, but extended and tailored for the purpose of cross-lingual alignment at the discourse level. The significance of the results seems satisfactory, as documented by the experiments. The paper is also written in clear language and is easy to follow. My main concern is that the corpus (and the lexicons) might not be sufficiently large to make conclusive claims about the soundness of the methods/evaluations employed. Overall, I rate this paper as accept with minor revisions. I only have some minor comments, given below:

1. I am curious if the annotation of the TED-MDB was supported by the discourse connective lexicons in the respective languages (e.g., English, German, Turkish). Or, were the lexicons compiled afterwards or unrelated to the development of the corpus?
2. Page 2, line 50: It might be a bit incorrect to say the PDTB employs five types of relations. Discourse relations proper hold for explicit/implicit connectives and AltLex expressions, but not for EntRel and NoRel.
3. Page 3, Line 17-18: Since the paper uses the PDTB 3.0 hierarchy, does it also examine AltLex-C cases?
4. Page 3, Line 35: How are the arguments chosen for NoRel?
5. The annotation of TED talks extends the use of the PDTB framework to spoken text (although probably scripted beforehand). A short note on whether discourse annotations differ between written and spoken texts would be useful.
6. Page 4, Line 33-37: “… researchers need to decide on the information that characterizes discourse connectives in different languages”; Why would this be a problem? Aren’t discourse connectives supposed to have a common definition across languages?
7. I think it would be useful to know from the beginning what the authors mean by bilingual lexicons for discourse connectives, that is, how entries are organized cross-linguistically (rather than just including English translations for non-English entries).
8. Page 4, Example 7: The example is not clear to me. When there is already an explicit connective (‘also’), what is the motivation to interpret another implicit relation (by adding ‘because’)?
9. Page 7, Line 44 & 37: ‘gibi’ or ‘kadar’, what exactly is the connective?
10. Page 7, Line 49: ‘in all four criteria’; what are the criteria?
11. Page 12, Table 8: The changes in discourse relation types due to implicitation and explicitation seem important and need further exploration. Some notes on why there are specific changes in relation types would be insightful.
12. Page 12: In addition to implicitation/explicitation and relation change, there is another issue which, I believe, requires attention in the investigation of cross-linguistic discourse structure. This relates to the arguments of the relations. Do the arguments change in translation, and if so how much, in size or in syntactic structure (e.g., from main clause to subordinate clause), based on the linguistic properties of the respective languages?
13. Page 15, Table 9: All the bilingual lexicons are small in inventory size. For example, the English-German lexicon includes 44/49 unique connectives, while Eng-DiMLex includes about 150 English connectives and DiMLex 2.0 includes over 250 German connectives. How are the missing connectives to be dealt with in the bilingual lexicons?
14. Page 15, Table 9: The Min/Max part seems a bit unclear to me. It would be useful to explain the numbers, probably with the help of specific examples.
15. Page 15: Are the connectives in one language more polysemous than those in other languages? Does this indicate anything about explicitness/implicitness or the signalling of relations in those languages in general?

Review #3
Anonymous submitted on 14/Jun/2021
Suggestion:
Reject
Review Comment:

Note on the suggested decision: I think that a more suitable category for this paper is "revise and resubmit" but, at the same time, I find "reject" too strict.

The paper presents interesting work on the use of a newly annotated resource, the TED Multilingual Discourse Bank, to induce bilingual lexicons of discourse connectives. However, I think this paper, as it stands, has a series of flaws/shortcomings that must be addressed:

1) The writing and the narrative must be more focused. I think that the authors fail to present all the work that is behind this paper. In summary, the authors are introducing a new language resource, presenting two methods to automatically align discourse relations between pairs of languages, and using the automatically aligned data to induce bilingual lexicons of discourse connectives. This is a lot of material, and I think that the presentation and the rationale of the paper are not clear enough.

2) The TED Multilingual Discourse Bank (TED-MDB): this is a new language resource, unique in its annotation layer. I think that more information should be given to properly present the resource: who annotated it? how many annotators were there? is there an agreement study? how is the hierarchy of relations in the PDTB structured? how many classes are there? what is their meaning? why is the TED-MDB data format difficult? was any language-specific adaptation of the guidelines needed? what happens if a discourse relation holds between two non-adjacent arguments: is it annotated or not?
The first six examples should help the readers follow the different types of relations. I think it would be more useful to present the different classes and immediately afterwards provide the examples. All examples are in English except for Example 2; why? I find this quite confusing here.
You also fail to properly explain/introduce all relevant annotation layers. It is stated only at lines 40-42, pg 8, that there are gold annotations of aligned discourse relations. This is confusing for the reader. It also opens questions as to why no supervised methods have been attempted (not even a justification for not doing so is given).

3) Dictionaries of discourse connectives: this seems to be the core of the contribution, but the motivations for creating such resources are lacking, if not left implicit in the paper. Why do we need such resources? Why are they useful?
The authors claim that there has been an "upsurge" of discourse connective lexicons, and then they present only four. I think you can expand this with more pointers and present such lexicons in more detail (how have they been created?).

4) Alignment methods: two methods for aligning the annotations in multiple languages are presented. First, the presentation of the two methods needs to be rewritten. I think that the authors can shorten Section 3 and be more precise and concise about what they did. It would also be interesting to know why these two methods were selected rather than doing a manual alignment for all languages.
Method I: the main idea is to apply word alignment to identify the same portions of text and then project the discourse relations from EN to any other aligned language. Why use the NLTK sentence tokeniser? The module works for English, but for other languages you either use a different sentence tokeniser or it makes no sense. Unless you have used a language-specific module; in that case you have to make it clear. The presentation of the scoring method needs to be restructured: first state that you look for all alignments based on connectives, then present the "overlap" approach.
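For concreteness, the projection-and-overlap idea summarised above can be sketched as follows. This is not the authors' implementation; the alignment pairs, spans, and function names are illustrative placeholders for the general technique of projecting an English annotation span through word alignments and scoring a target-side candidate by token overlap:

```python
def project_span(en_span, alignments):
    """Map a set of English token indices to target-side token indices
    via word-alignment pairs (en_idx, tgt_idx)."""
    return {t for e, t in alignments if e in en_span}

def overlap_score(projected, candidate):
    """Jaccard-style token overlap between the projected span and a
    candidate target-side annotation span."""
    if not projected or not candidate:
        return 0.0
    return len(projected & candidate) / len(projected | candidate)

# Hypothetical word alignments for a short sentence pair
alignments = [(0, 0), (1, 2), (2, 1), (3, 3), (4, 4)]
en_span = {1, 2, 3}                            # English-side argument span
projected = project_span(en_span, alignments)  # -> {1, 2, 3}
print(overlap_score(projected, {1, 2, 3, 4}))  # -> 0.75
```

Under this sketch, the target-side candidate with the highest overlap score would receive the projected discourse relation.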
Method II: my first question is: why a 0.6 cosine similarity threshold? On the basis of what was the threshold established? The manually aligned data? If so, this raises quite serious doubts about the validity of your evaluation: it is as if you had "trained" and tested on the same data. No surprise the results are quite good. It is also not easy to follow how the score is computed: you speak of the similarity of discourse relations, but shouldn't it be the similarity of the arguments of a discourse relation?
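The thresholded similarity matching questioned above can be illustrated with a minimal sketch. This is not the paper's exact procedure: the embeddings are toy vectors standing in for LASER-style sentence embeddings, and the 0.6 cutoff and greedy pairing logic are assumptions for illustration only:

```python
def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sum(a * a for a in u) ** 0.5
    norm_v = sum(b * b for b in v) ** 0.5
    return dot / (norm_u * norm_v)

def align_relations(src_embs, tgt_embs, threshold=0.6):
    """Pair each source-side argument embedding with its most similar
    target-side embedding, keeping only pairs above the threshold."""
    pairs = []
    for i, s in enumerate(src_embs):
        sims = [cosine(s, t) for t in tgt_embs]
        j = max(range(len(sims)), key=sims.__getitem__)
        if sims[j] >= threshold:
            pairs.append((i, j))
    return pairs

# Toy three-dimensional stand-ins for sentence embeddings
src = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
tgt = [[0.9, 0.1, 0.0], [0.0, 0.2, 1.0]]
print(align_relations(src, tgt))  # -> [(0, 0)]; the second source has no match above 0.6
```

The reviewer's training/testing concern applies precisely to the `threshold` parameter here: if its value were tuned on the same manually aligned data used for evaluation, the reported scores would be optimistic.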
When presenting the score between discourse relation matches (lines 23-33, pg 7): why do you first state a 1 or 0 score and then present the other scores according to the different layers of the relation taxonomy?
How is the semantic similarity between the connectives obtained? Do you apply LASER to tokens?

5) Evaluation: I would add some references in support of the evaluation method you have adopted; this will better support your claim that in the literature linking quality is evaluated using Precision, Recall and F1-score (these are very common metrics). The section can be shortened and made more precise. Tables 6 and 7 should be merged: you are comparing the results, so you must allow the readers to follow your comparison. Bold should be applied per language pair and per method, marking the method that gives the best result. After having presented the results, I would reflect on the errors and problems per method.

6) I find it difficult to follow the rationale behind the "retranslated" examples. They are not very useful. You could use LaTeX packages that align translation examples word by word (e.g., gb4e) to make it easier to follow the potential gaps in the translations.

7) Section 4: the rationale of the section is a bit lacking in my opinion. For some languages you could have used the gold data to conduct the analysis and reflect on the differences in the realisation of connectives and types of discourse relations. I have some problems following all the pairs in Figures 1 and 2. For some language pairs you can report the gold data; for others the alignments are automatic and come with no evaluation (no gold available). It seems to me that it would make more sense to split this level of analysis: what happens at the gold level first, and then what happens in the other languages for which no gold alignment is available. Are the tendencies the same or do they differ? This may provide additional insights into the evaluation of your approach.

8) The lexicons: these are among the major products of this work, but they are presented in a rushed way. I would have loved to see an example of a lexicon entry so as to better understand its structure, especially the connection to the connective-lex.info web page.

I was not able to find any resource file accompanying the paper, nor a link to a publicly available repository.

Other comments:
- lines 43-44, pg 7: please set the translations of the Turkish connectives in a different font from the originals; otherwise it seems that Turkish has six connectives.
- lines 26-36, pg 6: the paragraph contains two sentences. Swap their order; it makes the flow of information more coherent.
- line 49, pg 6: what is a bi-text? Define it!
- line 39, pg 4: RST --> resolve the acronym first: Rhetorical Structure Theory (RST), and add a reference.
- line 4, pg 7: the score is scored --> the score is calculated/obtained.