Machine Translation for Historical Research: A case study of Aramaic-Ancient Hebrew Translations

Tracking #: 2763-3977

Shmuel Liebeskind
Chaya Liebeskind
Dan Bouhnik

Responsible editor: 
Special Issue Cultural Heritage 2021

Submission type: 
Full Paper

In this article, we investigate Machine Translation (MT) in a cultural heritage domain for two primary purposes: evaluating the quality of ancient translations, and preserving Aramaic (an endangered language) through the ability to translate it into another spoken language. First, we detail the construction of a publicly available Biblical parallel Aramaic-Hebrew corpus based on two ancient (early 2nd to late 4th century) Hebrew-Aramaic translations: Targum Onkelos and Targum Jonathan. Then, using the Statistical Machine Translation (SMT) approach, which significantly outperforms Neural Machine Translation (NMT) in our use case, we validate the expected high quality of the translations. The trained model fails to translate Aramaic texts of other dialects. However, when we train the same SMT model on another Aramaic-Hebrew corpus of a different dialect (the Zohar, 13th century), a very high translation score is achieved. We examine an additional important cultural heritage source of Aramaic texts, the Babylonian Talmud (early 3rd to late 5th century). Since we have no parallel corpus of the Talmud, we use the model trained on the Bible corpus for translation. We analyze the results and suggest some promising directions for future research.
Solicited Reviews:
Review #1
By Diego Moussallem submitted on 14/Jul/2021
Major Revision
Review Comment:

The paper investigates Aramaic-Hebrew translation, relying on MT models to evaluate the quality of ancient translations and to preserve Aramaic in a cultural heritage domain. The authors start by stating that SMT surpassed the NMT model, but they introduce neither the metrics nor the scores. In Section 2, the definition of MT is poor: "Machine translation (MT) is the computerized automatic translation from one natural language to another". Please rewrite it (line 13, second paragraph). The authors described all the components of an SMT system, but they omitted many parts of the NMT models. Therefore, I suggest doing the same with SMT; there is no need to explain every single concept of an SMT model. I expected some insights about word alignment for ancient languages, but none are given. In the evaluation section, the authors simply relied on default parameters and did not properly investigate Aramaic-Hebrew translation with NMT. Instead, they stated "Since there is a huge gap between the SMT and the NMT BLEU scores, we did not make efforts to improve the NMT algorithms" and strongly affirmed that SMT outperforms NMT. They should have used other metrics and certainly investigated the parameters of the RNN and Transformer models. Surprisingly, the Transformer achieved a very low score.

Overall, the paper has its merits: it explores an interesting problem and investigates ancient languages, which is always fruitful. However, the paper requires significant rewriting; for example, the word "then" is used several times. The authors also need to improve the content presentation. I suggest the following:

1) Rewrite the abstract with direct and concise sentences.
2) Rewrite the introduction and avoid informal statements about MT. The authors should also state clearly, with some linguistic examples, what the challenges of translating these languages are.
3) Shorten Section 2, mainly the SMT part; there is no need to explain the SMT modules, although the insights it offers for handling the ancient-language challenges should be kept.
4) In section 3, the train, dev, and test sets look strange, how did you split the data in terms of percentage? Have you performed correctly the experiments?
5) In Section 4, please delve into the parameters of the NMT models, and investigate BPE models for handling low-resource vocabularies. Additionally, the authors should evaluate their systems with other metrics such as chrF, METEOR, and TER. Moreover, please divide the results into two subsections: automatic evaluation and human evaluation.
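On point 5, the chrF metric scores character n-gram overlap rather than word n-grams, which tends to be more forgiving for morphologically rich languages such as Aramaic and Hebrew. A minimal pure-Python sketch of the idea (illustrative only, not the official implementation; in practice a standard toolkit such as sacrebleu should be used):

```python
from collections import Counter

def chrf_score(hyp: str, ref: str, n: int = 6, beta: float = 2.0) -> float:
    """Character n-gram F-beta score in the spirit of chrF: average n-gram
    precision and recall over orders 1..n, then combine them with recall
    weighted by beta (chrF uses beta = 2)."""
    precisions, recalls = [], []
    for order in range(1, n + 1):
        h = Counter(hyp[i:i + order] for i in range(len(hyp) - order + 1))
        r = Counter(ref[i:i + order] for i in range(len(ref) - order + 1))
        if not h and not r:
            continue  # neither string has n-grams of this order
        overlap = sum((h & r).values())  # clipped n-gram matches
        precisions.append(overlap / max(sum(h.values()), 1))
        recalls.append(overlap / max(sum(r.values()), 1))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    rc = sum(recalls) / len(recalls)
    if p + rc == 0.0:
        return 0.0
    return (1 + beta ** 2) * p * rc / (beta ** 2 * p + rc)
```

Because the units are characters, partial matches between inflected forms of the same stem still earn credit, which is exactly why chrF complements BLEU for these language pairs.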

Basically, these are the steps needed to make this paper stronger and ready for another round.


SMT outperformed MT -> SMT outperformed NMT
line 22, conforms -> confirms

Review #2
By Victor de Boer submitted on 24/Jul/2021
Review Comment:

This paper presents several experiments into the feasibility of Machine Translation from Aramaic to Hebrew. This work builds on previous work constructing a parallel Aramaic-Hebrew corpus.

The paper embeds the research well in the current machine translation literature, and specifically in other efforts to translate ancient languages (although I am not an expert here). The paper describes the construction of three parallel corpora, consisting of biblical and spiritual texts. Experiments with two types of methods, statistical machine translation (SMT) and a neural network based method (NMT), show that on the relevant metric (BLEU), SMT outperforms the neural approach on the biblical texts. The paper spends significant time on an error analysis showing which parts of the original text are difficult to translate.

The work clearly presents interesting and valuable contributions in the (Digital) Humanities domain and the broader field of Machine Translation. It shows the effect of dialect and of the nature of the corpora on the effectiveness of various methods. The paper is well-written and easy to follow. However, two main points of criticism are critical:

- The description of the translation method lacks detail. Especially for the NMT solution, the paper lacks discussion of which parameters are used and how the pipeline is set up. As far as I could determine, the paper does not point to reusable code or a pipeline that would allow for (straightforward) replication. Similarly, the availability of the corpus (and of the evaluation results) is unclear: I could not find a pointer to a persistent repository in the paper. This makes reproducibility limited.

- More importantly, while this is clearly a digital humanities contribution, there is no usage, discussion, or even mention of Semantic Web principles or technologies, knowledge graphs, ontologies, or similar. As such, it does not fall within the scope of the journal issuing the special issue. I would argue that the paper deserves a better outlet in a journal dedicated to natural language processing.

Review #3
Anonymous submitted on 28/Jul/2021
Minor Revision
Review Comment:

The title of the paper clearly illustrates the research topic, the scientific areas under investigation (Machine Translation, Historical Research), and the case study (Aramaic-Ancient Hebrew Translations).

The abstract is well written and well structured; it summarizes the objectives of the study, the methods adopted, the dataset, the main findings of the research, and suggestions for future research.

The statement of the problem, the main objectives, and the contribution of this study to MT research are correct and adequately described in the Introduction. The research contributes to advances in the fields of corpus linguistics, parallel corpus construction and analysis, and statistical machine translation; it provides a fascinating case study, the construction of an Aramaic-Hebrew parallel corpus based on the translation of the Bible using the Corpus Encoding Standard, this corpus being an interesting case for cultural heritage recovery and preservation.

In the Background, the authors discuss exhaustively the scientific literature on Statistical Machine Translation (SMT) approaches and models provided so far, with a focus on word alignment models (i.e. word-based translation models), such as the IBM models; symmetrization, providing a deep explanation of Och and Ney's methods and of other models (phrase extraction from alignment data generated by the IBM models, Zhang et al.'s algorithm, etc.); and decoding algorithms. Although this study focuses on SMT, the authors provide a wide overview and some suggested readings about Neural Machine Translation (NMT) approaches as well, since these show promising results in the field of MT. The choice of the SMT method in this study is correctly defended in the paper (p. 5): this choice is justified by the characteristics of the case study and by the modular nature of SMT. The section contains two other subsections, where the authors provide a documented overview of MT applied to ancient languages (2.3) and a complete review of Aramaic NLP studies, which is connected to one of the objectives of the study, being "a crucial step in preserving the Aramaic language and culture heritage" (p. 2). Thus, this extensive section collects and critically discusses the problems and solutions, methods and approaches adopted in MT research to date, according to the goals of the study.
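The IBM word-based models mentioned above estimate word-translation probabilities with expectation-maximization. A toy sketch of IBM Model 1 training (with hypothetical example data; illustrative only, not the paper's actual alignment setup):

```python
from collections import defaultdict

def ibm_model1(sent_pairs, iterations=10):
    """EM training sketch of IBM Model 1 translation probabilities
    t(f|e) for target word f given source word e.
    sent_pairs: list of (source_tokens, target_tokens) pairs."""
    # uniform initialization over the target vocabulary
    tgt_vocab = {w for _, tgt in sent_pairs for w in tgt}
    t = defaultdict(lambda: 1.0 / len(tgt_vocab))
    for _ in range(iterations):
        count = defaultdict(float)  # expected counts c(f, e)
        total = defaultdict(float)  # normalizer per source word e
        for src, tgt in sent_pairs:
            for f in tgt:
                # E-step: distribute the alignment of f over all source words
                z = sum(t[(f, e)] for e in src)
                for e in src:
                    c = t[(f, e)] / z
                    count[(f, e)] += c
                    total[e] += c
        # M-step: re-estimate the translation table from expected counts
        for (f, e), c in count.items():
            t[(f, e)] = c / total[e]
    return t

# hypothetical toy bitext: the model learns das<->the from co-occurrence
pairs = [(["the", "house"], ["das", "haus"]),
         (["the", "book"], ["das", "buch"])]
table = ibm_model1(pairs)
```

Even on this two-sentence toy corpus, EM concentrates probability on the consistently co-occurring pair, which is the basic mechanism behind the alignments the review asks the authors to discuss for ancient languages.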

Section 3 is focused on the parallel corpus built by the authors using three Aramaic-Hebrew corpora: Targum Onkelos, Targum Jonathan, and Zohar. The authors explain suitably the characteristics of each corpus, providing interesting historical information about their composition, authorship, and impact. This is also useful for readers who are not familiar with this textual tradition. The Aramaic-Hebrew parallel Bible corpus construction is described in 3.1, where there is a quick mention of the encoding format adopted (CES) and of the level 1 annotation guidelines. Beyond being consistent with the corpus of Christodouloupoulos and Steedman (2015), it would be interesting to know whether and how this type of encoding is suitable for SMT in this specific case.

The Evaluation section discusses the main findings of the study: the assessment of the quality of the parallel corpus, and the performance of the SMT algorithm trained on this corpus and on other similar Aramaic texts. The method adopted to train the SMT algorithms is correctly described; the authors explain accurately the division of the dataset into three sets, the evaluation measure (BLEU), and the MT algorithms adopted (statistical and neural), using open-source toolkits applied in previous work. They also explain that the evaluation does not take intelligibility, word order, or grammatical consistency into consideration; since the paper provides an interesting case for the assessment of SMT approaches, it would be interesting to know why this is so.
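For readers unfamiliar with the metric, the BLEU score referred to here combines modified n-gram precision with a brevity penalty. A minimal unsmoothed sketch over token lists (toy code, not the paper's evaluation setup; real evaluations should use a standard implementation such as sacrebleu):

```python
import math
from collections import Counter

def bleu(hyp, ref, max_n=4):
    """Single-pair BLEU sketch: geometric mean of modified n-gram
    precisions for n = 1..max_n, scaled by a brevity penalty.
    No smoothing, so any zero precision zeroes the whole score."""
    log_prec = 0.0
    for n in range(1, max_n + 1):
        h = Counter(tuple(hyp[i:i + n]) for i in range(len(hyp) - n + 1))
        r = Counter(tuple(ref[i:i + n]) for i in range(len(ref) - n + 1))
        overlap = sum((h & r).values())  # counts clipped by the reference
        total = max(sum(h.values()), 1)
        if overlap == 0:
            return 0.0
        log_prec += math.log(overlap / total) / max_n
    # brevity penalty: punish hypotheses shorter than the reference
    bp = min(1.0, math.exp(1 - len(ref) / len(hyp)))
    return bp * math.exp(log_prec)
```

Because BLEU only rewards exact n-gram matches, it says nothing about intelligibility or grammatical consistency, which is precisely the limitation the review points at.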

The evaluation results (Table 2) show that the SMT approach outperforms the NMT one, given that the BLEU score of the SMT method adopted is relatively high, thus confirming previous studies on the word-by-word translation techniques applied in the two ancient Aramaic translations considered (Targum Onkelos and Targum Jonathan). The authors analyze the translation errors of the SMT method adopted, using a random selection of 30 sentences; the results are classified into five categories, which are also discussed from a qualitative perspective and illustrated with some interesting examples. Moreover, they evaluate the performance of the trained SMT model on the third parallel corpus, the Zohar, providing an accurate explanation of the lower BLEU score obtained in this case. This is further discussed in the paper, considering also the translation of the Talmud, which poses specific challenges to SMT.

The Conclusions are correct; the authors summarize the main findings and suggest new approaches for future research, such as using monolingual data or a comparable corpus to improve SMT performance and exploring the translation quality of other ancient language pairs by applying this same methodology. Although some possible limitations of the study have been mentioned in the main sections, I suggest recapping them in the Conclusions too. Also, since the authors assert that NLP may be useful for preserving cultural heritage and endangered languages, it would be interesting to develop this idea a little more.
References are complete, relevant, and updated.

The structure of the article is correct, and the style is adequate. However, I suggest correcting some typos and revising some stylistic aspects:
Page 2 line 2: word > work
Page 2 lines 14-18: The CES is simply mentioned here and in pp. 7-8, please provide some more info about this encoding standard here.
Page 2 lines 32-33: The systems are then able to translate previously unseen sentences.
Page 3 lines 6-10: please reformulate; it is not clear what the connection is between these two sentences: "translation systems produce alignments between source and target sentences" ... "However, data available ... only contains sentence pairs".
Page 3 line 221: The one-to-many mapping; please briefly explain this concept
Page 3 line 40: world > word
Page 3 line 47: fertility model; please briefly explain this concept
Page 3 lines 49-51: revise syntax
Page 4 line 40: The cost of a new state is the cost of the original state multiplied with/by? the translation, distortion…
Page 5 line 12: BLEU points; There is a mention to the BLEU here, but you explain it in p. 8; it may be useful to have some indication about it here too
Page 5 line 18: theory; I'd rather say 'the assumption' or 'the theoretical assumption'
Page 5 line 39: While > Though,
Page 6 lines 22-23: the language > this language
Page 6 line 34: since > Since
Page 7 lines 1-8: … the central work of Jewish spiritual literature, also known as Kabbalah. The Zohar is a series of books with a commentary on the spiritual elements and scriptural interpretations of the Pentateuch, as well as mysticism, mythical cosmogony, and mystical psychology. The Zohar scriptural exegesis can be viewed as an esoteric…
Page 7 lines 37-43: please, revise syntax
Page 8 lines 32-33: was trained on our parallel corpus, and on other Aramaic text
Page 10 line 23: some sentence > some sentences

The paper can be published after minor changes.