Abstract:
The parameterized knowledge within Large Language Models (LLMs) such as GPT-4 presents a significant opportunity for knowledge extraction from text. However, the context-sensitivity of LLMs can hinder precise, task-aligned outcomes, making prompt engineering necessary. This study explores the efficacy of different prompt engineering methods for knowledge extraction, using a relation extraction dataset together with an LLM (specifically, the RED-FM dataset and GPT-4). To address the challenge of evaluation, a novel evaluation framework grounded in the Wikidata ontology is proposed. The findings demonstrate that LLMs can extract a diverse array of facts from text. Incorporating a single example into the prompt improves performance two- to threefold, and including relevant examples in the prompt is more beneficial than adding random or canonical ones. Beyond the first example, additional examples yield diminishing returns. Moreover, the performance of reasoning-oriented prompting methods does not surpass that of the other tested methods. Retrieval-augmented prompts facilitate effective knowledge extraction from text with LLMs. Empirical evidence suggests that conceptualizing the extraction process as a reasoning exercise may not align with the task's intrinsic nature or with LLMs' inner workings.