
VA Health Systems Research


HSR Citation Abstract


Context Variance Evaluation of Pretrained Language Models for Prompt-based Biomedical Knowledge Probing.

Yao Z, Cao Y, Yang Z, Yu H. Context Variance Evaluation of Pretrained Language Models for Prompt-based Biomedical Knowledge Probing. AMIA Joint Summits on Translational Science proceedings. AMIA Joint Summits on Translational Science. 2023 Jun 16; 2023:592-601.




Abstract:

Pretrained language models (PLMs) have motivated research on what kinds of knowledge these models learn. Fill-in-the-blank problems (e.g., cloze tests) are a natural approach for gauging such knowledge. BioLAMA generates prompts for biomedical factual knowledge triples and uses the Top-k accuracy metric to evaluate different PLMs' knowledge. However, existing research has shown that such prompt-based knowledge probing methods can only probe a lower bound of knowledge. Many factors, such as prompt-based probing biases, make the LAMA benchmark unreliable and unstable. This problem is more prominent in BioLAMA. The severe long-tailed vocabulary distribution and large-N-M relations keep the performance gap between LAMA and BioLAMA notable. To address these issues, we introduced context variance into prompt generation and proposed a new rank-change-based evaluation metric. Unlike the previous known-unknown evaluation criteria, we proposed the concept of "Misunderstand" in LAMA for the first time. Through experiments on 12 PLMs, we showed that our context variance prompts and Understand-Confuse-Misunderstand (UCM) metric make BioLAMA more friendly to large-N-M relations and rare relations. We also conducted a set of control experiments to disentangle "understand" from just "read and copy".
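
For readers unfamiliar with the Top-k accuracy metric mentioned in the abstract, the following is a minimal sketch of how such a score is commonly computed for cloze-style probing. It is not taken from the paper or from the BioLAMA code base. The function names, the toy drug names, and the rule that a prompt counts as a hit when any gold answer appears among the top k candidates are assumptions made for illustration. The rank-change-based UCM metric is not sketched here because the abstract does not define it.

```python
from typing import Sequence, Set


def top_k_hit(ranked: Sequence[str], gold: Set[str], k: int) -> bool:
    """Return True if any gold answer is among the first k ranked candidates.

    Assumption: one hit among the top k is enough, even when a prompt
    has several valid answers (a many-to-many relation).
    """
    return any(candidate in gold for candidate in ranked[:k])


def top_k_accuracy(
    predictions: Sequence[Sequence[str]],
    gold_answers: Sequence[Set[str]],
    k: int,
) -> float:
    """Fraction of prompts whose top-k candidates contain a gold answer."""
    if len(predictions) != len(gold_answers):
        raise ValueError("predictions and gold_answers must have equal length")
    if not predictions:
        return 0.0
    hits = sum(
        top_k_hit(ranked, gold, k)
        for ranked, gold in zip(predictions, gold_answers)
    )
    return hits / len(predictions)


# Hypothetical ranked outputs of a model for two cloze prompts.
example_predictions = [
    ["aspirin", "ibuprofen", "warfarin"],
    ["insulin", "metformin"],
]
# Hypothetical gold answer sets for the same two prompts.
example_gold = [{"ibuprofen"}, {"glipizide"}]

acc_at_1 = top_k_accuracy(example_predictions, example_gold, k=1)
acc_at_2 = top_k_accuracy(example_predictions, example_gold, k=2)
```

In this toy data the first prompt is answered correctly only at rank 2 and the second prompt is never answered correctly, so the score rises when k goes from 1 to 2. A hit-or-miss score of this kind does not show how a model's ranking shifts when the prompt context changes.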





