What is the hallucination problem in AI models?

Hallucination occurs when a model produces information that sounds convincing but is incorrect or not grounded in the input data, which reduces the model's reliability.

How does the Semantic Fidelity (SF) metric measure accuracy?

SF measures how closely an answer adheres to the context by computing the statistical divergence between a representation of the question's intent and a representation of the actual answer over shared topics.
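For reference, the standard Kullback-Leibler (KL) divergence between two probability distributions $p$ and $q$ over $K$ shared topics is shown below. It is zero only when the two distributions coincide and grows as they drift apart, which is why a smaller divergence between intent and answer signals a more faithful response. How exactly the paper weights this quantity across topics is not described in this summary.

```latex
D_{\mathrm{KL}}(p \,\|\, q) = \sum_{k=1}^{K} p_k \log \frac{p_k}{q_k} \;\ge\; 0,
\qquad
D_{\mathrm{KL}}(p \,\|\, q) = 0 \iff p = q
```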

What do the SF and SEP metrics offer developers and users?

They make it possible to evaluate and select the most accurate and faithful models and to control output quality, especially in precision-critical domains such as financial or medical analysis.

New AI Metrics Reduce Hallucination & Boost Accuracy | AI Tools Oasis

New Metrics to Combat AI Hallucination

In a notable step toward addressing hallucination in large language models (LLMs), researchers have posted a paper on arXiv proposing a framework for evaluating semantic fidelity, that is, how faithfully a model carries out its assigned task. The new metrics draw on concepts from information theory and thermodynamics and offer objective tools for measuring how closely a model adheres to the provided context without fabricating or distorting information.

How Do the Proposed Metrics Work?

The framework treats a large language model as a bipartite information engine in which the hidden layers act as a "Maxwell's demon" that controls how a context (C) is transformed into an answer (A) via a prompt (Q). Question-Context-Answer (QCA) triples are modeled as probability distributions over a shared set of topics. The topic transformations from the context to the question and from the context to the answer are represented by two transition matrices, Q and A, which encode the query's intent and the actual result, respectively.
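To make the matrix picture concrete, the sketch below represents a context as a probability vector over K shared topics and applies a transition matrix to it. The row-stochastic convention (each row gives P(output topic | context topic)), the helper names `is_distribution`, `is_row_stochastic` and `push_forward`, and all numbers are hypothetical assumptions for illustration, not the paper's code or data.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def is_distribution(p: Vector, tol: float = 1e-9) -> bool:
    """True if p is non-negative and sums to 1 within tol."""
    return all(x >= 0 for x in p) and abs(sum(p) - 1.0) <= tol


def is_row_stochastic(m: Matrix, tol: float = 1e-9) -> bool:
    """True if every row of m is a probability distribution over topics."""
    return all(is_distribution(row, tol) for row in m)


def push_forward(c: Vector, t: Matrix) -> Vector:
    """Return the row vector c @ t.

    Hypothetical convention: t[i][j] = P(output topic j | context topic i).
    """
    n_out = len(t[0])
    return [sum(c[i] * t[i][j] for i in range(len(c))) for j in range(n_out)]


# Toy data over K = 3 shared topics (made-up numbers, not from the paper).
c = [0.5, 0.3, 0.2]          # context C as a topic distribution
Q = [[0.8, 0.1, 0.1],        # transitions the prompt asks for (query intent)
     [0.1, 0.8, 0.1],
     [0.1, 0.1, 0.8]]
A = [[0.6, 0.2, 0.2],        # transitions the answer actually realizes
     [0.2, 0.6, 0.2],
     [0.3, 0.3, 0.4]]

assert is_distribution(c) and is_row_stochastic(Q) and is_row_stochastic(A)

q_dist = push_forward(c, Q)  # topic mix the question targets
a_dist = push_forward(c, A)  # topic mix the answer delivers
print("intended:", [round(x, 3) for x in q_dist])
print("actual:  ", [round(x, 3) for x in a_dist])
```

Because every row of a row-stochastic matrix sums to 1, the output is again a distribution over the same topics, so the intended and actual results can be compared directly.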

The Semantic Fidelity (SF) metric quantifies the faithfulness of any given QCA triple by the Kullback-Leibler (KL) divergence between these two matrices. Both matrices are inferred simultaneously by convex optimization of this divergence, and the final score is obtained by mapping the minimal divergence onto the unit interval [0, 1], with higher scores indicating greater faithfulness.
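The sketch below illustrates only the last two steps: computing a divergence between two transition matrices and squashing it into [0, 1]. It assumes the matrices are already given and skips the paper's convex-optimization step (KL divergence is jointly convex in its two arguments, which is what makes such an optimization tractable). The context-weighted row-wise form, the KL direction Q to A, and the mapping `exp(-D)` are all hypothetical choices, not the paper's exact definitions.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def kl(p: Vector, q: Vector) -> float:
    """KL(p || q) in nats; terms with p_j = 0 contribute 0 by convention."""
    total = 0.0
    for pj, qj in zip(p, q):
        if pj > 0:
            if qj <= 0:
                return math.inf  # p puts mass where q has none
            total += pj * math.log(pj / qj)
    return total


def matrix_kl(c: Vector, q_mat: Matrix, a_mat: Matrix) -> float:
    """Assumed form: sum_i c_i * KL(Q_i || A_i), skipping topics with c_i = 0."""
    return sum(ci * kl(qi, ai) for ci, qi, ai in zip(c, q_mat, a_mat) if ci > 0)


def sf_score(divergence: float) -> float:
    """Hypothetical mapping of D in [0, inf] onto [0, 1]: exp(-D)."""
    return math.exp(-divergence)


# Toy data (made-up numbers, not from the paper).
c = [0.5, 0.3, 0.2]
Q = [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1], [0.1, 0.1, 0.8]]
A_faithful = [row[:] for row in Q]  # answer realizes exactly the intended transitions
A_drifted = [[0.6, 0.2, 0.2], [0.2, 0.6, 0.2], [0.3, 0.3, 0.4]]

sf_faithful = sf_score(matrix_kl(c, Q, A_faithful))
sf_drifted = sf_score(matrix_kl(c, Q, A_drifted))
print("faithful answer SF:", sf_faithful)
print("drifted answer SF: ", round(sf_drifted, 4))
```

Identical matrices give a divergence of exactly 0 and therefore a score of 1, while any drift lowers the score; an answer that puts mass on topics the intent rules out gives an infinite divergence and a score of 0.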

The researchers also propose a second, thermodynamics-based metric, Semantic Entropy Production (SEP), which measures the entropy produced during answer generation, and show that high faithfulness generally implies low entropy production. SF and SEP can be used together or separately to evaluate LLMs and control hallucination.
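The article does not give the SEP formula, so the sketch below does not implement it. As a loose, hypothetical illustration of the intuition that a faithful answer should not scatter probability across topics, it computes the change in Shannon entropy between a context's and an answer's topic distributions. This is a much simpler quantity than thermodynamic entropy production, and the function names and numbers are assumptions.

```python
import math
from typing import List


def shannon_entropy(p: List[float]) -> float:
    """Shannon entropy H(p) in nats, skipping zero-probability topics."""
    return -sum(x * math.log(x) for x in p if x > 0)


def entropy_change(context: List[float], answer: List[float]) -> float:
    """Hypothetical proxy: H(answer) - H(context) over shared topics.

    This is NOT the paper's SEP definition, which the article does not state.
    """
    return shannon_entropy(answer) - shannon_entropy(context)


context = [0.5, 0.3, 0.2]
focused_answer = [0.5, 0.3, 0.2]   # keeps the context's topic profile
diffuse_answer = [1 / 3, 1 / 3, 1 / 3]  # spreads mass evenly across topics

d_focused = entropy_change(context, focused_answer)
d_diffuse = entropy_change(context, diffuse_answer)
print("focused:", d_focused)
print("diffuse:", round(d_diffuse, 4))
```

The uniform distribution has the maximum possible entropy over K topics, so the diffuse answer always shows a positive change here, while the answer that preserves the context's profile shows none.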

Practical Application and Promising Results

The authors demonstrate the framework on LLM summarization of corporate financial reports (SEC 10-K filings), showing that it can distinguish faithful responses from those that hallucinate or drift from the context. The metrics could support more reliable and transparent language models, especially in sensitive applications that demand high accuracy and strict adherence to source information.

Source: arXiv AI Papers | Exclusive coverage from AI Tools Oasis

AI Tools Oasis Team
