Head of AWS Automated Reasoning: Instead of solving AI illusions, it's better to prove it's right; Rust Borrow checker is essentially an inference engine-News-Artificial Intelligence Global Cooperation Alliance

Head of AWS Automated Reasoning: Instead of solving AI illusions, it's better to prove it's right; Rust Borrow checker is essentially an inference engine

#News ·2025-01-08

Edited by | Yan Zheng

A significant flaw of AI is that it will unknowingly "hallucinate," making up reasonable answers that are not based on real data.

AWS is trying to solve this problem, and a good way to do so is by introducing Amazon Bedrock Automated Inference Check.

Amazon Bedrock is a managed service for generative AI applications.

Speaking at the re:Invent conference in Las Vegas last month, AWS CEO Matt Garman said the checks "prevent factual errors due to model hallucinations...... Bedrock can check whether the factual statements made by the model are accurate.

It's all based on "sound mathematical validation," he says. How to understand this sentence? What's behind them?

AWS 首席执行官 Matt Garman 介绍了 Bedrock 的自动推理 AWS CEO Matt Garman introduced Bedrock's automated reasoning

Byron Cook, head of AWS's Automated Reasoning team, recently revealed more detailed thinking in an interview.

1. Rather than solving the AI illusion, it's better to prove it's true

"I've been working in the field of formal reasoning and tools. I've been bringing this capability to Amazon since about 10 years ago, and then AI has some applications. Now all of a sudden, the area I was in, which had been very obscure before, was suddenly not obscure.

How to mitigate the risks posed by AI hallucinations, and is the problem solvable?

"In a sense, illusion is a good thing because it's creativity. But during language model generation, some of these results will be incorrect, "he said.

"But according to whose definition is it wrong? It turns out that defining what truth is is surprisingly difficult. Even in areas where you think everyone should agree."

"I've worked in aerospace, railroad shunting, operating systems, hardware, biology, and in all of these fields, what I've seen is that in building these kinds of tools, most of the time is spent arguing among domain experts about what the right answer should be, and those examples are driven by specific examples of what comes up and fights against extremes."

Cook added: "The other thing is that there are some issues that can't be decided. Turing, for example, had shown this. No program can answer questions consistently, authoritatively, and with 100 percent accuracy in a limited amount of time."

If you try to block all the stated areas, some are relatively formal and some are not. What makes good music will be hard to formalize, and people may have some theories about it, but they may disagree among themselves.

Other fields, just like biology, have models of how biological systems work, but part of what they do is take those models and then examine real systems. They are trying to improve the model, so the model may be wrong. With these warnings, you can do a lot.

AWS 自动推理小组负责人 Byron Cook Byron Cook, head of AWS Automated Inference Group

Cook introduced the Automated Reasoning tool and cited example cases, such as determining the correct tax code based on an individual's income statement.

The tool, he says, "takes statements in natural language and translates them into logic, and then proves or disproves validity under the field."

"How can models go wrong" through tools such as: translation from natural language to logic can go wrong, and people deciding what tax law is and formalizing it can go wrong. So it's still possible that we get the wrong answer, but under the assumption that we translate correctly, under the assumption that we help the client formally define [the rule], we can construct a proven argument in mathematical logic that the answer they got is the right one.

Hallucinations, Cook said, "are a problem we have to live with for a long time." After all, humans can hallucinate...... As a society, we are always gradually working out what truth is, how we define it, and who decides what it is.

Cook also commented on a famous case of AI illusion, with the lawyer citing the case of OpenAI's ChatGPT invention. It's not exactly the kind of illusion that automated reasoning tools can solve, Cook says. "We can build a database of all known [legal case] outcomes and formalize it," he said. "I'm not sure it's the best app."

picture

2. Not suitable for programming, but good for developer defensive programming

The question for developers is: Can this automated inference tool help them check whether the generated algorithm code is correct?

"This product is not designed for programmers," Cook said. "But it hasn't escaped our attention. We're actually doing reasoning about code all the time...... For 25 years, I've been trying to prove the program right. This is the domain of giant companies with heavy assets, because it is very challenging to do so. But generative AI seems poised to significantly lower this barrier to entry, helping developers formalize what program they want to prove. This is very exciting, but it does not include "automatic inference" products.

Cook's team has also tackled other issues at Amazon, such as proving that access control policies work as expected, as well as similar encryption, networking, storage, and virtualization. It turns out that "proving code is mathematically correct" has a nice side effect, one of which is that the code is more efficient.

"When you have an automated reasoning tool to check your homework, you can optimize more aggressively." When developers don't have that ability, what they're doing is pretty conservative, defensive coding if you will. Using these tools, they can perform optimizations that are very scary to them. We give them a lot of security.

3.Rust's borrow checker is essentially an inference engine

Rust, he adds, is a natural fit for provable programming. "When you program in Rust, you're actually using a theorem prover. A lot of people don't realize that programmers actually set out to "do memory-safe proofs," and that the borrow checker in Rust is essentially a deductive theorem prover. It's an inference engine. Developers are guiding the tool through this process.

Rust can be faster than C because it&apos;s able to do clever things with memory that they can't do in C, and certainly can't do in Java or other languages because they've let programmers do correctness proofs.

"So Rust is a very clever integration of automatic inference techniques, type systems, compilers, and then they have very good error messages that make the tool very useful." As a result, we've seen some types of programs migrate to Rust with good results.

To learn more about AIGC, please visit:

51CTO AI.x community

https://www.51cto.com/aigc/

TAGS：

PREV： Google forms new AI team to develop 'World Model' : Critical Path to AGI

RETURN

NEXT： Microsoft CEO Opening Speech: Pre-training Scaling Law is still great! The first NVIDIA GB200 Cluster is online! In the future, hand-rolling agents will be as easy as building Excel