LangChain

LLM 数学（LLM Math） #

评估知道如何做数学的链。

# Comment this out if you are NOT using tracing
import os
os.environ["LANGCHAIN_HANDLER"] = "langchain"

from langchain.evaluation.loading import load_dataset
dataset = load_dataset("llm-math")

Downloading and preparing dataset json/LangChainDatasets--llm-math to /Users/harrisonchase/.cache/huggingface/datasets/LangChainDatasets___json/LangChainDatasets--llm-math-509b11d101165afa/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51...

Dataset json downloaded and prepared to /Users/harrisonchase/.cache/huggingface/datasets/LangChainDatasets___json/LangChainDatasets--llm-math-509b11d101165afa/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51. Subsequent calls will reuse this data.

设置链 Setting up a chain #

现在我们需要创建一些管道来做数学。

from langchain.llms import OpenAI
from langchain.chains import LLMMathChain

llm = OpenAI()

chain = LLMMathChain(llm=llm)

predictions = chain.apply(dataset)

numeric_output = [float(p['answer'].strip().strip("Answer: ")) for p in predictions]

correct = [example['answer'] == numeric_output[i] for i, example in enumerate(dataset)]

sum(correct) / len(correct)

1.0

for i, example in enumerate(dataset):
    print("input: ", example["question"])
    print("expected output :", example["answer"])
    print("prediction: ", numeric_output[i])

input:  5
expected output : 5.0
prediction:  5.0
input:  5 + 3
expected output : 8.0
prediction:  8.0
input:  2^3.171
expected output : 9.006708689094099
prediction:  9.006708689094099
input:    2 ^3.171 
expected output : 9.006708689094099
prediction:  9.006708689094099
input:  two to the power of three point one hundred seventy one
expected output : 9.006708689094099
prediction:  9.006708689094099
input:  five + three squared minus 1
expected output : 13.0
prediction:  13.0
input:  2097 times 27.31
expected output : 57269.07
prediction:  57269.07
input:  two thousand ninety seven times twenty seven point thirty one
expected output : 57269.07
prediction:  57269.07
input:  209758 / 2714
expected output : 77.28739867354459
prediction:  77.28739867354459
input:  209758.857 divided by 2714.31
expected output : 77.27888745205964
prediction:  77.27888745205964

通用代理评估 (Generic Agent Evaluation)OpenAPI 评估（OpenAPI Eval）