OpenAI Commentates the LoL S13 Pro League - 技术圈

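With its first developer conference, OpenAI has introduced or updated the following capabilities for developers:

Vision: GPT-4 can now understand images;
Text to speech: converts text into spoken audio.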

Combined with GPT's existing Chat Completions capability, these are enough to build a video commentary feature.

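As an example, we take Game 1 of the League of Legends S13 quarterfinal between T1 and LNG, cut out a 30-second clip, and use GPT to add commentary to it. Let's look at the finished product first. The original clip is 30 s long and the generated commentary audio is 128 s, so when merging them we sped the commentary up to 1.2x and slowed the video down.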
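
The speed adjustment is simple arithmetic, sketched below. The function name `sync_speeds` is made up for illustration; only the 30 s, 128 s, and 1.2x figures come from the example above.

```python
def sync_speeds(video_s: float, audio_s: float, audio_speedup: float) -> tuple[float, float]:
    """Return (audio length after speed-up, playback rate the video needs to match it)."""
    sped_up_audio_s = audio_s / audio_speedup
    video_rate = video_s / sped_up_audio_s
    return sped_up_audio_s, video_rate


audio_len, video_rate = sync_speeds(video_s=30, audio_s=128, audio_speedup=1.2)
print(f"audio: {audio_len:.1f}s, video rate: {video_rate:.3f}x")
# audio: 106.7s, video rate: 0.281x
```

In other words, the 30 s clip has to play at roughly 0.28x speed, stretched to about 107 s, to cover the sped-up commentary.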

The result is still far from professional match commentary, but most of what GPT says holds up. Next, let's look at the code needed to build this.

0. Preparation

We use Python, and the OpenAI and OpenCV libraries need to be installed in advance:

pip install openai
pip install opencv-python

1. Reading video frames

This step mainly uses OpenCV to read the video frames and store them.

import base64

import cv2

# Open the source clip.
video = cv2.VideoCapture("data/lol.mp4")

base64Frames = []

while video.isOpened():
    success, frame = video.read()
    if not success:
        break
    # Encode each frame as JPEG, then as a base64 string for the API.
    _, buffer = cv2.imencode(".jpg", frame)
    base64Frames.append(base64.b64encode(buffer).decode("utf-8"))

video.release()

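2. Generating the commentary script

Here we use OpenAI's Chat Completions API, passing in the prompt and the model as parameters.

The model is gpt-4-vision-preview, a GPT-4-based model that can understand images.

The prompt includes the images extracted in the previous step. There is no need to send every frame: sending fewer frames both keeps us clear of OpenAI's API rate limits and reduces token usage. We sample one frame per second, which for this clip (assuming a 60 fps source) means taking every 60th frame, i.e. base64Frames[0::60].

In addition, we give the model a brief introduction to the match, including the teams and their players, along with the aspects the commentary should cover, so that GPT's recognition is more accurate.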
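
Rather than hard-coding 60, the stride can be derived from the video's frame rate. This is a minimal sketch: `sample_stride` and `sample_frames` are hypothetical helper names, and the frame rate is assumed to be read separately, for example with `video.get(cv2.CAP_PROP_FPS)` before the capture is released.

```python
def sample_stride(fps: float, seconds_between_samples: float = 1.0) -> int:
    """Distance in frames between two consecutive samples; never less than 1."""
    return max(1, round(fps * seconds_between_samples))


def sample_frames(frames: list, fps: float, seconds_between_samples: float = 1.0) -> list:
    """Keep one frame per sampling interval, starting with the first frame."""
    return frames[:: sample_stride(fps, seconds_between_samples)]


# Stand-in for base64Frames: a 30 s clip at 60 fps has 1800 frames.
frames = [f"frame-{i}" for i in range(1800)]
print(len(sample_frames(frames, fps=60)))  # 30
```

With a 30 fps source the same call would pick every 30th frame, so the one-frame-per-second spacing stated in the prompt stays accurate.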

import os

from openai import OpenAI

# Replace with your own key, or set OPENAI_API_KEY in the shell instead.
os.environ["OPENAI_API_KEY"] = "your openai key"
client = OpenAI()

PROMPT_MESSAGES = [
    {
        "role": "user",
        "content": [
            # Chinese version of the prompt, kept for reference (commented out):
            # "这是英雄联盟游戏的直播比赛视频，每个图片间隔1s，这是两支队伍T1和LNG的比赛。T1上单宙斯，打野oner，中单faker，下路gumayusi，辅助keria。LNG上单zika，打野tarzan，中单scout，下路gala，辅助hang。tarzan在这波团战中表现很差。作为游戏解说，写出游戏直播脚本，描述游戏正在进行的事情，选手正在干什么，谁开启团战以及技能释放情况，以及队伍优劣势的分析",
            "This is a live broadcast video of a League of Legends game, with each image spaced 1 second apart. It features a match between two teams, T1 and LNG. For T1, Zeus is in the top lane, oner as the jungler, Faker in the mid lane, Gumayusi in the bot lane, and Keria as the support. For LNG, Zika is in the top lane, Tarzan as the jungler, Scout in the mid lane, Gala in the bot lane, and Hang as the support. Tarzan's performance in this team fight was poor. As a game commentator, write a live broadcast script describing what is happening in the game, what the players are doing, who initiates the team fight and the skill release situation, as well as an analysis of the team's advantages and disadvantages.",
            *map(lambda x: {"image": x, "resize": 768}, base64Frames[0::60]),
        ],
    },
]
params = {
    "model": "gpt-4-vision-preview",
    "messages": PROMPT_MESSAGES,
    "max_tokens": 500,
}

result = client.chat.completions.create(**params)
print(result.choices[0].message.content)
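
The `{"image": ..., "resize": ...}` shorthand above is the form used in this walkthrough. The Chat Completions documentation describes image inputs as `image_url` content parts, which also accept base64 data URLs. The helper below is a sketch of building the same kind of message in that form; `build_vision_message` is a made-up name, and it only constructs the payload without calling the API.

```python
import base64


def build_vision_message(prompt: str, jpeg_frames_b64: list[str]) -> dict:
    """Build one user message holding a text part followed by one image part per frame."""
    content = [{"type": "text", "text": prompt}]
    for frame_b64 in jpeg_frames_b64:
        content.append(
            {
                "type": "image_url",
                "image_url": {"url": f"data:image/jpeg;base64,{frame_b64}"},
            }
        )
    return {"role": "user", "content": content}


# Dummy bytes stand in for real JPEG frames so the example runs offline.
fake_frames = [base64.b64encode(b"frame-%d" % i).decode("utf-8") for i in range(3)]
message = build_vision_message("Describe these frames.", fake_frames)
print(len(message["content"]))  # 4: one text part plus three image parts
```

The resulting dictionary could then be passed as `messages=[message]` to `client.chat.completions.create`.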

3. Generating the audio

The English commentary generated in the previous step is as follows:

Welcome back to an intense match between T1 and LNG, and the tension on the Rift is palpable! Both teams are neck and neck with the gold at 12.8k, but wait—T1 is making a bold move at the Rift Herald.
Zeus is holding steady in the top lane, and it looks like they're trusting oner to secure the objective with Faker supporting just a few paces back. Gumayusi and Keria are not yet at the scene, but they could join swiftly if things get heated.
Hold on—Tarzan is looking to contest! But there seems to be hesitation; the rest of LNG are not fully positioned to back him up. Zika, Scout, Gala, and Hang are scattered, and Tarzan needs to be careful not to overcommit.
And there it is, T1 initiates the skirmish on their terms, collaring Tarzan who seems to be caught out. T1 pounces with impeccable timing and skill coordination. Faker, living up to the legend, orchestrates a masterful play and it looks like—yes—LNG's Tarzan is down!
This is a decisive moment for T1; they've secured the advantage, racking up not just the kill but also gaining control over the Herald. LNG must now regroup and reassess their positioning and communication. Tarzan's performance in that engage was indeed suboptimal, possibly due to miscommunication or a misread on the enemy's positioning.
T1 is showing the power of teamwork and presence on the map. As the dust settles, T1 emerges with not just the Rift Herald but a clear message: they are in it to win it, and any slip-up from LNG will be exploited to its fullest!
Stay tuned as we continue to break down this match and see if LNG can bounce back from this unfavorable exchange. It's all about the macro play, vision control, and those split-second decision-making skills that separate the good from the great in League of Legends.

Its Chinese translation is as follows:

欢迎回到T1与LNG的激烈对决中，比赛的紧张气氛在召唤师峡谷中弥漫开来！两队在黄金收入上势均力敌，均为12.8千，但等等——T1正在大胆地争夺裂谷先锋。
Zeus稳扎稳打地控制着上路，看起来他们信任oner来确保目标的安全，同时Faker在几步之遥的地方提供支援。Gumayusi和Keria还没有赶到现场，但如果形势升温，他们可能会迅速加入。
等一下——Tarzan正试图争夺！但他似乎犹豫不决；LNG的其他成员还没有完全就位来支援他。Zika、Scout、Gala和Hang分散开来，Tarzan需要小心，不要过度投入。
就在这时，T1发起了战斗，以他们的方式牵制Tarzan，后者似乎被困住了。T1凭借无可挑剔的时机和技能协调迅速反击。Faker名不虚传，精心策划了一次绝妙的行动——是的——LNG的Tarzan倒下了！
这对T1来说是一个决定性的时刻；他们不仅拿下了击杀，还控制了裂谷先锋。LNG现在必须重新集结，并重新评估他们的定位和沟通。Tarzan在那次交锋中的表现确实不佳，可能是由于沟通不畅或误判了敌人的位置。
T1展示了团队合作和地图上的存在感的力量。随着尘埃落定，T1不仅获得了裂谷先锋，还清晰地传递出一条信息：他们是为了胜利而来的，任何LNG的失误都将被充分利用！
请继续关注，我们将继续分析这场比赛，看看LNG是否能从这次不利的交换中反弹。这一切都关乎宏观游戏、视野控制，以及那些将优秀选手与伟大选手区分开来的瞬息决策技巧。

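As you can see, apart from the misjudgments highlighted in yellow, the other calls are correct. That said, the commentary says far too little about each individual player's plays, which is another gap compared with professional casters.

We feed the text above into the following program to generate the corresponding speech. Because OpenAI's TTS handles English better and the Chinese speech it generates sounds rather flat, we use the English text here.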

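from pathlib import Path

from openai import OpenAI

client = OpenAI()

# The English commentary script, assumed here to be the output of step 2.
text = result.choices[0].message.content

speech_file_path = Path("./nova.mp3")
response = client.audio.speech.create(
    model="tts-1",
    voice="nova",
    input=text,
)

# Write the generated speech to an MP3 file.
response.stream_to_file(speech_file_path)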
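
The script above is short enough to send in one request. For longer scripts, the speech endpoint limits the length of `input` (4096 characters according to the API reference; treat that number as an assumption and check the current documentation), so the text may need to be split first. A sketch, with `split_for_tts` as a hypothetical helper that splits on sentence boundaries:

```python
import re


def split_for_tts(text: str, max_chars: int = 4096) -> list[str]:
    """Greedily pack whole sentences into chunks of at most max_chars characters.

    A single sentence longer than max_chars is kept whole rather than cut mid-sentence.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # Adding this sentence would overflow, so close the current chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


print(split_for_tts("One. Two! Three? Four.", max_chars=10))
# ['One. Two!', 'Three?', 'Four.']
```

Each chunk would then be sent in its own `client.audio.speech.create` call and the resulting audio files joined afterwards.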

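Finally, merge the commentary audio with the original video to get the finished video shown earlier.

The challenge here was a 30-second clip of a match. Judging by the result, on a 10-point scale I would give it a 7: a pass, and slightly better than expected.

Recognizing the content of a single image is even less of a problem for OpenAI. We can snap a photo and learn about everything in it; combined with AR, that becomes a scene straight out of science fiction. I look forward to AI's next steps.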
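
The article does not say which tool was used for the merge. One possible approach, assuming the `ffmpeg` command-line tool is installed, is to slow the video with the `setpts` filter and speed up the audio with `atempo`. The output file name and the `build_merge_command` helper are placeholders for illustration; the sketch only builds the command and does not run it.

```python
def build_merge_command(video_path, audio_path, output_path, video_rate, audio_speedup):
    """Build an ffmpeg argument list that retimes both streams and muxes them together.

    video_rate < 1 slows the video down; audio_speedup > 1 speeds the audio up.
    """
    filter_graph = (
        f"[0:v]setpts=PTS/{video_rate}[v];"
        f"[1:a]atempo={audio_speedup}[a]"
    )
    return [
        "ffmpeg", "-y",
        "-i", video_path,
        "-i", audio_path,
        "-filter_complex", filter_graph,
        "-map", "[v]", "-map", "[a]",
        output_path,
    ]


cmd = build_merge_command("data/lol.mp4", "nova.mp3", "lol_commentary.mp4", 0.28125, 1.2)
print(" ".join(cmd))
# To actually run it: subprocess.run(cmd, check=True)
```

The 0.28125 video rate follows from the durations quoted earlier: 128 s of audio at 1.2x lasts about 107 s, and 30 s of video must be stretched to match.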
