Starting from MinerU's License Change: How Common Open-Source Licenses Differ, and Where the Pitfalls Are
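I've recently been hacking together an automated knowledge-base tool to process the incremental messages in the Feishu group I use as a collection inbox. The motivation is simple: I like to browse technical blogs before bed and drop the links into WeChat or Feishu, but once things get busy I never find time to organize them. Sometimes I jot down a few thoughts after reading, and if I put that off I forget them entirely. So I wanted a tool that summarizes the content for me, distills the knowledge, and also keeps hold of those offhand notes.

Some of this content includes PDF files, so the tool needs to extract their contents automatically. After some research, MinerU looked like a good fit: it can automatically extract text from PDFs. I first came across it on Bilibili, and in the comments I noticed a user pointing out that it uses a "viral" (copyleft) open-source license, which, as the comment put it, means that if you use the project you have to open-source your own code. However, the latest project README says:

2026/04/18 3.1.0 Released

This release focuses on licensing openness, parsing accuracy, and full-format native support. The main updates include:

License upgrade
MinerU has officially moved from AGPLv3 to the MinerU Open Source License, a custom license based on Apache 2.0. This change significantly reduces adoption friction for both community users and commercial deployments, making MinerU easier to integrate into real-world workflows.

...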