<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>OpenSource on Aries441's Tech Blog</title><link>https://aries441.tech/tags/opensource/</link><description>Recent content in OpenSource on Aries441's Tech Blog</description><generator>Hugo</generator><language>zh-cn</language><lastBuildDate>Sat, 02 May 2026 00:00:00 +0800</lastBuildDate><atom:link href="https://aries441.tech/tags/opensource/index.xml" rel="self" type="application/rss+xml"/><item><title>Starting from MinerU's License Change: Differences and Pitfalls of Common Open-Source Licenses</title><link>https://aries441.tech/posts/tech-notes/open-source-license/</link><pubDate>Sat, 02 May 2026 00:00:00 +0800</pubDate><guid>https://aries441.tech/posts/tech-notes/open-source-license/</guid><description>&lt;p&gt;I have recently been building an automated knowledge-base tool that processes the incremental messages in a Feishu group I use for collecting links. The motivation is simple: I like to read tech blogs before bed and drop the links into WeChat or Feishu, but when I get busy I never find time to organize them. Occasionally I jot down a few thoughts after reading, and if I put that off I forget them entirely. So I wanted a tool that automatically aggregates the content, summarizes what I learned, and also preserves those quick notes.&lt;/p&gt;
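&lt;p&gt;Some of this content includes PDF files, so the tool needs to extract their contents automatically. After some research, I found that &lt;a href="https://github.com/opendatalab/MinerU"&gt;MinerU&lt;/a&gt; is a good fit: it can automatically extract text from PDFs. I first came across the tool on Bilibili, and in the comments I noticed a user mentioning that it uses a "viral" (copyleft) open-source license, meaning that if you use the project you must open-source your own code. However, I found that the latest project README says:&lt;/p&gt;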
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;2026/04/18 3.1.0 Released&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;This release focuses on licensing openness, parsing accuracy, and full-format native support. The main updates include:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;License upgrade&lt;/strong&gt;
MinerU has officially moved from AGPLv3 to the MinerU Open Source License, a custom license based on Apache 2.0.
This change significantly reduces adoption friction for both community users and commercial deployments, making MinerU easier to integrate into real-world workflows.&lt;/p&gt;</description></item></channel></rss>