Open Data Model Context Protocol
Overview
What is OpenDataMCP?
OpenDataMCP is a platform for connecting any open data to any large language model (LLM) via the Model Context Protocol. It enables seamless integration and use of diverse datasets, extends what LLM applications can do, and supports better data-driven decision making.
Features of OpenDataMCP
- Versatile data integration: OpenDataMCP lets users connect many types of open data sources to LLMs, making it easier to use existing datasets for advanced analysis and insight.
- Model Context Protocol: this protocol ensures data is delivered with proper context, improving the relevance and accuracy of LLM outputs.
- User-friendly workflow: the platform is designed with usability in mind, simplifying the process of connecting data sources to models.
- Public repository: OpenDataMCP is available as a public repository, encouraging collaboration and contributions from developers and data scientists worldwide.
How to Use OpenDataMCP
- Visit the repository: explore the OpenDataMCP GitHub repository for the available resources and documentation.
- Set up your environment: follow the setup instructions in the repository to configure your development environment for OpenDataMCP.
- Connect your data: use the Model Context Protocol to connect your open data sources to an LLM. The documentation provides detailed guidance.
- Use your data: once your data is connected, your LLM can draw on the integrated datasets, tailored to your specific use case.
- Collaborate and share: get involved with the community by contributing to the repository, sharing your findings, and collaborating on new features and improvements.
FAQ
Q: What kinds of open data can be connected to OpenDataMCP?
A: OpenDataMCP supports a wide range of open data formats and sources, including CSV, JSON, and APIs from various public datasets.
Q: Is OpenDataMCP free to use?
A: Yes. OpenDataMCP is a public repository and free to use, with no subscription fees.
Q: Can I contribute to OpenDataMCP?
A: Absolutely! Contributions are welcome. You can fork the repository, make improvements, and submit a pull request to share your enhancements with the community.
Q: What are the system requirements for using OpenDataMCP?
A: Requirements vary with the specific LLM you use, but a standard development environment with Python and the necessary libraries is usually sufficient.
Q: How can I stay up to date on OpenDataMCP?
A: Follow and star the repository on GitHub to receive notifications about updates, new features, and community discussions.
Details
Open Data Model Context Protocol
See it in action
https://github.com/user-attachments/assets/760e1a16-add6-49a1-bf71-dfbb335e893e
We enable 2 things:
- Open Data Access: Access to many public datasets right from your LLM application (starting with Claude, more to come).
- Publishing: Get community help and a distribution network to distribute your Open Data. Get everyone to use it!
How do we do that?
- Access: Setup our MCP servers in your LLM application in 2 clicks via our CLI tool (starting with Claude, see Roadmap for next steps).
- Publish: Use provided templates and guidelines to quickly contribute and publish on Open Data MCP. Make your data easily discoverable!
Usage
<u>Access</u>: Access Open Data using Open Data MCP CLI Tool
Prerequisites
If you want to use Open Data MCP with the Claude Desktop app client, you need to install the Claude Desktop app.
You will also need uv to easily run our CLI and MCP servers.
macOS
### you need to install uv through Homebrew, because using the install shell script
### will install it locally to your user, which makes it unavailable in the Claude Desktop app context.
brew install uv
Windows
### (UNTESTED)
powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"
Open Data MCP - CLI Tool
Overview
### show available commands
uvx odmcp
### show available providers
uvx odmcp list
### show info about a provider
uvx odmcp info $PROVIDER_NAME
### setup a provider's MCP server on your Claude Desktop app
uvx odmcp setup $PROVIDER_NAME
### remove a provider's MCP server from your Claude Desktop app
uvx odmcp remove $PROVIDER_NAME
Example
Quickstart for the Switzerland SBB (train company) provider:
### make sure claude is installed
uvx odmcp setup ch_sbb
Restart Claude and you should see a new hammer icon at the bottom right of the chat.
You can now ask Claude questions about SBB train network disruptions, and it will answer based on data collected from data.sbb.ch.
<u>Publish</u>: Contribute by building and publishing public datasets
Prerequisites
- Install UV Package Manager

  ### macOS
  brew install uv
  ### Windows (PowerShell)
  powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"
  ### Linux/WSL
  curl -LsSf https://astral.sh/uv/install.sh | sh

- Clone & Setup Repository

  ### clone the repository
  git clone https://github.com/OpenDataMCP/OpenDataMCP.git
  cd OpenDataMCP
  ### create and activate a virtual environment
  uv venv
  source .venv/bin/activate   # Unix/macOS
  .venv\Scripts\activate      # Windows
  ### install dependencies
  uv sync

- Install Pre-commit Hooks

  ### install pre-commit hooks for code quality
  pre-commit install
Publishing Instructions
- Create a New Provider Module
  - Each data source needs its own Python module.
  - Create a new Python module in src/odmcp/providers/.
  - Use a descriptive name following the pattern: {country_code}_{organization}.py (e.g., ch_sbb.py).
  - Start with our template file as your base.
- Implement Required Components
  - Define your Tools & Resources following the template structure
  - Each Tool or Resource should have:
    - Clear description of its purpose
    - Well-defined input/output schemas using Pydantic models
    - Proper error handling
    - Documentation strings
- Tool vs Resource
  - Choose Tool implementation if your data needs:
    - Active querying or computation
    - Parameter-based filtering
    - Complex transformations
  - Choose Resource implementation if your data is:
    - Static or rarely changing
    - Small enough to be loaded into memory
    - Simple file-based content
    - Reference documentation or lookup tables
  - Reference the MCP documentation for guidance
- Testing
  - Add tests in the tests/ directory
  - Follow existing test patterns (see other provider tests)
  - Required test coverage:
    - Basic functionality
    - Edge cases
    - Error handling
- Validation
  - Test your MCP server using our experimental client: uv run src/odmcp/providers/client.py
  - Verify all endpoints respond correctly
  - Ensure error messages are helpful
  - Check performance with typical query loads
For other examples, check our existing providers in the src/odmcp/providers/ directory.
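As a rough illustration of the steps above, a provider module might look like the following minimal sketch. The module layout, schema fields, and the `query_disruptions` handler are all hypothetical; the actual template file in src/odmcp/providers/ is the authoritative reference.

```python
"""Hypothetical provider sketch following the {country_code}_{organization}.py pattern."""
from typing import List

from pydantic import BaseModel, Field


class DisruptionQuery(BaseModel):
    """Input schema for the tool: well-defined and self-documenting."""
    line: str = Field(..., description="Train line to filter on, e.g. 'IC1'")
    limit: int = Field(10, ge=1, le=100, description="Maximum records to return")


class Disruption(BaseModel):
    """Output schema: a single disruption record."""
    line: str
    description: str


def query_disruptions(records: List[dict], params: DisruptionQuery) -> List[Disruption]:
    """Tool handler: validate and shape raw records.

    A real provider would fetch `records` from the upstream open-data API
    (e.g. data.sbb.ch) and add proper error handling around the request.
    """
    matching = [Disruption(**r) for r in records if r.get("line") == params.line]
    return matching[: params.limit]
```

A Tool shape like this suits data that needs parameter-based filtering; purely static content would instead be exposed as a Resource, per the guidance above.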
Contributing
We have an ambitious roadmap, and we want this project to scale with the community. The ultimate goal is to make the millions of publicly available datasets accessible to all LLM applications.
For that we need your help!
Discord
We want to build a helpful community around the challenge of bringing open data to LLMs. Join us on Discord to start chatting: https://discord.gg/QPFFZWKW
Our Core Guidelines
Because of our target scale we want to keep things simple and pragmatic at first. Tackle issues with the community as they come along.
- Simplicity and Maintainability
  - Minimize abstractions to keep the codebase simple and scalable
  - Focus on clear, straightforward implementations
  - Avoid unnecessary complexity
- Standardization / Templates
  - Follow provided templates and guidelines consistently
  - Maintain a uniform structure across providers
  - Use common patterns for similar functionality
- Dependencies
  - Keep external dependencies to a minimum
  - Prioritize a single repository/package setup
  - Carefully evaluate the necessity of new dependencies
- Code Quality
  - Format code using ruff
  - Maintain comprehensive test coverage with pytest
  - Follow a consistent code style
- Type Safety
  - Use Python type hints throughout
  - Leverage Pydantic models for API request/response validation
  - Ensure type safety in data handling
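To make the Pydantic guideline concrete, here is a small sketch; the `StationRequest` model and its fields are invented for illustration, not taken from an existing provider.

```python
from pydantic import BaseModel, Field, ValidationError


class StationRequest(BaseModel):
    """Validate request parameters before they reach an upstream API."""
    station: str = Field(..., min_length=1)
    limit: int = Field(5, ge=1, le=50)


# Well-formed input is parsed, and compatible types are coerced.
req = StationRequest(station="Zurich HB", limit="10")
print(req.limit)  # 10 (the string "10" was coerced to int)

# Malformed input fails fast with a structured error instead of
# propagating bad values into data handling.
try:
    StationRequest(station="", limit=999)
except ValidationError as exc:
    print(len(exc.errors()))  # 2 (empty station, limit above bound)
```

Validating at the boundary like this keeps tool handlers free of ad-hoc checks and gives callers actionable error messages.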
Tactical Topics (our current priorities)
- Initialize repository with guidelines, testing framework, and contribution workflow
- Implement CI/CD pipeline with automated PyPI releases
- Develop provider template and first reference implementation
- Integrate additional open datasets (actively seeking contributors)
- Establish clear guidelines for choosing between Resources and Tools
- Develop scalable repository architecture for long-term growth
- Expand MCP SDK parameter support (authentication, rate limiting, etc.)
- Implement additional MCP protocol features (prompts, resource templates)
- Add support for alternative transport protocols beyond stdio (SSE)
- Deploy hosted MCP servers for improved accessibility
Roadmap
Let’s build the open source infrastructure that will allow all LLMs to access all Open Data together!
Access:
- Make Open Data available to all LLM applications (beyond Claude)
- Make Open Data sources searchable in a scalable way
- Make Open Data available through MCP remotely (SSE) with publicly sponsored infrastructure
Publish:
- Build the many Open Data MCP servers to make all the Open Data truly accessible (we need you!).
- On our side, we are starting to build MCP servers for Switzerland's ~12k open datasets!
- Make it even easier to build Open Data MCP servers
We are very early, and the lack of available datasets is currently the bottleneck. Help yourself: create your Open Data MCP server and get users to use it from their LLM applications. Let's connect LLMs to the millions of open datasets from governments, public entities, companies, and NGOs!
As Anthropic's MCP evolves we will adapt and upgrade Open Data MCP.
Limitations
- All data served by Open Data MCP servers should be Open.
- Please abide by the data licenses of the data providers.
- Our License must be quoted in commercial applications.
References
- Kudos to Anthropic's open source MCP release for enabling initiatives like this one.
License
This project is licensed under the MIT License - see the LICENSE file for details.
Server Configuration
{
"mcpServers": {
"open-data-mcp": {
"command": "docker",
"args": [
"run",
"-i",
"--rm",
"ghcr.io/metorial/mcp-container--opendatamcp--opendatamcp--open-data-mcp",
"odmcp"
],
"env": {}
}
}
}
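If you prefer to manage this entry by hand rather than via `uvx odmcp setup`, the snippet below sketches one way to merge it into a Claude Desktop config file using only the standard library. The config path varies by OS (on macOS it is typically ~/Library/Application Support/Claude/claude_desktop_config.json), and the Docker image name simply mirrors the JSON above; treat both as assumptions to verify.

```python
import json
import os

# Server entry mirroring the JSON configuration above.
SERVER_ENTRY = {
    "command": "docker",
    "args": [
        "run", "-i", "--rm",
        "ghcr.io/metorial/mcp-container--opendatamcp--opendatamcp--open-data-mcp",
        "odmcp",
    ],
    "env": {},
}


def add_mcp_server(config_path: str, name: str, entry: dict) -> dict:
    """Merge one server under `mcpServers`, preserving any existing entries."""
    config = {}
    if os.path.exists(config_path):
        with open(config_path) as f:
            config = json.load(f)
    config.setdefault("mcpServers", {})[name] = entry
    with open(config_path, "w") as f:
        json.dump(config, f, indent=2)
    return config
```

Because the helper reads the existing file before writing, calling it again with a different server name keeps both entries instead of clobbering the file.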