非結構化 API Mcp 伺服器
概覽
什麼是 UNS-MCP?
UNS-MCP 是一個由 Unstructured-IO 在 GitHub 上托管的公共庫。這個項目旨在促進非結構化數據的處理和管理,使開發者和數據科學家更容易處理各種數據格式。該庫旨在提供工具和資源,以簡化非結構化數據的提取、轉換和加載(ETL)過程。
UNS-MCP 的特點
- 數據處理:UNS-MCP 提供強大的非結構化數據處理功能,使用戶能夠將原始數據轉換為結構化格式。
- 整合能力:該庫支持與各種數據源的整合,實現從多個平台的無縫數據攝取。
- 用戶友好的介面:提供的工具以用戶體驗為中心設計,確保新手和經驗豐富的用戶都能有效導航和利用功能。
- 社區支持:作為一個公共庫,UNS-MCP 受益於開發者社區的貢獻和反饋,隨著時間的推移增強其功能和特性。
如何使用 UNS-MCP
- 克隆庫:首先使用 Git 將 UNS-MCP 庫克隆到本地機器。
git clone https://github.com/Unstructured-IO/UNS-MCP.git
- 安裝依賴:導航到項目目錄並根據文檔安裝必要的依賴。
cd UNS-MCP npm install
- 運行應用:啟動應用以開始處理您的非結構化數據。
npm start
- 探索功能:利用 UNS-MCP 提供的各種功能有效管理和處理您的數據。請參考文檔以獲取每個功能的詳細說明。
常見問題解答
Q1: UNS-MCP 可以處理哪些類型的非結構化數據?
A1: UNS-MCP 設計用於處理各種類型的非結構化數據,包括文本文件、PDF、圖像等。
Q2: 使用 UNS-MCP 需要付費嗎?
A2: 不,UNS-MCP 是一個公共庫,免費使用。然而,用戶可能需要考慮與托管和處理資源相關的費用。
Q3: 我該如何為 UNS-MCP 項目做貢獻?
A3: 歡迎貢獻!您可以分叉庫,進行更改,然後提交拉取請求以供審核。
Q4: 我可以在哪裡找到 UNS-MCP 的文檔?
A4: 文檔通常在庫內部可用,通常在 README 文件或專門的 docs 文件夾中。
Q5: 我可以報告 UNS-MCP 的問題或錯誤嗎?
A5: 可以,用戶可以通過導航到 GitHub 上庫的 "Issues" 部分來報告問題。
詳細
Unstructured API MCP Server
An MCP server implementation for interacting with the Unstructured API. This server provides tools to list sources and workflows.
Available Tools
| Tool | Description |
|-||
| list_sources
| Lists available sources from the Unstructured API. |
| get_source_info
| Get detailed information about a specific source connector. |
| create_source_connector
| Create a source connector.) |
| update_source_connector
| Update an existing source connector by params. |
| delete_source_connector
| Delete a source connector by source id. |
| list_destinations
| Lists available destinations from the Unstructured API. |
| get_destination_info
| Get detailed info about a specific destination connector |
| create_destination_connector
| Create a destination connector by params. |
| update_destination_connector
| Update an existing destination connector by destination id. |
| delete_destination_connector
| Delete a destination connector by destination id. |
| list_workflows
| Lists workflows from the Unstructured API. |
| get_workflow_info
| Get detailed information about a specific workflow. |
| create_workflow
| Create a new workflow with source, destination id, etc. |
| run_workflow
| Run a specific workflow with workflow id |
| update_workflow
| Update an existing workflow by params. |
| delete_workflow
| Delete a specific workflow by id. |
| list_jobs
| Lists jobs for a specific workflow from the Unstructured API. |
| get_job_info
| Get detailed information about a specific job by job id. |
| cancel_job
| Delete a specific job by id. |
| list_workflows_with_finished_jobs
| Lists all workflows that have any completed job, together with information about source and destination details. |
Below is a list of connectors the UNS-MCP
server currently supports, please see the full list of source connectors that Unstructured platform supports here and destination list here. We are planning on adding more!
Source | Destination |
---|---|
S3 | S3 |
Azure | Weaviate |
Google Drive | Pinecone |
OneDrive | AstraDB |
Salesforce | MongoDB |
Sharepoint | Neo4j |
Databricks Volumes | |
Databricks Volumes Delta Table |
To use the tool that creates/updates/deletes a connector, the credentials for that specific connector must be defined in your .env file. Below is the list of credentials
for the connectors we support:
Credential Name | Description |
---|---|
ANTHROPIC_API_KEY | required to run the minimal_client to interact with our server. |
AWS_KEY , AWS_SECRET | required to create S3 connector via uns-mcp server, see how in documentation and here |
WEAVIATE_CLOUD_API_KEY | required to create Weaviate vector db connector, see how in documentation |
FIRECRAWL_API_KEY | required to use Firecrawl tools in external/firecrawl.py , sign up on Firecrawl and get an API key. |
ASTRA_DB_APPLICATION_TOKEN , ASTRA_DB_API_ENDPOINT | required to create Astradb connector via uns-mcp server, see how in documentation |
AZURE_CONNECTION_STRING | required option 1 to create Azure connector via uns-mcp server, see how in documentation |
AZURE_ACCOUNT_NAME +AZURE_ACCOUNT_KEY | required option 2 to create Azure connector via uns-mcp server, see how in documentation |
AZURE_ACCOUNT_NAME +AZURE_SAS_TOKEN | required option 3 to create Azure connector via uns-mcp server, see how in documentation |
NEO4J_PASSWORD | required to create Neo4j connector via uns-mcp server, see how in documentation |
MONGO_DB_CONNECTION_STRING | required to create Mongodb connector via uns-mcp server, see how in documentation |
GOOGLEDRIVE_SERVICE_ACCOUNT_KEY | a string value. The original server account key (follow documentation) is in json file, run base64 < /path/to/google_service_account_key.json in terminal to get the string value |
DATABRICKS_CLIENT_ID ,DATABRICKS_CLIENT_SECRET | required to create Databricks volume/delta table connector via uns-mcp server, see how in documentation and here |
ONEDRIVE_CLIENT_ID , ONEDRIVE_CLIENT_CRED ,ONEDRIVE_TENANT_ID | required to create One Drive connector via uns-mcp server, see how in documentation |
PINECONE_API_KEY | required to create Pinecone vector DB connector via uns-mcp server, see how in documentation |
SALESFORCE_CONSUMER_KEY ,SALESFORCE_PRIVATE_KEY | required to create salesforce source connector via uns-mcp server, see how in documentation |
SHAREPOINT_CLIENT_ID , SHAREPOINT_CLIENT_CRED ,SHAREPOINT_TENANT_ID | required to create One Drive connector via uns-mcp server, see how in documentation |
LOG_LEVEL | Used to set logging level for our minimal_client , e.g. set to ERROR to get everything |
CONFIRM_TOOL_USE | set to true so that minimal_client can confirm execution before each tool call |
DEBUG_API_REQUESTS | set to true so that uns_mcp/server.py can output request parameters for better debugging |
Firecrawl Source
Firecrawl is a web crawling API that provides two main capabilities in our MCP:
- HTML Content Retrieval: Using
invoke_firecrawl_crawlhtml
to start crawl jobs andcheck_crawlhtml_status
to monitor them - LLM-Optimized Text Generation: Using
invoke_firecrawl_llmtxt
to generate text andcheck_llmtxt_status
to retrieve results
How Firecrawl works:
Web Crawling Process:
- Starts with a specified URL and analyzes it to identify links
- Uses the sitemap if available; otherwise follows links found on the website
- Recursively traverses each link to discover all subpages
- Gathers content from every visited page, handling JavaScript rendering and rate limits
- Jobs can be cancelled with
cancel_crawlhtml_job
if needed - Use this if you require all the info extracted into raw HTML, Unstructured's workflow cleans it up really well :smile:
LLM Text Generation:
- After crawling, extracts clean, meaningful text content from the crawled pages
- Generates optimized text formats specifically formatted for large language models
- Results are automatically uploaded to the specified S3 location
- Note: LLM text generation jobs cannot be cancelled once started. The
cancel_llmtxt_job
function is provided for consistency but is not currently supported by the Firecrawl API.
Note: A FIRECRAWL_API_KEY
environment variable must be set to use these functions.
Installation & Configuration
This guide provides step-by-step instructions to set up and configure the UNS_MCP server using Python 3.12 and the uv
tool.
Prerequisites
- Python 3.12+
uv
for environment management- An API key from Unstructured. You can sign up and obtain your API key here.
Using uv
(Recommended)
No additional installation is required when using uvx
as it handles execution. However, if you prefer to install the package directly:
uv pip install uns_mcp
Configure Claude Desktop
For integration with Claude Desktop, add the following content to your claude_desktop_config.json
:
Note: The file is located in the ~/Library/Application Support/Claude/
directory.
Using uvx
Command:
{
"mcpServers": {
"UNS_MCP": {
"command": "uvx",
"args": ["uns_mcp"],
"env": {
"UNSTRUCTURED_API_KEY": "<your-key>"
}
}
}
}
Alternatively, Using Python Package:
{
"mcpServers": {
"UNS_MCP": {
"command": "python",
"args": ["-m", "uns_mcp"],
"env": {
"UNSTRUCTURED_API_KEY": "<your-key>"
}
}
}
}
Using Source Code
-
Clone the repository.
-
Install dependencies:
uv sync
-
Set your Unstructured API key as an environment variable. Create a .env file in the root directory with the following content:
UNSTRUCTURED_API_KEY="YOUR_KEY"
Refer to
.env.template
for the configurable environment variables.
You can now run the server using one of the following methods:
<details> <summary> Using Editable Package Installation </summary> Install as an editable package:uvx pip install -e .
Update your Claude Desktop config:
{
"mcpServers": {
"UNS_MCP": {
"command": "uvx",
"args": ["uns_mcp"]
}
}
}
Note: Remember to point to the uvx executable in environment where you installed the package
</details> <details> <summary> Using SSE Server Protocol </summary>Note: Not supported by Claude Desktop.
For SSE protocol, you can debug more easily by decoupling the client and server:
-
Start the server in one terminal:
uv run python uns_mcp/server.py --host 127.0.0.1 --port 8080 # or make sse-server
-
Test the server using a local client in another terminal:
uv run python minimal_client/client.py "http://127.0.0.1:8080/sse" # or make sse-client
Note: To stop the services, use Ctrl+C
on the client first, then the server.
Configure Claude Desktop to use stdio:
{
"mcpServers": {
"UNS_MCP": {
"command": "ABSOLUTE/PATH/TO/.local/bin/uv",
"args": [
"--directory",
"ABSOLUTE/PATH/TO/YOUR-UNS-MCP-REPO/uns_mcp",
"run",
"server.py"
]
}
}
}
Alternatively, run the local client:
uv run python minimal_client/client.py uns_mcp/server.py
</details>
Additional Local Client Configuration
Configure the minimal client using environmental variables:
LOG_LEVEL="ERROR"
: Set to suppress debug outputs from the LLM, displaying clear messages for users.CONFIRM_TOOL_USE='false'
: Disable tool use confirmation before execution. Use with caution, especially during development, as LLM may execute expensive workflows or delete data.
Debugging tools
Anthropic provides MCP Inspector
tool to debug/test your MCP server. Run the following command to spin up a debugging UI. From there, you will be able to add environment variables (pointing to your local env) on the left pane. Include your personal API key there as env var. Go to tools
, you can test out the capabilities you add to the MCP server.
mcp dev uns_mcp/server.py
If you need to log request call parameters to UnstructuredClient
, set the environment variable DEBUG_API_REQUESTS=false
.
The logs are stored in a file with the format unstructured-client-{date}.log
, which can be examined to debug request call parameters to UnstructuredClient
functions.
Add terminal access to minimal client
We are going to use @wonderwhy-er/desktop-commander to add terminal access to the minimal client. It is built on the MCP Filesystem Server. Be careful, as the client (also LLM) now has access to private files.
Execute the following command to install the package:
npx @wonderwhy-er/desktop-commander setup
Then start client with extra parameter:
uv run python minimal_client/client.py "http://127.0.0.1:8080/sse" "@wonderwhy-er/desktop-commander"
### or
make sse-client-terminal
Using subset of tools
If your client supports using only subset of tools here are the list of things you should be aware:
update_workflow
tool has to be loaded in the context together withcreate_workflow
tool, because it contains detailed description on how to create and configure custom node.
Known issues
update_workflow
- needs to have in context the configuration of the workflow it is updating either by providing it by the user or by callingget_workflow_info
tool, as this tool doesn't work aspatch
applier, it fully replaces the workflow config.
CHANGELOG.md
Any new developed features/fixes/enhancements will be added to CHANGELOG.md. 0.x.x-dev pre-release format is preferred before we bump to a stable version.
Troubleshooting
- If you encounter issues with
Error: spawn <command> ENOENT
it means<command>
is not installed or visible in your PATH:- Make sure to install it and add it to your PATH.
- or provide absolute path to the command in the
command
field of your config. So for example replacepython
with/opt/miniconda3/bin/python
伺服器配置
{
"mcpServers": {
"uns-mcp": {
"command": "docker",
"args": [
"run",
"-i",
"--rm",
"ghcr.io/metorial/mcp-container--unstructured-io--uns-mcp--uns-mcp",
"uns_mcp"
],
"env": {
"UNSTRUCTURED_API_KEY": "unstructured-api-key"
}
}
}
}