
API Reference

OCR Document Intelligence Service

shell
POST https://api.scnet.cn/api/llm/v1/ocrdoc/submit

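1. Overview

The Scnet OCR Document Intelligence Service supports asynchronous submission of OCR recognition tasks for multiple document types, together with result queries. It is suited to high-volume document processing.

Core workflow:

- Task submission: submit the URL of the file to process through the submit endpoint and receive a task ID.
- Status query: poll the task status through the result endpoint to obtain the processing result.
- Result retrieval: once the task succeeds, the response contains file download URLs from which the recognition result files can be downloaded.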


2. Task Submission API

2.1 Endpoint

| Item | Value |
| --- | --- |
| URL | POST /api/llm/v1/ocrdoc/submit |
| Content-Type | application/json |
| Authentication | Authorization: Bearer <token> |

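2.2 Request Parameters

Header parameters

| Name | Type | Required | Example |
| --- | --- | --- | --- |
| Content-Type | string | | application/json |
| Authorization | string | | Bearer <API Key> |

Body parameters

| Name | Type | Required | Description |
| --- | --- | --- | --- |
| file_url | string | | Publicly accessible download URL of the file to process (to obtain a file upload URL, see …) |
| ocr_type | string | | Recognition type (currently only DOC_PARING) |

2.3 Request Body Example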

json
{
    "file_url": "https://oss.ksai.scnet.cn:58043/ocr/doc/2135155845..."
}
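
The request body can also be assembled programmatically. The following is a minimal sketch, not part of the official API: `build_submit_payload` is a hypothetical helper name, and the http/https check is an assumption about what "publicly accessible" requires. Only `file_url` and `ocr_type` come from the parameter table above.

```python
from urllib.parse import urlparse


def build_submit_payload(file_url, ocr_type=None):
    """Build the JSON body for the submit endpoint.

    The scheme check is an assumption made for this sketch; the service
    itself may accept or reject URLs by different rules.
    """
    parsed = urlparse(file_url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"file_url must be an absolute http(s) URL: {file_url!r}")

    payload = {"file_url": file_url}
    # ocr_type is only included when the caller sets it explicitly.
    if ocr_type is not None:
        payload["ocr_type"] = ocr_type
    return payload
```

The returned dictionary can be passed directly as the `json=` argument of an HTTP client call.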

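2.4 Response Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| code | String | Status code |
| msg | String | Result description |
| output | Object | Task submission result |
| output.task_status | String | Task status: pending (waiting to run), running (in progress), succeeded, failed, unknown (task does not exist or status is unknown) |
| output.task_id | String | Unique task identifier, used for subsequent result queries |
| request_id | String | Unique request identifier |

In the success response, output and request_id are nested under the data object.

2.5 Response Examples

Success response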
json
{
  "code": "200",
  "msg": "",
  "data": {
    "output": {
      "task_status": "pending",
      "task_id": "0385dc79-5ff8-4d82-bcb6-xxxxxx"
    },
    "request_id": "4909100c-7b5a-9f92-bfe5-xxxxxx"
  }
}
Failure response
json
{
  "request_id": "xxxx",
  "error": {
    "code": "404",
    "type": "model_not_found",
    "message": "Model xxx not found"
  }
}
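
The two response shapes above differ: a success carries `data.output.task_id`, while a failure carries a top-level `error` object. The following sketch shows one way a client might tell them apart. `extract_task_id` is a hypothetical helper, and raising `RuntimeError` is a choice made for this illustration.

```python
def extract_task_id(response_body):
    """Return the task_id from a parsed submit response.

    Raises RuntimeError when the body has the error shape shown in the
    failure example (a top-level "error" object).
    """
    error = response_body.get("error")
    if error is not None:
        raise RuntimeError(
            f"{error.get('type')}: {error.get('message')} (code {error.get('code')})"
        )
    return response_body["data"]["output"]["task_id"]
```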

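3. Task Status Query API

3.1 Endpoint

| Item | Value |
| --- | --- |
| URL | POST /api/llm/v1/ocrdoc/result |
| Authentication | Authorization: Bearer <token> |

3.2 Request Parameters

Header parameters

| Name | Type | Required | Example |
| --- | --- | --- | --- |
| Authorization | string | | Bearer <API Key> |

Body parameters

| Name | Type | Required | Description |
| --- | --- | --- | --- |
| task_ids | array | | List of task IDs |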

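3.3 Request Body Example

json
{
  "task_ids": [
    "2056706028668284929","2056703208598626305"
  ]
}

3.4 Response Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| code | String | Status code |
| msg | String | Result description |
| request_id | String | Unique request identifier |
| output | Object | Task result |
| output.task_id | String | Unique task identifier |
| output.task_status | String | Task status |
| output.submit_time | String | Task submission time |
| output.end_time | String | Task end time (returned on success or failure) |
| output.results | Array | Download URLs of the recognition result files (returned on success) |
| output.error_code | String | Error code (returned on failure) |
| output.error_message | String | Error message (returned on failure) |

The response returns a data array with one element per queried task; each element carries its own request_id and output.

3.5 Response Examples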

Task succeeded
json
{
  "code": "0",
  "msg": "success",
  "data": [
    {
      "output": {
        "results": [
          "https://minio.fanhualuomu.top:8088/long-document-parsing/longDocumentParsing/results/2026/05/19/2056703208598626305/013_result_2056703208598626305.json?response-content-disposition=attachment%3B%20filename%3D%22013_result_2056703208598626305.json%22%3B%20filename%2A%3DUTF-8%27%27013_result_2056703208598626305.json&X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=peihaojie%2F20260519%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20260519T114740Z&X-Amz-Expires=43200&X-Amz-SignedHeaders=host&X-Amz-Signature=d414e3ff2f9074b036dac1855dad18a67792f3dcb2d2930d9f3605a4c90b111f"
        ],
        "task_id": "2056703208598626305",
        "task_status": "succeeded",
        "submit_time": "2026-05-19 19:47:11",
        "end_time": "2026-05-19 19:47:40"
      },
      "usage": {
        "image_count": 1
      },
      "request_id": "5e726f4f7d518259"
    },
    {
      "output": {
        "results": [
          "https://minio.fanhualuomu.top:8088/long-document-parsing/longDocumentParsing/results/2026/05/19/2056706028668284929/014_result_2056706028668284929.json?response-content-disposition=attachment%3B%20filename%3D%22014_result_2056706028668284929.json%22%3B%20filename%2A%3DUTF-8%27%27014_result_2056706028668284929.json&X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=peihaojie%2F20260519%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20260519T115855Z&X-Amz-Expires=43200&X-Amz-SignedHeaders=host&X-Amz-Signature=89707d665c259c91d00bedc289e4d20922b3b8936548d793d8b0e20b4e38e751"
        ],
        "task_id": "2056706028668284929",
        "task_status": "succeeded",
        "submit_time": "2026-05-19 19:58:24",
        "end_time": "2026-05-19 19:58:55"
      },
      "usage": {
        "image_count": 1
      },
      "request_id": "5e726f4f7d518259"
    }
  ]
}
Task in progress
json

{
  "code": "200",
  "msg": "",
  "data": [
    {
      "request_id": "8ae698ba-df2d-966c-abcf-xxxxxx",
      "output": {
        "task_id": "e56d806f-76f9-4037-aefa-xxxxxx",
        "task_status": "running",
        "submit_time": "2026-04-20 19:33:50.425"
      }
    }
  ]
}
Task failed
json
{
  "code": "200",
  "msg": "",
  "data": [
    {
      "request_id": "c61fe158-c0de-40f0-b4d9-964625119ba4",
      "output": {
        "task_id": "86ecf553-d340-4e21-xxxxxxxxx",
        "task_status": "failed",
        "submit_time": "2025-11-11 11:46:28.116",
        "end_time": "2025-11-11 11:46:28.255",
        "error_code": "limit_burst_rate",
        "error_message": "Burst rate limit exceeded for model xxx"
      }
    }
  ]
}
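
Because a single query can cover several tasks, it can help to flatten the `data` array into a per-task summary. The sketch below assumes the response shape shown in the examples above. `summarize_results` and the keys of the summary dictionaries are names invented for this illustration.

```python
def summarize_results(response_body):
    """Map each task_id in a parsed result response to a small summary dict."""
    summary = {}
    for item in response_body.get("data", []):
        output = item.get("output", {})
        summary[output.get("task_id")] = {
            "status": output.get("task_status"),
            # "results" is only present on success, so default to an empty list.
            "results": output.get("results", []),
            # error fields are only present on failure.
            "error_code": output.get("error_code"),
            "error_message": output.get("error_message"),
        }
    return summary
```

A caller can then download every URL listed under `results` for tasks whose `status` is `succeeded`, and log `error_code` for the rest.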

4. Request Examples

4.1 Task Submission cURL Example

shell
curl --location 'https://api.scnet.cn/api/llm/v1/ocrdoc/submit' \
--header 'Authorization: Bearer <API Key>' \
--header 'Content-Type: application/json' \
--data '{
    "file_url": "https://oss.ksai.scnet.cn:58043/ocr/doc/xxxxxx"
}'

4.2 Task Status Query cURL Example

shell
curl --location 'https://api.scnet.cn/api/llm/v1/ocrdoc/result' \
--header 'Authorization: Bearer <API Key>' \
--header 'Content-Type: application/json' \
--data '{
    "task_ids": ["2056706028668284929","2056703208598626305"]
}'

4.3 Python Example

python
import requests
import time

API_KEY = "<API Key>"
BASE_URL = "https://api.scnet.cn/api/llm/v1/ocrdoc"

headers = {
    "Authorization": f"Bearer {API_KEY}",
    "Content-Type": "application/json"
}

# Submit the task
submit_payload = {
    "file_url": "https://oss.ksai.scnet.cn:58043/ocr/doc/xxxxxx"
}
response = requests.post(f"{BASE_URL}/submit", json=submit_payload, headers=headers, timeout=30)
submit_result = response.json()
print("Submit result:", submit_result)

# Failure responses carry an "error" object instead of "data"
if "error" in submit_result:
    raise SystemExit(f"Submit failed: {submit_result['error']}")

task_id = submit_result["data"]["output"]["task_id"]

# Poll the task status (the result endpoint is POST and takes a list of task IDs)
while True:
    result_response = requests.post(
        f"{BASE_URL}/result",
        json={"task_ids": [task_id]},
        headers=headers,
        timeout=30
    )
    result = result_response.json()
    output = result["data"][0]["output"]
    task_status = output["task_status"]

    if task_status == "succeeded":
        print("Task succeeded:", output["results"])
        break
    elif task_status in ("failed", "unknown"):
        print("Task did not succeed:", result)
        break
    else:
        print("Task in progress, checking again in 10 seconds...")
        time.sleep(10)

4.4 Go Example

go
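package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"log"
	"net/http"
	"time"
)

const (
	baseURL     = "https://api.scnet.cn/api/llm/v1/ocrdoc"
	bearerToken = "<API Key>"
)

// taskOutput holds the fields of "output" used in this example.
type taskOutput struct {
	TaskID     string   `json:"task_id"`
	TaskStatus string   `json:"task_status"`
	Results    []string `json:"results"`
}

// postJSON sends body as JSON to baseURL+path and returns the raw response body.
func postJSON(client *http.Client, path string, body interface{}) ([]byte, error) {
	bodyBytes, err := json.Marshal(body)
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest("POST", baseURL+path, bytes.NewBuffer(bodyBytes))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Authorization", "Bearer "+bearerToken)
	req.Header.Set("Content-Type", "application/json")

	resp, err := client.Do(req)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	return io.ReadAll(resp.Body)
}

func main() {
	client := &http.Client{Timeout: 30 * time.Second}

	// Submit the task
	respBody, err := postJSON(client, "/submit", map[string]string{
		"file_url": "https://oss.ksai.scnet.cn:58043/ocr/doc/xxxxxx",
	})
	if err != nil {
		log.Fatalf("submit request failed: %v", err)
	}
	fmt.Printf("Submit result: %s\n", respBody)

	// A successful submit response nests the task under data.output
	var submitResult struct {
		Data struct {
			Output taskOutput `json:"output"`
		} `json:"data"`
	}
	if err := json.Unmarshal(respBody, &submitResult); err != nil || submitResult.Data.Output.TaskID == "" {
		log.Fatalf("no task_id in submit response (err: %v)", err)
	}
	taskID := submitResult.Data.Output.TaskID

	// Poll the task status (the result endpoint is POST and takes a list of task IDs)
	for {
		respBody, err = postJSON(client, "/result", map[string][]string{"task_ids": {taskID}})
		if err != nil {
			log.Fatalf("result request failed: %v", err)
		}

		var result struct {
			Data []struct {
				Output taskOutput `json:"output"`
			} `json:"data"`
		}
		if err := json.Unmarshal(respBody, &result); err != nil || len(result.Data) == 0 {
			log.Fatalf("unexpected result response: %s", respBody)
		}

		switch result.Data[0].Output.TaskStatus {
		case "succeeded":
			fmt.Printf("Task succeeded: %s\n", respBody)
			return
		case "failed", "unknown":
			fmt.Printf("Task did not succeed: %s\n", respBody)
			return
		default:
			fmt.Println("Task in progress...")
			time.Sleep(10 * time.Second)
		}
	}
}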

4.5 Node.js Example

javascript
const API_KEY = '<API Key>';
const BASE_URL = 'https://api.scnet.cn/api/llm/v1/ocrdoc';

const HEADERS = {
    'Authorization': `Bearer ${API_KEY}`,
    'Content-Type': 'application/json'
};

async function submitTask() {
    const response = await fetch(`${BASE_URL}/submit`, {
        method: 'POST',
        headers: HEADERS,
        body: JSON.stringify({
            file_url: 'https://oss.ksai.scnet.cn:58043/ocr/doc/xxxxxx'
        })
    });

    if (!response.ok) {
        throw new Error(`Task submission failed [status: ${response.status}]: ${await response.text()}`);
    }

    const submitResult = await response.json();
    console.log('Submit result:', submitResult);

    // Failure responses carry an "error" object instead of "data"
    if (submitResult.error) {
        throw new Error(`Task submission failed: ${submitResult.error.message}`);
    }
    return submitResult.data.output.task_id;
}

async function pollResult(taskId) {
    while (true) {
        // The result endpoint is POST and takes a list of task IDs
        const response = await fetch(`${BASE_URL}/result`, {
            method: 'POST',
            headers: HEADERS,
            body: JSON.stringify({ task_ids: [taskId] })
        });

        if (!response.ok) {
            throw new Error(`Task query failed [status: ${response.status}]: ${await response.text()}`);
        }

        const result = await response.json();
        const taskStatus = result.data[0].output.task_status;

        if (taskStatus === 'succeeded') {
            console.log('Task succeeded:', result.data[0].output.results);
            return result;
        } else if (taskStatus === 'failed' || taskStatus === 'unknown') {
            console.log('Task did not succeed:', result);
            return result;
        } else {
            console.log('Task in progress, checking again in 10 seconds...');
            await new Promise(resolve => setTimeout(resolve, 10 * 1000));
        }
    }
}

async function main() {
    try {
        const taskId = await submitTask();
        await pollResult(taskId);
    } catch (error) {
        console.error('OCR Document Intelligence Service call failed:', error.message);
    }
}

main();

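5. Task Statuses

| Status | Description |
| --- | --- |
| pending | Task submitted, waiting to be processed |
| running | Task in progress |
| succeeded | Task completed successfully |
| failed | Task failed |
| unknown | Task does not exist or status is unknown |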
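
The status table suggests a polling loop that stops on a terminal status. The sketch below shows one way to write it with an overall timeout. `wait_for_task` and its parameters are hypothetical names, the caller supplies `query_status` (a function that returns the current status string for a task ID), and treating `unknown` as terminal is an assumption rather than documented behavior.

```python
import time

# Assumption: "unknown" is treated as terminal so the loop cannot spin forever
# on a task ID that the service does not recognize.
TERMINAL_STATUSES = {"succeeded", "failed", "unknown"}


def wait_for_task(query_status, task_id, interval=10.0, timeout=600.0, sleep=time.sleep):
    """Poll query_status(task_id) until a terminal status is seen or timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        status = query_status(task_id)
        if status in TERMINAL_STATUSES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task {task_id} still {status!r} after {timeout} s")
        sleep(interval)
```

Passing `sleep` as a parameter keeps the loop easy to exercise in tests without real delays.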

6. Error Codes

| Error code | Description |
| --- | --- |
| unknown_error | Unknown error |
| modal_type_not_supported | Unsupported modal type xxx |
| provider_not_supported | Unsupported provider xxx |
| model_not_supported | Unsupported model xxx |
| model_not_found | Model xxx not found |
| request_concurrency_conflict | Concurrency conflict for request, please try again later |
| provider_error | Provider xxx process error |
| model_route_failed | Model xxx route failed |
| content_illegal | Illegal content detected by content approval |
| limit_burst_rate | Burst rate limit exceeded for model xxx |
| task_not_found | Task not found |
| InvalidParameter | Parameter illegal |
| SystemError | An system error has occurred, please try again later |
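
Some of these codes describe transient conditions, so a client may choose to retry them. The sketch below is one possible policy, not documented service behavior: the set of retryable codes is an assumption inferred from the "try again later" wording and the rate-limit description, and `is_retryable` and `backoff_delays` are hypothetical helper names.

```python
# Assumption: these codes describe transient conditions worth retrying.
RETRYABLE_CODES = {"request_concurrency_conflict", "limit_burst_rate", "SystemError"}


def is_retryable(error_code):
    """Return True when the error code is in the assumed transient set."""
    return error_code in RETRYABLE_CODES


def backoff_delays(attempts, base=1.0, cap=30.0):
    """Return exponential backoff delays in seconds: base, 2*base, 4*base, ... capped at cap."""
    return [min(cap, base * (2 ** i)) for i in range(attempts)]
```

A caller would sleep for each delay in turn between attempts and give up once the list is exhausted or a non-retryable code is returned.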