Common Information
Base Parameters
| Parameter | Description |
|---|---|
| BasePath | The base path for API calls |
| Endpoint | The randomly generated path segment used in API calls |
| AccessKey | The AccessKey used to authenticate API calls |
Request Path
https://{BasePath}/search/{Endpoint}/read
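A minimal Python sketch of assembling this path from the two base parameters. The function name build_read_url is hypothetical, and it assumes BasePath is a bare host name with no scheme, as the https://{BasePath}/... template suggests.

```python
def build_read_url(base_path: str, endpoint: str) -> str:
    """Return https://{BasePath}/search/{Endpoint}/read (hypothetical helper)."""
    host = base_path.strip().rstrip("/")
    # Assumption: BasePath is a bare host; drop any scheme the caller included
    # so the result does not become https://https://...
    for scheme in ("https://", "http://"):
        if host.startswith(scheme):
            host = host[len(scheme):]
    # Trim stray slashes so the path has no empty segments
    segment = endpoint.strip().strip("/")
    return f"https://{host}/search/{segment}/read"
```

For example, build_read_url("api.example.com", "abc123") yields https://api.example.com/search/abc123/read; the host and endpoint values here are placeholders.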
Request Method
POST
Request Header
| Parameter | Type | Description |
|---|---|---|
| Authorization | String | Format: Bearer {AK}, where AK is the AccessKey. |
| Pragma | String | Optional. Set to no-cache to bypass the result cache: the response is not cached and each request is processed independently. If omitted, responses for the same query are cached for 10 minutes. |
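The two headers above can be built with a small helper. This is a sketch: build_headers and its no_cache flag are hypothetical names, and the Content-Type header is not listed in the table but is included because the request examples send it.

```python
from typing import Dict


def build_headers(access_key: str, no_cache: bool = False) -> Dict[str, str]:
    """Build request headers for the read API (hypothetical helper)."""
    headers = {
        # Authorization must be "Bearer " followed by the AccessKey
        "Authorization": f"Bearer {access_key}",
        # Not in the header table, but sent by the request examples
        "Content-Type": "application/json",
    }
    if no_cache:
        # Bypass the 10-minute result cache for this request
        headers["Pragma"] = "no-cache"
    return headers
```

Leaving no_cache at False keeps the default caching behavior, which is usually preferable when the same URL is read repeatedly within a short window.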
Request Body Parameters
| Parameter | Required | Type | Description |
|---|---|---|---|
| url | Y | String | Target URL to read |
| formats | N | List | Output content format. Options: HTML, TEXT, MARKDOWN. Specify exactly one value; defaults to TEXT if omitted. |
| mode | N | String | Reading mode. Options: quality (quality mode, uses dynamic rendering); fast (fast mode, reads the static page; the default if omitted); auto (automatically selects fast or quality based on the URL). Any other value falls back to fast. |
| totalTimeout | N | Int | End-to-end request timeout in milliseconds; disabled by default. The request is aborted if it takes longer than this. |
| timeout | N | Int | Read timeout in milliseconds; default 30000. Limits the API connection time, not the total request duration (use totalTimeout for that). |
| imageDownloadEnable | N | Bool | Image conversion switch (converts images to base64); default false. When formats is HTML or MARKDOWN and this is true, image links in the response are converted to base64. Enabling this incurs additional fees for images parsed from the page. If the URL points to a PDF, this switch has no effect and the result is returned as base64 by default. |
| imageInContent | N | Bool | Used together with imageDownloadEnable; default true. When true, the base64-converted images are embedded in the main content. When false, they are instead returned separately in the additional image_base64_list field. |
| pdfExtractEnable | N | Bool | Enables or disables content extraction when the URL points to a PDF; default false. true: return the parsed PDF text; false: return the PDF as base64. Has no effect if the URL does not point to a PDF. PDF content extraction incurs additional fees. |
| enhancedOcr | N | Bool | Enhanced PDF extraction that improves accuracy and coverage; default false. Only takes effect when pdfExtractEnable is true. OCR recognition incurs additional fees. |
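The interplay between these body parameters (a single format value, enhancedOcr depending on pdfExtractEnable, imageInContent working with imageDownloadEnable) can be captured in a small payload builder. This is a sketch under stated assumptions: build_read_payload and its keyword names are hypothetical, and rejecting unknown mode values or enhancedOcr without pdfExtractEnable is a client-side choice, since the service itself falls back to fast or ignores the flag.

```python
from typing import Any, Dict, Optional

ALLOWED_FORMATS = {"HTML", "TEXT", "MARKDOWN"}
ALLOWED_MODES = {"quality", "fast", "auto"}


def build_read_payload(
    url: str,
    fmt: str = "TEXT",
    mode: str = "fast",
    timeout_ms: Optional[int] = None,
    total_timeout_ms: Optional[int] = None,
    image_download: bool = False,
    image_in_content: bool = True,
    pdf_extract: bool = False,
    enhanced_ocr: bool = False,
) -> Dict[str, Any]:
    """Build a JSON-serializable request body for the read API (hypothetical helper)."""
    fmt = fmt.upper()
    if fmt not in ALLOWED_FORMATS:
        raise ValueError(f"fmt must be one of {sorted(ALLOWED_FORMATS)}")
    # The service treats unknown modes as fast; failing early is a client-side choice
    if mode not in ALLOWED_MODES:
        raise ValueError(f"mode must be one of {sorted(ALLOWED_MODES)}")
    # enhancedOcr only takes effect when pdfExtractEnable is true
    if enhanced_ocr and not pdf_extract:
        raise ValueError("enhanced_ocr requires pdf_extract=True")

    # formats is a list, but exactly one value may be given
    payload: Dict[str, Any] = {"url": url, "formats": [fmt], "mode": mode}
    if timeout_ms is not None:
        payload["timeout"] = timeout_ms  # server default: 30000 ms
    if total_timeout_ms is not None:
        payload["totalTimeout"] = total_timeout_ms  # disabled unless set
    if image_download:
        # Base64 conversion applies to HTML/MARKDOWN output and incurs extra fees
        payload["imageDownloadEnable"] = True
        payload["imageInContent"] = image_in_content
    if pdf_extract:
        # PDF extraction (and enhanced OCR) incur extra fees
        payload["pdfExtractEnable"] = True
        payload["enhancedOcr"] = enhanced_ocr
    return payload
```

Optional keys are only sent when set, so server-side defaults apply to everything else; mode is always sent, defaulting to fast to match the documented server default.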
Response Parameters
| Field | Type | Description |
|---|---|---|
| html | String | The page content as HTML. Returned when formats is HTML. |
| markdown | String | The page content as Markdown. Returned when formats is MARKDOWN. |
| text | String | The page content as plain text. Returned when formats is TEXT. |
| metadata.title | String | Webpage title. |
| metadata.description | String | Webpage description. |
| metadata.keywords | String | Webpage keywords. |
| logo | String | URL of the website logo. |
| site_name | String | Website name. |
| image_list | Array | List of image URLs. Only included if the page contains images. |
| image_base64_list | Array | List of base64-encoded images. Returned when imageDownloadEnable is true and imageInContent is false. |
| pdf_pages | Int | Number of pages if the target is a PDF. |
| internal_links | Array | List of internal links (same domain as the target URL) found in the content. |
| external_links | Array | List of external links (different domain from the target URL) found in the content. |
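A sketch of reading these fields from a parsed response, assuming the body is a JSON object with the fields at the top level as in the response example (for instance the result of response.json()). The helper names extract_content and summarize are hypothetical; list fields are treated as possibly absent because image_list, for one, is only included when the page has images.

```python
from typing import Any, Dict


def extract_content(result: Dict[str, Any], fmt: str) -> str:
    """Return the content field that matches the requested format (hypothetical helper)."""
    # The formats value (HTML/TEXT/MARKDOWN) maps to a lowercase response field
    key = fmt.lower()
    if key not in ("html", "text", "markdown"):
        raise ValueError(f"unsupported format: {fmt}")
    content = result.get(key)
    if content is None:
        raise KeyError(f"response has no '{key}' field")
    return content


def summarize(result: Dict[str, Any]) -> Dict[str, Any]:
    """Collect page metadata plus image and link counts (hypothetical helper)."""
    meta = result.get("metadata") or {}
    return {
        "title": meta.get("title", ""),
        "site_name": result.get("site_name", ""),
        # image_list is only included when the page contains images
        "images": len(result.get("image_list") or []),
        "internal_links": len(result.get("internal_links") or []),
        "external_links": len(result.get("external_links") or []),
    }
```

Raising on a missing content field, rather than returning an empty string, makes a mismatch between the requested format and the response visible immediately.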
Request Example
cURL
```shell
curl --location --request POST 'https://{BasePath}/search/{Endpoint}/read' \
--header 'Authorization: Bearer {AK}' \
--header 'Content-Type: application/json' \
--data-raw '{
    "url": "https://www.volcengine.com/docs/6369/67267",
    "formats": [
        "TEXT"
    ],
    "mode": "quality"
}'
```
Python (Requests)
```python
import requests

BasePath = 'xxxxxx'  # Replace with your BasePath
Endpoint = 'xxxxxx'  # Replace with your Endpoint
AK = 'xxxxxx'        # Replace with your AccessKey

url = f"https://{BasePath}/search/{Endpoint}/read"

# Read the target page as plain text using quality mode (dynamic rendering)
payload = {
    "url": "https://www.volcengine.com/docs/6369/67267",
    "formats": ["TEXT"],
    "mode": "quality",
}

headers = {
    "Authorization": f"Bearer {AK}",
    "Content-Type": "application/json",
}

# json= serializes the payload; timeout=60 (seconds) is a client-side choice, not an API parameter
response = requests.post(url, headers=headers, json=payload, timeout=60)
response.raise_for_status()  # Raise on HTTP 4xx/5xx instead of printing an error body
print(response.text)
```
Response Example
```json
{
  "markdown": "导航\n\n请求结构\n\n最近更新时间:2025.04.23 12:06:55首次发布时间:2021.02.25 21:02:47\n\n火山引擎的OpenAPI的请求结构如下:\n\n服务地址(Endpoint)用于访问火山引擎的云服务,通常是一个URL,客户端可以通过该地址与服务进行通信。 \n火山引擎服务地址的标准形式为:\n\nRegional服务 (区域化部署) Global服务(中心化部署)\n\n \n \n {service}.{region}.volcengineapi.com\n\n例如:云服务器的Service为:ecs,其在\n\n \n \n cn-beijing-autodriving\n\nRegion的服务地址为:\n\necs.cn-beijing-autodriving.volcengineapi.com\n\n \n \n {service}.volcengineapi.com\n\n例如:访问控制的Service为:iam,作为Global服务,其服务地址为:iam.volcengineapi.com\n\n注意\n\n * 当Service中存在下划线( _ )符号时,Endpoint需转为中划线( - )符号。存在大写字母时需转成小写。\n * 存在部分云产品暂未适配标准Endpoint,请前往您所使用的云产品-API参考或开发指南中查看详情。\n\n推荐使用安全性更高的 HTTPS方式发送请求。\n\n请求方法详见各个接口具体的需求。在火山引擎中的OpenAPI大多数支持GET或POST请求。\n\n火山引擎的OpenAPI请求包含两类参数:公共请求参数和接口请求参数。其中公共请求参数在每个请求中都必须包含。接口请求参数需参考各个服务的接口文档。\n\n请求及返回结果使用UTF-8的字符集进行编码。\n\n售后在线咨询\n\n",
  "logo": "https://portal.volccdn.com/obj/volcfe/misc/favicon.png",
  "site_name": "API签名调用指南",
  "image_list": [],
  "metadata": {
    "title": "请求结构--API签名调用指南-火山引擎",
    "description": "火山引擎官方文档中心,产品文档、快速入门、用户指南等内容,你关心的都在这里,包含火山引擎主要产品的使用手册、API或SDK手册、常见问题等必备资料,我们会不断优化,为用户带来更好的使用体验",
    "keywords": "API签名调用指南"
  },
  "internal_links": [],
  "external_links": []
}
```