
Public Information

Base Parameters

Parameter Description
BasePath The base path for API calls
Endpoint The randomly generated path segment used in API calls
AccessKey The AccessKey used to authenticate API calls

Request Path

https://{basePath}/search/{endpoint}/read

Request Method

POST

Request Header

Parameter Type Description
Authorization String Format: Bearer {AK}, where AK is the AccessKey
Pragma String Optional. Set to no-cache to bypass the response cache so that each request is handled independently. If omitted, the response for the same query is cached for 10 minutes
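
As an illustration of the two headers above, the following is a minimal sketch of a header builder. The function name build_read_headers and its bypass_cache flag are hypothetical and not part of the API; the Content-Type header is taken from the request examples in this document.

```python
def build_read_headers(access_key: str, bypass_cache: bool = False) -> dict:
    """Build request headers for the read API (hypothetical helper)."""
    headers = {
        # Authorization uses the "Bearer {AK}" format described above
        "Authorization": f"Bearer {access_key}",
        # The request examples send a JSON body
        "Content-Type": "application/json",
    }
    if bypass_cache:
        # Without this header, responses are cached for 10 minutes
        headers["Pragma"] = "no-cache"
    return headers
```

Calling build_read_headers(AK, bypass_cache=True) is one way to force a fresh read when the target page changes frequently.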

Request Body Parameters

Parameter Required Type Description
url Y String Target URL to read
formats N List Content format. Options:
- HTML
- TEXT
- MARKDOWN
Exactly one value must be specified; defaults to TEXT if omitted
mode N String Reading mode:
- quality: Quality mode, uses dynamic rendering
- fast: Fast mode, uses static page reading (default if omitted)
- auto: Auto mode, automatically selects fast or quality based on the URL
- Any other value is treated as fast
totalTimeout N Int End-to-end request timeout in milliseconds. Disabled by default. The request is aborted if it exceeds this time.
timeout N Int Read timeout in milliseconds. Default: 30000 ms
Controls the API connection time, not the total request duration.
imageDownloadEnable N Bool Image conversion switch (converts images to base64 encoding). Default: false
When formats is HTML or MARKDOWN and imageDownloadEnable is true, image links in the response are converted to base64.
Note: Enabling this incurs additional fees for images parsed from the webpage.
Note: If the URL points to a PDF, this feature is disabled and the result is returned in base64 by default.
imageInContent N Bool Works together with imageDownloadEnable.
Default: true; images converted to base64 are embedded in the main content.
If set to false, an additional response field image_base64_list is returned, containing all base64 images in a separate list.
pdfExtractEnable N Bool When the URL points to a PDF, use this to enable/disable content extraction. Default: false
true: Return parsed PDF text content
false: Return PDF as base64
Has no effect if the URL does not point to a PDF.
Note: PDF content extraction incurs additional fees.
enhancedOcr N Bool Enhanced PDF extraction, which improves accuracy and coverage. Default: false
Only takes effect when pdfExtractEnable is true.
Note: OCR recognition incurs additional fees.
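
The parameter rules above can be sketched as a small payload builder that checks the documented constraints before a request is sent. This is a minimal sketch, not an official client: the function name build_read_payload and its keyword arguments are hypothetical, and rejecting invalid combinations locally (instead of relying on the server-side defaults) is a design choice made here for illustration.

```python
ALLOWED_FORMATS = {"HTML", "TEXT", "MARKDOWN"}
ALLOWED_MODES = {"quality", "fast", "auto"}


def build_read_payload(url, content_format="TEXT", mode="fast",
                       timeout_ms=None, total_timeout_ms=None,
                       image_download=False, image_in_content=True,
                       pdf_extract=False, enhanced_ocr=False):
    """Build a request body for the read API (hypothetical helper)."""
    if not url:
        raise ValueError("url is required")
    fmt = content_format.upper()
    if fmt not in ALLOWED_FORMATS:
        raise ValueError(f"unsupported format: {content_format}")
    if mode not in ALLOWED_MODES:
        # The API treats unknown modes as fast; failing early is a local choice.
        raise ValueError(f"unsupported mode: {mode}")
    if image_download and fmt == "TEXT":
        # Image conversion is documented only for HTML and MARKDOWN.
        raise ValueError("imageDownloadEnable requires HTML or MARKDOWN")
    if enhanced_ocr and not pdf_extract:
        raise ValueError("enhancedOcr requires pdfExtractEnable")

    # formats is a list that carries exactly one value
    payload = {"url": url, "formats": [fmt], "mode": mode}
    if timeout_ms is not None:
        payload["timeout"] = timeout_ms
    if total_timeout_ms is not None:
        payload["totalTimeout"] = total_timeout_ms
    if image_download:
        payload["imageDownloadEnable"] = True
        payload["imageInContent"] = image_in_content
    if pdf_extract:
        payload["pdfExtractEnable"] = True
        payload["enhancedOcr"] = enhanced_ocr
    return payload
```

Optional fields are left out of the body unless they are set, so the documented server-side defaults apply to everything the caller does not specify.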

Response Value Description

Field Type Description
html String When formats is HTML, returns the HTML version of the page content.
markdown String When formats is MARKDOWN, returns the Markdown version of the content.
text String When formats is TEXT, returns the plain text version of the content.
metadata.title String Webpage title.
metadata.description String Webpage description.
metadata.keywords String Webpage keywords.
logo String URL of the website logo.
site_name String Website name.
image_list Array List of image URLs. Only included if the page contains images.
image_base64_list Array List of images in base64 encoding. Returned when imageInContent is false.
pdf_pages Int Number of pages if the target is a PDF.
internal_links Array List of internal links (same domain as the target URL) found in the content.
external_links Array List of external links (different domain from the target URL) found in the content.
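
Because only the content field matching the requested format is populated, a caller may want to look up whichever one is present. The sketch below assumes the response JSON is a top-level object with the fields listed above, as in the response example; the helper name extract_content is hypothetical.

```python
def extract_content(result: dict):
    """Return (field_name, content) for the first populated content field."""
    for key in ("text", "markdown", "html"):
        value = result.get(key)
        if value:
            return key, value
    # No content field was present or all were empty
    return None, None


# Usage with a hand-written sample shaped like the documented response
sample = {
    "markdown": "# Title",
    "metadata": {"title": "Example"},
    "image_list": [],
}
field, content = extract_content(sample)
title = sample.get("metadata", {}).get("title", "")
print(field, title)
```

With a live call, the same function could be applied to the parsed JSON body, for example the result of response.json() in the Requests example.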

Request Example

cURL

curl --location --request POST 'https://{BasePath}/search/{Endpoint}/read' \
--header 'Authorization: Bearer {AK}' \
--header 'Content-Type: application/json' \
--data-raw '{
    "url": "https://www.volcengine.com/docs/6369/67267",
    "formats": [
        "TEXT"
    ],
    "mode":"quality"
}'

Python (Requests)

import requests
import json

BasePath = 'xxxxxx'   # Replace with your obtained BasePath
Endpoint = 'xxxxxx'   # Replace with your obtained Endpoint
AK = 'xxxxxx'         # Replace with your obtained AccessKey

url = f"https://{BasePath}/search/{Endpoint}/read"

# Request body: target URL, desired content format, and reading mode
payload = json.dumps({
    "url": "https://www.volcengine.com/docs/6369/67267",
    "formats": ["TEXT"],
    "mode": "quality",
})

headers = {
    "Authorization": f"Bearer {AK}",
    "Content-Type": "application/json",
}

# Send the request and print the raw JSON response
response = requests.post(url, headers=headers, data=payload)

print(response.text)

Response Example

{
    "markdown": "导航\n\n请求结构\n\n最近更新时间:2025.04.23 12:06:55首次发布时间:2021.02.25 21:02:47\n\n火山引擎的OpenAPI的请求结构如下:\n\n服务地址(Endpoint)用于访问火山引擎的云服务,通常是一个URL,客户端可以通过该地址与服务进行通信。  \n火山引擎服务地址的标准形式为:\n\nRegional服务 (区域化部署) Global服务(中心化部署)\n\n    \n    \n    {service}.{region}.volcengineapi.com\n\n例如:云服务器的Service为:ecs,其在\n\n    \n    \n    cn-beijing-autodriving\n\nRegion的服务地址为:\n\necs.cn-beijing-autodriving.volcengineapi.com\n\n    \n    \n    {service}.volcengineapi.com\n\n例如:访问控制的Service为:iam,作为Global服务,其服务地址为:iam.volcengineapi.com\n\n注意\n\n  * 当Service中存在下划线( _ )符号时,Endpoint需转为中划线( - )符号。存在大写字母时需转成小写。\n  * 存在部分云产品暂未适配标准Endpoint,请前往您所使用的云产品-API参考或开发指南中查看详情。\n\n推荐使用安全性更高的 HTTPS方式发送请求。\n\n请求方法详见各个接口具体的需求。在火山引擎中的OpenAPI大多数支持GET或POST请求。\n\n火山引擎的OpenAPI请求包含两类参数:公共请求参数和接口请求参数。其中公共请求参数在每个请求中都必须包含。接口请求参数需参考各个服务的接口文档。\n\n请求及返回结果使用UTF-8的字符集进行编码。\n\n售后在线咨询\n\n",
    "logo": "https://portal.volccdn.com/obj/volcfe/misc/favicon.png",
    "site_name": "API签名调用指南",
    "image_list": [],
    "metadata": {
        "title": "请求结构--API签名调用指南-火山引擎",
        "description": "火山引擎官方文档中心,产品文档、快速入门、用户指南等内容,你关心的都在这里,包含火山引擎主要产品的使用手册、API或SDK手册、常见问题等必备资料,我们会不断优化,为用户带来更好的使用体验",
        "keywords": "API签名调用指南"
    },
    "internal_links": [],
    "external_links": []
}