
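File Datasets

About file datasets

A file dataset is a dataset that a user creates by uploading a local file. Its data can be stored in the HENGSHI engine or in a writable data connection configured by the user.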

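File dataset structure

See the dataset description for the full structure. The fields below are specific to file datasets.

| Field | Type | Description |
| --- | --- | --- |
| options.type | STRING | Dataset type. A dataset stored in a data connection has the type connection. |
| options.connectionId | NUMBER | Id of the data connection that holds the file dataset. |
| options.origin | STRING | Original (source) type of the file, for example file_excel. |
| options.delimiter | STRING | Delimiter used in the file. |
| options.encoding | STRING | File encoding. |
| options.header | INTEGER | Row index of the header, starting from 0. |
| options.range | OBJECT array | Selected table ranges. |
| options.range[].xbegin | INTEGER | First column, starting from 0, inclusive. |
| options.range[].xend | INTEGER | Last column, starting from 0, inclusive. |
| options.range[].ybegin | INTEGER | First row, starting from 0, inclusive. |
| options.range[].yend | INTEGER | Last row, starting from 0, inclusive. |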
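
Because every range bound is zero-based and inclusive, a range covers `xend - xbegin + 1` columns and `yend - ybegin + 1` rows. The following is a minimal Python sketch of that arithmetic. The `SheetRange` class and its members are illustrative names, not part of the API.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class SheetRange:
    """One entry of options.range; all bounds are zero-based and inclusive."""

    xbegin: int
    xend: int
    ybegin: int
    yend: int

    def __post_init__(self) -> None:
        # Reject ranges that cannot describe a non-empty block of cells.
        if min(self.xbegin, self.ybegin) < 0:
            raise ValueError("range bounds start at 0")
        if self.xend < self.xbegin or self.yend < self.ybegin:
            raise ValueError("an end bound must not precede its begin bound")

    @property
    def column_count(self) -> int:
        return self.xend - self.xbegin + 1

    @property
    def row_count(self) -> int:
        return self.yend - self.ybegin + 1

    def to_json(self) -> dict:
        """Return the plain dict that goes into the options.range array."""
        return asdict(self)


selected = SheetRange(xbegin=0, xend=1, ybegin=0, yend=4)
print(selected.column_count, selected.row_count)  # 2 5
print(selected.to_json())
```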

API reference

Upload a file

Request URL

http
POST /api/files HTTP/1.1
Content-Type: application/json
Cookie: csrf=183f1c4...; sid=26ee552d...; _USER_SESSION_ID=f2a01083...

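Request parameters

URL parameters

| Field | Type | Description |
| --- | --- | --- |
| file | BINARY | Required. Binary stream of the file to upload. |

Response format

| Field | Type | Description |
| --- | --- | --- |
| version | STRING | Hash of the current system version. |
| data | OBJECT | Description of the file. |
| data.fileId | NUMBER | File id. The example below returns it as the string "32". |
| data.type | STRING | File type. |
| data.sheetList | OBJECT array | Sheets contained in the file. |
| data.sheetList[].id | NUMBER | Sheet id. |
| data.sheetList[].name | STRING | Sheet name, formed as the file name followed by the sheet name. |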

Example 1: upload a file

http
POST /api/files HTTP/1.1
Content-Type: application/json
Cookie: csrf=183f1c4...; sid=26ee552d...; _USER_SESSION_ID=f2a01083...

Request parameters:

text
file: (binary)

Response:

http
HTTP/1.1 200 OK
Content-Type: application/json

{
    "version": "version@9a5e106#6730f0d",
    "data": {
        "fileId": "32",
        "type": "file_excel",
        "sheetList": [
            {
                "id": 0,
                "name": "a_ivt_regions a_ivt_regions"
            }
        ]
    }
}
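
The upload can be sketched in Python with the standard library only. The request headers shown above say `application/json`, but the `file` parameter is binary, so this sketch assumes the file travels as a `multipart/form-data` part named `file`. That encoding, the `BASE_URL` host, and the function names are assumptions to verify against your deployment.

```python
import urllib.request
import uuid

BASE_URL = "https://hengshi.example.com"  # hypothetical host


def build_upload_request(filename: str, content: bytes, cookie: str) -> urllib.request.Request:
    """Build (but do not send) POST /api/files with the file in a `file` part."""
    boundary = uuid.uuid4().hex
    body = b"".join([
        f"--{boundary}\r\n".encode(),
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'.encode(),
        b"Content-Type: application/octet-stream\r\n\r\n",
        content,
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    return urllib.request.Request(
        f"{BASE_URL}/api/files",
        data=body,
        method="POST",
        headers={
            # Assumption: multipart upload; the page itself lists application/json.
            "Content-Type": f"multipart/form-data; boundary={boundary}",
            "Cookie": cookie,
        },
    )


def sheet_ids(upload_response: dict) -> list:
    """Return the ids of all sheets listed in an upload response."""
    return [sheet["id"] for sheet in upload_response["data"]["sheetList"]]


request = build_upload_request("regions.xlsx", b"...file bytes...", "sid=<session id>")
# Sending is left to the caller, for example: urllib.request.urlopen(request)
```

The `fileId` and sheet ids from the response are the path parameters of the preview, select, save, and replace calls that follow.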

Preview sheet {sheetId} of file {fileId}

Request URL

http
POST /api/files/{fileId}/sheets/{sheetId}/preview HTTP/1.1
Content-Type: application/json
Cookie: csrf=183f1c4...; sid=26ee552d...; _USER_SESSION_ID=f2a01083...

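Request parameters

URL parameters

| Field | Type | Description |
| --- | --- | --- |
| fileId | NUMBER | Required. File id. |
| sheetId | NUMBER | Required. Sheet id. |

Request body

The request body is a JSON object.

| Field | Type | Description |
| --- | --- | --- |
| offset | NUMBER | Row offset. |
| limit | NUMBER | Maximum number of rows to return. |
| transpose | BOOL | Whether to swap rows and columns: true transposes, false does not. Defaults to false. |
| delimiter | STRING | Delimiter. |
| encoding | STRING | File encoding. |

Response format

| Field | Type | Description |
| --- | --- | --- |
| version | STRING | Hash of the current system version. |
| data | OBJECT | Description of the file. |
| data.schema | OBJECT array | Each element describes one dataset field, in the same form as the dataset's options.schema. |
| data.data | OBJECT array | Each element is one row of data; the values in a row correspond one-to-one to the schema elements. |
| data.suggestOptions | OBJECT | File details detected by the system. |
| data.suggestOptions.delimiter | STRING | Delimiter used in the file. |
| data.suggestOptions.encoding | STRING | File encoding. |
| data.suggestOptions.header | NUMBER | Row index of the file's header. |
| data.suggestOptions.origin | STRING | Original type of the file. |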

Example 1: preview one sheet of a file dataset

http
POST /api/files/32/sheets/0/preview HTTP/1.1
Content-Type: application/json
Cookie: csrf=183f1c4...; sid=26ee552d...; _USER_SESSION_ID=f2a01083...

// Request Body:
{"delimiter":"comma","encoding":"UTF-8","transpose":false}

Response:

http
HTTP/1.1 200 OK
Content-Type: application/json

{
    "version": "version@9a5e106#6730f0d",
    "data": {
        "schema": [
            {
                "fieldName": "_c0",
                "type": "string",
                "visible": true,
                "label": "region_name"
            },
            {
                "fieldName": "_c1",
                "type": "string",
                "visible": true,
                "label": "region_id"
            }
        ],
        "data": [
            [
                "region_name",
                "region_id"
            ]
        ],
        "suggestOptions": {
            "encoding": "UTF-8",
            "delimiter": "comma",
            "header": 0,
            "padHeader": false,
            "transpose": false,
            "origin": "file_excel",
            "offset": 0,
            "limit": 1000
        }
    }
}
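
Since each row in `data.data` lines up position by position with `data.schema`, a client can pair the two to get labelled records. The following is a small Python sketch of that pairing, using a trimmed copy of the example response above. The helper names `preview_path` and `rows_as_dicts` are illustrative, not part of the API.

```python
import json


def preview_path(file_id: int, sheet_id: int) -> str:
    """Return the request path of the preview call."""
    return f"/api/files/{file_id}/sheets/{sheet_id}/preview"


def rows_as_dicts(preview: dict) -> list:
    """Pair every row in data.data with the labels from data.schema."""
    labels = [field["label"] for field in preview["data"]["schema"]]
    return [dict(zip(labels, row)) for row in preview["data"]["data"]]


# Trimmed copy of the example response: two fields, one row.
sample = json.loads("""
{"data": {"schema": [{"fieldName": "_c0", "label": "region_name"},
                     {"fieldName": "_c1", "label": "region_id"}],
          "data": [["region_name", "region_id"]]}}
""")

print(preview_path(32, 0))
print(rows_as_dicts(sample))
```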

Select sheet {sheetId} of file {fileId}

Request URL

http
POST /api/files/{fileId}/sheets/{sheetId}/select HTTP/1.1
Content-Type: application/json
Cookie: csrf=183f1c4...; sid=26ee552d...; _USER_SESSION_ID=f2a01083...

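Request parameters

URL parameters

| Field | Type | Description |
| --- | --- | --- |
| fileId | NUMBER | Required. File id. |
| sheetId | NUMBER | Required. Sheet id. |

Request body

The request body is a JSON object.

| Field | Type | Description |
| --- | --- | --- |
| delimiter | STRING | Delimiter used in the file. |
| encoding | STRING | File encoding. |
| header | INTEGER | Row index of the header, starting from 0. |
| origin | STRING | Original type of the file. |
| range | OBJECT array | Selected table ranges. |
| range[].xbegin | INTEGER | First column, starting from 0, inclusive. |
| range[].xend | INTEGER | Last column, starting from 0, inclusive. |
| range[].ybegin | INTEGER | First row, starting from 0, inclusive. |
| range[].yend | INTEGER | Last row, starting from 0, inclusive. |

Response format

| Field | Type | Description |
| --- | --- | --- |
| version | STRING | Hash of the current system version. |
| data | OBJECT | See the file dataset structure. In the example below it carries the selected rows (data) and their field definitions (schema). |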

Example 1:

http
POST /api/files/35/sheets/0/select HTTP/1.1
Content-Type: application/json
Cookie: csrf=183f1c4...; sid=26ee552d...; _USER_SESSION_ID=f2a01083...

// Request Body:
{
    "origin": "file_excel",
    "header": 0,
    "padHeader": false,
    "delimiter": "comma",
    "transpose": false,
    "range": [
        {
            "xbegin": 0,
            "xend": 1,
            "ybegin": 0,
            "yend": 4
        }
    ],
    "encoding": "UTF-8"
}

Response:

http
HTTP/1.1 200 OK
Content-Type: application/json

{
    "version": "version@9a5e106#6730f0d",
    "data": {
        "data": [
            [
                "Europe",
                "1"
            ]
        ],
        "schema": [
            {
                "fieldName": "_c0",
                "basicType": "string",
                "defaultAggrType": "count",
                "type": "string",
                "originType": "string",
                "label": "region_name",
                "config": {},
                "visible": true,
                "nativeType": "varchar",
                "suggestedTypes": [
                    "string"
                ],
                "detectedType": "string"
            },
            {
                "fieldName": "_c1",
                "basicType": "number",
                "defaultAggrType": "sum",
                "type": "number",
                "originType": "string",
                "label": "region_id",
                "config": {
                    "seperator": " ",
                    "dialectName": "PostgresqlDialect"
                },
                "visible": true,
                "nativeType": "varchar",
                "suggestedTypes": [
                    "number",
                    "string"
                ],
                "detectedType": "number"
            }
        ],
        "pagable": true,
        "importSwitchable": false,
        "randomable": false
    }
}
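
In the examples on this page, the select request repeats the values that the preview call returned in `suggestOptions` and adds a `range`. The Python sketch below builds the body that way. Feeding `suggestOptions` back into select is an assumption drawn from those matching example values, and `build_select_body` is an illustrative name.

```python
# Keys that the example select request shares with suggestOptions.
SELECT_KEYS = ("origin", "header", "padHeader", "delimiter", "transpose", "encoding")


def build_select_body(suggest_options: dict, sheet_range: dict) -> dict:
    """Copy the parsing options and attach one zero-based, inclusive range."""
    body = {key: suggest_options[key] for key in SELECT_KEYS if key in suggest_options}
    body["range"] = [dict(sheet_range)]
    return body


# suggestOptions as returned by the preview example on this page.
suggest_options = {
    "encoding": "UTF-8",
    "delimiter": "comma",
    "header": 0,
    "padHeader": False,
    "transpose": False,
    "origin": "file_excel",
    "offset": 0,
    "limit": 1000,
}

select_body = build_select_body(
    suggest_options, {"xbegin": 0, "xend": 1, "ybegin": 0, "yend": 4}
)
print(select_body)
```

The paging keys `offset` and `limit` are left out because the example select request does not send them.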

Get the custom data connections in the current app that accept local file uploads

Request URL

http
GET /api/apps/{appId}/file-writable-connection HTTP/1.1
Accept: application/json
Cookie: csrf=183f1c4...; sid=26ee552d...; _USER_SESSION_ID=f2a01083...

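Request parameters

URL parameters

| Field | Type | Description |
| --- | --- | --- |
| appId | NUMBER | App id. |

Response format

| Field | Type | Description |
| --- | --- | --- |
| version | STRING | Hash of the current system version. |
| data | OBJECT array | Data connections that can receive uploaded local files; see the data connection structure description. |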

Example 1: get the data connections in the current app that accept local file uploads

http
GET /api/apps/4669/file-writable-connection HTTP/1.1
Accept: application/json
Cookie: csrf=183f1c4...; sid=26ee552d...; _USER_SESSION_ID=f2a01083...

Response:

http
HTTP/1.1 200 OK
Content-Type: application/json

{
    "version": "version@9a5e106#6730f0d",
    "code": 0,
    "msg": "success",
    "data": [
        {
            "id": 2058,
            "options": {
                "encoding": "UTF-8",
                "type": "postgresql",
                "maxConnNum": 10,
                "config": {},
                "category": "Database",
                "protocol": "http",
                "outputAble": true,
                "fileOutputPath": [
                    "app"
                ]
            },
            "createdBy": 1,
            "createdAt": "2020-05-18 11:46:14",
            "updatedBy": 1,
            "updatedAt": "2020-07-09 17:59:28",
            "visible": true,
            "isDelete": false,
            "title": "***",
            "status": 0,
            "refreshStats": {},
            "hsVersion": 1
        }
    ]
}

Get the built-in HENGSHI data connections that accept local file uploads

Request URL

http
GET /api/connections/internal HTTP/1.1
Accept: application/json
Cookie: csrf=183f1c4...; sid=26ee552d...; _USER_SESSION_ID=f2a01083...

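Response format

| Field | Type | Description |
| --- | --- | --- |
| version | STRING | Hash of the current system version. |
| data | OBJECT array | Data connections that can receive uploaded local files; see the data connection structure description. |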

Example 1:

http
GET /api/connections/internal HTTP/1.1
Accept: application/json
Cookie: csrf=183f1c4...; sid=26ee552d...; _USER_SESSION_ID=f2a01083...

Response:

http
HTTP/1.1 200 OK
Content-Type: application/json

{
    "version": "version@9a5e106#6730f0d",
    "code": 0,
    "msg": "success",
    "data": [
        {
            "id": 3,
            "options": {
                "type": "engine",
                "maxConnNum": 10,
                "config": {},
                "category": "Internal",
                "protocol": "http",
                "outputAble": true,
                "fileOutputPath": [
                    "hengshi_internal_engine_tmp_schema"
                ]
            },
            "createdAt": "2020-02-22 10:11:00",
            "updatedAt": "2020-02-22 10:11:00",
            "visible": true,
            "isDelete": false,
            "title": "引擎连接",
            "status": 0,
            "refreshStats": {},
            "hsVersion": 0
        }
    ]
}
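
Both connection endpoints return the same shape, so one helper can list the candidate upload targets from either response. The following Python sketch assumes only the `id` and `title` fields shown in the examples. The function names are illustrative.

```python
from typing import Optional


def connections_path(app_id: Optional[int] = None) -> str:
    """Path of the built-in connection list, or of an app's custom list."""
    if app_id is None:
        return "/api/connections/internal"
    return f"/api/apps/{app_id}/file-writable-connection"


def connection_choices(response: dict) -> list:
    """Return (id, title) pairs for every connection in the response."""
    return [(conn["id"], conn["title"]) for conn in response["data"]]


# Trimmed copy of the built-in connection example above.
internal = {"data": [{"id": 3, "title": "引擎连接", "options": {"type": "engine"}}]}

print(connections_path())
print(connections_path(4669))
print(connection_choices(internal))
```

Any id returned here can serve as `options.connectionId` when saving or replacing a file dataset.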

Save sheet {sheetId} of file {fileId}

Request URL

http
POST /api/files/{fileId}/sheets/{sheetId}/apps/{appId}/save HTTP/1.1
Content-Type: application/json
Cookie: csrf=183f1c4...; sid=26ee552d...; _USER_SESSION_ID=f2a01083...

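Request parameters

URL parameters

| Field | Type | Description |
| --- | --- | --- |
| fileId | NUMBER | Required. File id. |
| sheetId | NUMBER | Required. Sheet id. |
| appId | NUMBER | Required. Id of the app in which to save the file dataset. |

Request body

The request body is a JSON object.

| Field | Type | Description |
| --- | --- | --- |
| title | STRING | Name of the dataset. |
| options | OBJECT | The options structure of the file dataset. |

All options fields listed in the file dataset structure are required. Here options.connectionId is the id of the data connection that the file dataset is uploaded to. It must be one of the connections returned by GET /api/apps/{appId}/file-writable-connection (custom connections of the current app) or GET /api/connections/internal (built-in HENGSHI connections).

Response format

| Field | Type | Description |
| --- | --- | --- |
| version | STRING | Hash of the current system version. |
| data | OBJECT | See the file dataset structure. |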

Example 1:

http
POST /api/files/35/sheets/0/apps/4669/save HTTP/1.1
Content-Type: application/json
Cookie: csrf=183f1c4...; sid=26ee552d...; _USER_SESSION_ID=f2a01083...

// Request Body:
{
    "title": "a_ivt_regions",
    "options": {
        "connectionId": 3,
        "schema": [
            {
                "fieldName": "_c0",
                "basicType": "string",
                "defaultAggrType": "count",
                "type": "string",
                "originType": "string",
                "label": "region_name",
                "config": {},
                "visible": true,
                "nativeType": "varchar",
                "suggestedTypes": [
                    "string"
                ],
                "detectedType": "string",
                "distinct": false,
                "alias": "region_name",
                "name": "_c0"
            },
            {
                "fieldName": "_c1",
                "basicType": "number",
                "defaultAggrType": "sum",
                "type": "number",
                "originType": "string",
                "label": "region_id",
                "config": {
                    "seperator": " ",
                    "dialectName": "PostgresqlDialect"
                },
                "visible": true,
                "nativeType": "varchar",
                "suggestedTypes": [
                    "number",
                    "string"
                ],
                "detectedType": "number",
                "distinct": false,
                "alias": "region_id",
                "name": "_c1"
            }
        ],
        "origin": "file_excel",
        "header": 0,
        "padHeader": false,
        "delimiter": "comma",
        "encoding": "UTF-8",
        "transpose": false,
        "range": [
            {
                "xbegin": 0,
                "xend": 1,
                "ybegin": 0,
                "yend": 4
            }
        ],
        "cache": false
    }
}

Response:

http
HTTP/1.1 200 OK
Content-Type: application/json

{
    "version": "version@9a5e106#6730f0d",
    "data": {
        "id": 11,
        "title": "a_ivt_regions",
        "createdBy": 1,
        "createdAt": "2020-07-09 18:25:15",
        "updatedBy": 1,
        "updatedAt": "2020-07-09 18:25:16",
        "visible": true,
        "isDelete": false,
        "appId": 4669,
        "options": {
            "cache": false,
            "type": "connection",
            "totalSize": 0,
            "rowCount": 0,
            "connectionTitle": "hengshi_sense_internal_storage",
            "refreshHours": [],
            "refreshMinute": 0,
            "connectionId": 1,
            "origin": "file_excel",
            "table": "file_tb_e146214e117eab43ec3dc90292d461f4",
            "path": [],
            "transpose": false,
            "delimiter": ",",
            "encoding": "UTF-8",
            "header": 0,
            "padHeader": false,
            "range": [
                {
                    "xbegin": 0,
                    "xend": 1,
                    "ybegin": 0,
                    "yend": 4
                }
            ],
            "storageType": "engine",
            "dialectOptions": {
                "dialectName": "PostgresqlDialect",
                "majorVersion": 10,
                "minorVersion": 4
            },
            "storageConnectionId": 3,
            "storageConnectionTitle": "引擎连接",
            "schema": [
                {
                    "datasetId": 11,
                    "fieldName": "_c0",
                    "hsVersion": 1,
                    "basicType": "string",
                    "defaultAggrType": "count",
                    "type": "string",
                    "originType": "string",
                    "comment": "",
                    "label": "region_name",
                    "config": {},
                    "visible": true,
                    "nativeType": "text",
                    "suggestedTypes": [
                        "string"
                    ],
                    "detectedType": "string"
                },
                {
                    "datasetId": 11,
                    "fieldName": "_c1",
                    "hsVersion": 1,
                    "basicType": "number",
                    "defaultAggrType": "sum",
                    "type": "number",
                    "originType": "string",
                    "comment": "",
                    "label": "region_id",
                    "config": {
                        "seperator": " ",
                        "dialectName": "PostgresqlDialect"
                    },
                    "visible": true,
                    "nativeType": "text",
                    "suggestedTypes": [
                        "number",
                        "string"
                    ],
                    "detectedType": "number"
                },
                {
                    "datasetId": 11,
                    "fieldName": "_hs_row_id",
                    "hsVersion": 0,
                    "basicType": "number",
                    "defaultAggrType": "sum",
                    "type": "number",
                    "originType": "integer",
                    "comment": "",
                    "config": {
                        "dialectName": "PostgresqlDialect"
                    },
                    "visible": false,
                    "nativeType": "serial",
                    "suggestedTypes": [
                        "number",
                        "string"
                    ],
                    "detectedType": "integer"
                }
            ],
            "metrics": []
        },
        "importType": 1,
        "importStatus": 1,
        "importOptions": {},
        "status": 3,
        "refreshStats": {
            "refreshAt": "2020-07-09 18:25:15",
            "executeRefreshAt": "2020-07-09 18:25:15",
            "executeRefreshRowCountAt": 1594290316712
        },
        "datasetAcl": {
            "level": "FULLACCESS",
            "dataFilters": []
        },
        "hsVersion": 7,
        "creator": {
            "id": 1,
            "name": "trial",
            "email": "trial@hengshi.io"
        },
        "updater": {
            "id": 1,
            "name": "trial",
            "email": "trial@hengshi.io"
        },
        "importSwitchable": false,
        "refreshSchema": false,
        "type": "connection",
        "origin": "file_excel",
        "emptyDataset": false,
        "public": true
    }
}
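
The save body in the example combines the parsing options sent to select, the `schema` returned by select, the target `connectionId`, and a `cache` flag. The following Python sketch assembles it under that reading. The helper names are illustrative. The example request also adds `distinct`, `alias`, and `name` to each schema entry; whether they are required is not stated here, so the sketch passes the schema through unchanged.

```python
import copy


def save_path(file_id: int, sheet_id: int, app_id: int) -> str:
    """Return the request path of the save call."""
    return f"/api/files/{file_id}/sheets/{sheet_id}/apps/{app_id}/save"


def build_save_body(title: str, connection_id: int, select_options: dict,
                    schema: list, cache: bool = False) -> dict:
    """Merge parsing options, schema and target connection into one body."""
    options = copy.deepcopy(select_options)  # leave the caller's dict untouched
    options["connectionId"] = connection_id
    options["schema"] = copy.deepcopy(schema)
    options["cache"] = cache
    return {"title": title, "options": options}


select_options = {
    "origin": "file_excel",
    "header": 0,
    "padHeader": False,
    "delimiter": "comma",
    "encoding": "UTF-8",
    "transpose": False,
    "range": [{"xbegin": 0, "xend": 1, "ybegin": 0, "yend": 4}],
}
schema = [{"fieldName": "_c0", "type": "string", "label": "region_name"}]

save_body = build_save_body("a_ivt_regions", 3, select_options, schema)
print(save_path(35, 0, 4669))
print(save_body["options"]["connectionId"])
```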

Replace a dataset with sheet {sheetId} of file {fileId}

Request URL

http
POST /api/files/{fileId}/sheets/{sheetId}/apps/{appId}/replace HTTP/1.1
Content-Type: application/json
Cookie: csrf=183f1c4...; sid=26ee552d...; _USER_SESSION_ID=f2a01083...

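Request parameters

URL parameters

| Field | Type | Description |
| --- | --- | --- |
| fileId | NUMBER | Required. File id. |
| sheetId | NUMBER | Required. Sheet id. |
| appId | NUMBER | Required. Id of the app that contains the dataset. |

Request body

The request body is a JSON object.

| Field | Type | Description |
| --- | --- | --- |
| id | NUMBER | Id of the dataset to replace. |
| options | OBJECT | The options structure of the file dataset. |

All options fields listed in the file dataset structure are required. Here options.connectionId is the id of the data connection that the file dataset is uploaded to. It must be one of the connections returned by GET /api/apps/{appId}/file-writable-connection (custom connections of the current app) or GET /api/connections/internal (built-in HENGSHI connections).

The difference from creating a new file dataset lies in options.schema: if a fieldName in the new dataset differs from the fieldName in the original dataset, the new fieldName is stored in the dbFieldName field. This works the same way as for other dataset replacements.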
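
One possible reading of the dbFieldName rule, sketched in Python: the schema entry keeps the original dataset's `fieldName` and records the new file's name in `dbFieldName`. Keeping the old name in `fieldName` is an inference, not something this page states, and how fields are paired is not specified either, so the caller supplies the pairing explicitly. `schema_for_replace` and `original_names` are hypothetical names.

```python
def schema_for_replace(new_schema: list, original_names: dict) -> list:
    """Rewrite schema entries whose fieldName differs from the original.

    original_names maps a fieldName in the new file to the fieldName of the
    original dataset field it replaces (assumed, caller-supplied pairing).
    """
    result = []
    for field in new_schema:
        entry = dict(field)  # copy so the input schema is not modified
        old_name = original_names.get(field["fieldName"], field["fieldName"])
        if old_name != field["fieldName"]:
            entry["dbFieldName"] = field["fieldName"]  # new name, per the rule above
            entry["fieldName"] = old_name              # assumption: keep original name
        result.append(entry)
    return result


new_schema = [
    {"fieldName": "_c0", "label": "country_id"},
    {"fieldName": "_c2", "label": "region_id"},
]
# Assumed pairing: column _c2 of the new file replaces field _c1 of the original.
replaced = schema_for_replace(new_schema, {"_c2": "_c1"})
print(replaced)
```

Entries whose names already match are passed through without a `dbFieldName` key.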

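Response format

| Field | Type | Description |
| --- | --- | --- |
| version | STRING | Hash of the current system version. |
| data | OBJECT | See the file dataset structure. |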

Example 1:

http
POST /api/files/35/sheets/0/apps/4669/replace HTTP/1.1
Content-Type: application/json
Cookie: csrf=183f1c4...; sid=26ee552d...; _USER_SESSION_ID=f2a01083...

// Request Body:
{
    "id": 11,
    "options": {
        "schema": [
            {
                "fieldName": "_c0",
                "basicType": "string",
                "defaultAggrType": "count",
                "type": "string",
                "originType": "string",
                "label": "country_id",
                "config": {},
                "visible": true,
                "nativeType": "varchar",
                "suggestedTypes": [
                    "string"
                ],
                "detectedType": "string",
                "distinct": false,
                "alias": "country_id"
            },
            {
                "fieldName": "_c1",
                "basicType": "string",
                "defaultAggrType": "count",
                "type": "string",
                "originType": "string",
                "label": "country_name",
                "config": {},
                "visible": true,
                "nativeType": "varchar",
                "suggestedTypes": [
                    "string"
                ],
                "detectedType": "string",
                "distinct": false,
                "alias": "country_name"
            },
            {
                "fieldName": "_c2",
                "basicType": "number",
                "defaultAggrType": "sum",
                "type": "number",
                "originType": "string",
                "label": "region_id",
                "config": {
                    "seperator": " ",
                    "dialectName": "PostgresqlDialect"
                },
                "visible": true,
                "nativeType": "varchar",
                "suggestedTypes": [
                    "number",
                    "string"
                ],
                "detectedType": "number",
                "distinct": false,
                "alias": "region_id"
            }
        ],
        "origin": "file_excel",
        "header": 0,
        "padHeader": false,
        "delimiter": "comma",
        "encoding": "UTF-8",
        "transpose": false,
        "range": [
            {
                "xbegin": 0,
                "xend": 2,
                "ybegin": 0,
                "yend": 3
            }
        ],
        "cache": false,
        "connectionId": 3
    }
}

Response:

http
HTTP/1.1 200 OK
Content-Type: application/json

{
    "version": "version@9a5e106#6730f0d",
    "data": {
        "id": 11,
        "title": "a_ivt_regions",
        "createdBy": 1,
        "createdAt": "2020-07-09 18:25:15",
        "updatedBy": 1,
        "updatedAt": "2020-07-09 18:40:30",
        "visible": true,
        "isDelete": false,
        "appId": 4669,
        "options": {
            "cache": false,
            "type": "connection",
            "totalSize": 0,
            "rowCount": 0,
            "connectionTitle": "hengshi_sense_internal_storage",
            "refreshHours": [],
            "refreshMinute": 0,
            "connectionId": 1,
            "origin": "file_excel",
            "table": "file_tb_d7a4d26309bfdcd433aef19e21863175",
            "path": [],
            "transpose": false,
            "delimiter": ",",
            "encoding": "UTF-8",
            "header": 0,
            "padHeader": false,
            "range": [
                {
                    "xbegin": 0,
                    "xend": 2,
                    "ybegin": 0,
                    "yend": 3
                }
            ],
            "storageType": "engine",
            "dialectOptions": {
                "dialectName": "PostgresqlDialect",
                "majorVersion": 10,
                "minorVersion": 4
            },
            "storageConnectionId": 3,
            "storageConnectionTitle": "引擎连接",
            "schema": [
                {
                    "datasetId": 11,
                    "fieldName": "_c0",
                    "hsVersion": 3,
                    "basicType": "string",
                    "defaultAggrType": "count",
                    "type": "string",
                    "originType": "string",
                    "comment": "",
                    "label": "country_id",
                    "config": {},
                    "visible": true,
                    "nativeType": "text",
                    "suggestedTypes": [
                        "string"
                    ],
                    "detectedType": "string"
                },
                {
                    "datasetId": 11,
                    "fieldName": "_c1",
                    "hsVersion": 3,
                    "basicType": "string",
                    "defaultAggrType": "count",
                    "type": "string",
                    "originType": "string",
                    "comment": "",
                    "label": "country_name",
                    "config": {
                        "seperator": " ",
                        "dialectName": "PostgresqlDialect"
                    },
                    "visible": true,
                    "nativeType": "text",
                    "suggestedTypes": [
                        "string"
                    ],
                    "detectedType": "string"
                },
                {
                    "datasetId": 11,
                    "fieldName": "_c2",
                    "hsVersion": 1,
                    "basicType": "number",
                    "defaultAggrType": "sum",
                    "type": "number",
                    "originType": "string",
                    "comment": "",
                    "label": "region_id",
                    "config": {
                        "seperator": " ",
                        "dialectName": "PostgresqlDialect"
                    },
                    "visible": true,
                    "nativeType": "text",
                    "suggestedTypes": [
                        "number",
                        "string"
                    ],
                    "detectedType": "number"
                },
                {
                    "datasetId": 11,
                    "fieldName": "_hs_row_id",
                    "hsVersion": 0,
                    "basicType": "number",
                    "defaultAggrType": "sum",
                    "type": "number",
                    "originType": "integer",
                    "comment": "",
                    "config": {
                        "dialectName": "PostgresqlDialect"
                    },
                    "visible": false,
                    "nativeType": "serial",
                    "suggestedTypes": [
                        "number",
                        "string"
                    ],
                    "detectedType": "integer"
                }
            ],
            "metrics": []
        },
        "importType": 1,
        "importStatus": 1,
        "importOptions": {},
        "status": 3,
        "refreshStats": {
            "refreshAt": "2020-07-09 18:40:29",
            "executeRefreshAt": "2020-07-09 18:40:29",
            "executeRefreshRowCountAt": 1594291230528
        },
        "datasetAcl": {
            "level": "FULLACCESS",
            "dataFilters": []
        },
        "hsVersion": 17,
        "creator": {
            "id": 1,
            "name": "trial",
            "email": "trial@hengshi.io"
        },
        "updater": {
            "id": 1,
            "name": "trial",
            "email": "trial@hengshi.io"
        },
        "importSwitchable": false,
        "refreshSchema": false,
        "type": "connection",
        "origin": "file_excel",
        "emptyDataset": false,
        "public": true
    }
}

HENGSHI SENSE API User Manual