Ingest API

The Ingest Service ingests documents into the Docbyte Vault. It covers document ingestion, metadata, and updates to existing records.


1. Ingest Document

Description
Creates or updates a SIP in the Archive using the provided document(s) and metadata. Can be used for new packages or as a delta SIP for existing AIPs.

Request

CODE
PUT /document

Headers

| Name | Description | Type | Required |
| --- | --- | --- | --- |
| Authorization | Bearer token (e.g., Bearer your_jwt_token) | string | Yes |
| enrichers | Optional list to specify what information should be part of the result. If no value is provided, only the basic information of a package is returned. VALUES: pluginreports, permissions, metadata, breadcrumb | string[] | No |

Body Parameters

| Name | Description | Type | Required |
| --- | --- | --- | --- |
| profile | Name of the profile under which the document is stored in the archive. The retention policy is applied based on the profile. VALUES: Provided by Docbyte | string | Yes |
| packageId | Custom package ID for the document. Only alphanumeric characters are allowed. If not provided, a new UUID is generated automatically. | string | No |
| document | Single file to ingest (Content) | Content | No |
| documents | Multiple files to ingest (array of Content). Exactly one of document or documents is required; do not provide both. | Content[] | No |
| metadatas | Array of metadata objects (MetadataObject) | Metadata[] | Yes |

Example -

CODE
{
    "profile": "string",
    "metadatas": [
        {
            "schema": "string",
            "metadata": {
                "employeeId": "string",
                "title": "Contract",
                "documentDate": "2025-02-15T00:00:00Z",
                "department": "HR",
                "category": "Manual",
                "expiryDate": "2027-02-15T00:00:00Z"
            }
        }
    ],
    "document": {
        "type": "S3",
        "name": "contract.pdf",
        "content": "b31b7379-2864-4b74-8475-0da8f7b1148b"
    }
}
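
The body rules above (required profile and metadatas, exactly one of document or documents, an alphanumeric packageId) can be checked client-side before calling PUT /document. The following is a minimal sketch, not a Docbyte client: the helper name build_ingest_body is hypothetical, and treating "alphanumeric" as ASCII-only is an assumption.

```python
import json


def build_ingest_body(profile, metadatas, document=None, documents=None, package_id=None):
    """Assemble a PUT /document body and enforce the documented constraints.

    Hypothetical helper for illustration; not part of any Docbyte SDK.
    """
    if not profile:
        raise ValueError("profile is required")
    if metadatas is None:
        raise ValueError("metadatas is required")
    # Exactly one of document / documents must be supplied.
    if (document is None) == (documents is None):
        raise ValueError("provide either document or documents, not both")
    # Assumption: "alphanumeric" means ASCII letters and digits only.
    if package_id is not None and not (package_id.isascii() and package_id.isalnum()):
        raise ValueError("packageId must be alphanumeric")

    body = {"profile": profile, "metadatas": metadatas}
    if package_id is not None:
        body["packageId"] = package_id
    if document is not None:
        body["document"] = document
    else:
        body["documents"] = documents
    return body


# Placeholder values taken from the example above.
body = build_ingest_body(
    profile="string",
    metadatas=[{"schema": "string", "metadata": {"title": "Contract"}}],
    document={
        "type": "S3",
        "name": "contract.pdf",
        "content": "b31b7379-2864-4b74-8475-0da8f7b1148b",
    },
)
print(json.dumps(body, indent=4))
```

When packageId is omitted the key is left out of the body, so the service generates the UUID as described in the parameter table.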

Response

  • 200
    Returns an IngestResponse with basic SIP details (status, submission date, etc.).

  • default
    Returns an ErrorResponse with an error code and message.


2. Update Document

Description
Updates the descriptive metadata associated with an existing package in the archive.

The ingest service creates a delta SIP with the new metadata and ingests it into the archive. The ingest process is asynchronous.

Request

CODE
POST /document/{packageId}

Headers

| Name | Description | Type | Required |
| --- | --- | --- | --- |
| Authorization | Bearer token (e.g., Bearer your_jwt_token) | string | Yes |
| enrichers | Optional list to specify what information should be part of the result. If no value is provided, only the basic information of a package is returned. VALUES: pluginreports, permissions, metadata, breadcrumb | string[] | No |

Path Parameters

| Name | Description | Type | Required |
| --- | --- | --- | --- |
| packageId | The unique identifier of the AIP (Archival Information Package) that you want to update. Where to find it: (1) provided by you during ingest; (2) returned when the document is initially ingested; (3) when using the Dissemination API to search for documents, present inside each document object as aip:packageId. | string | Yes |

Body Parameters

| Name | Description | Type | Required |
| --- | --- | --- | --- |
| packageId | Same value as the path parameter | string | Yes |
| metadatas | Array of metadata objects (MetadataObject) | Metadata[] | Yes |

Example -

CODE
{
    "packageId": "<string>",
    "metadatas": [
        {
            "schema": "string",
            "metadata": {
                "employeeId": "string",
                "title": "Contract",
                "documentDate": "2025-02-15T00:00:00Z",
                "department": "HR",
                "category": "Manual",
                "expiryDate": "2027-02-15T00:00:00Z"
            }
        }
    ]
}
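
As an illustration of how the path parameter, body, and headers fit together, the sketch below builds (but does not send) a POST /document/{packageId} request with Python's standard library. Several details are assumptions rather than documented behaviour: the base URL is a placeholder, the body is sent as JSON with a Content-Type header, and the enrichers string[] header is encoded as a comma-separated list. The helper name is hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Enricher names documented for the enrichers header.
KNOWN_ENRICHERS = {"pluginreports", "permissions", "metadata", "breadcrumb"}


def build_update_request(base_url, token, package_id, metadatas, enrichers=()):
    """Build, without sending, a POST /document/{packageId} request."""
    unknown = set(enrichers) - KNOWN_ENRICHERS
    if unknown:
        raise ValueError(f"unknown enrichers: {sorted(unknown)}")

    # The body repeats the packageId from the path, as the table above requires.
    body = {"packageId": package_id, "metadatas": metadatas}
    headers = {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",  # assumption: JSON request body
    }
    if enrichers:
        # Assumption: the string[] header is sent as a comma-separated list.
        headers["enrichers"] = ",".join(enrichers)

    path_id = urllib.parse.quote(package_id, safe="")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/document/{path_id}",
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )


req = build_update_request(
    "https://vault.example.invalid/ingest",  # placeholder base URL
    "your_jwt_token",
    "abc123",
    [{"schema": "string", "metadata": {"title": "Contract"}}],
    enrichers=["metadata"],
)
print(req.get_method(), req.full_url)
```

Passing the request to urllib.request.urlopen would send it; because the ingest is asynchronous, a successful call only means the delta SIP was accepted.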

Response


3. Request Upload URL

Description
Generates a URL that can be used for uploading Content which can be referenced during ingest. This can help handle large files by avoiding large direct uploads to the ingest endpoint.

Request

CODE
GET /url

Headers

| Name | Description | Type | Required |
| --- | --- | --- | --- |
| Authorization | Bearer token (e.g., Bearer your_jwt_token) | string | Yes |

Response


4. Ingest SIP

Description
Triggers ingest of an existing SIP package.

Request

CODE
PUT /sip

Headers

| Name | Description | Type | Required |
| --- | --- | --- | --- |
| Authorization | Bearer token (e.g., Bearer your_jwt_token) | string | Yes |
| enrichers | Optional list to specify what information should be part of the result. If no value is provided, only the basic information of a package is returned. VALUES: pluginreports, permissions, metadata, breadcrumb | string[] | No |

Body Parameters

| Name | Description | Type | Required |
| --- | --- | --- | --- |
| type | How the content is provided (binary data, direct URL, or S3 reference). VALUES: BINARY, URL, S3 | string | Yes |
| content | Actual content (base64-encoded data) or a reference (URL or S3 key) | string | Yes |
| name | Name of the file being ingested | string | Yes |

Example -

CODE
{
    "type": "BINARY | URL | S3",
    "content": "base64-encoded data or URL or S3 key",
    "name": "filename"
}
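
The two ways of filling this body (inline base64 data versus a URL or S3 reference) can be sketched as below. This is illustrative only: the helper names and the file name package.zip are hypothetical, and the use of standard (not URL-safe) base64 is an assumption.

```python
import base64


def sip_body_from_bytes(name, data):
    """Return a PUT /sip body carrying the package inline (type BINARY)."""
    return {
        "type": "BINARY",
        # Assumption: standard base64 alphabet, encoded as an ASCII string.
        "content": base64.b64encode(data).decode("ascii"),
        "name": name,
    }


def sip_body_from_reference(name, kind, reference):
    """Return a PUT /sip body that points at the package by URL or S3 key."""
    if kind not in ("URL", "S3"):
        raise ValueError("kind must be 'URL' or 'S3'")
    return {"type": kind, "content": reference, "name": name}


inline = sip_body_from_bytes("package.zip", b"example bytes")
by_key = sip_body_from_reference(
    "package.zip", "S3", "b31b7379-2864-4b74-8475-0da8f7b1148b"
)
```

For large packages the reference form keeps the request small, which is the use case the Request Upload URL endpoint describes.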

Response


5. SIP Ingest Status

Description
Retrieves the status of a SIP ingest operation.

Request

CODE
GET /sip/{packageId}/{submissionName}

Headers

| Name | Description | Type | Required |
| --- | --- | --- | --- |
| Authorization | Bearer token (e.g., Bearer your_jwt_token) | string | Yes |
| enrichers | Optional list to specify what information should be part of the result. If no value is provided, only the basic information of a package is returned. VALUES: pluginreports, permissions, metadata, breadcrumb | string[] | No |

Path Parameters

| Name | Description | Type | Required |
| --- | --- | --- | --- |
| packageId | The packageId returned in the response of Ingest SIP, or provided by you during ingest | string | Yes |
| submissionName | The name field returned in the response of Ingest SIP | string | Yes |

Response

  • 200
    Returns an IngestResponse.

  • 404
    SIP not found. This can mean that ingestion has finished or that the SIP was never uploaded. Check the AIP to be sure.

  • default
    Returns an ErrorResponse.

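
Because ingest is asynchronous, a client typically polls this endpoint. The sketch below shows one possible loop under stated assumptions: fetch_status is a caller-supplied (hypothetical) function that performs the GET and returns the HTTP status code with the parsed JSON body, PENDING is the only status treated as "still running", and NOT_FOUND is a local sentinel for the ambiguous 404 case, not an API value.

```python
import time


def wait_for_sip(fetch_status, package_id, submission_name,
                 pending=("PENDING",), interval=5.0, max_attempts=60,
                 sleep=time.sleep):
    """Poll GET /sip/{packageId}/{submissionName} until the SIP stops pending.

    fetch_status(package_id, submission_name) must return (http_status, body).
    """
    for _ in range(max_attempts):
        http_status, body = fetch_status(package_id, submission_name)
        if http_status == 404:
            # Ambiguous: ingestion finished or SIP never uploaded. Check the AIP.
            return "NOT_FOUND"
        if http_status != 200:
            # Any other status carries an ErrorResponse body.
            raise RuntimeError(f"{body.get('code')}: {body.get('message')}")
        if body["status"] not in pending:
            return body["status"]
        sleep(interval)
    raise TimeoutError("SIP still pending after the polling limit")


# Demo with canned responses instead of real HTTP calls.
responses = iter([(200, {"status": "PENDING"}), (200, {"status": "INGESTED"})])
result = wait_for_sip(lambda p, s: next(responses), "abc123", "submission-1",
                      sleep=lambda _: None)
print(result)  # prints "INGESTED"
```

A caller that receives NOT_FOUND should look up the AIP before concluding that the ingest succeeded.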

Schemas

The schemas below are referenced above. All are JSON-based.

IngestResponse Schema

Example of the response returned when a package is accepted for ingestion. Acceptance does not mean the ingest succeeded.

Ingest is an asynchronous process, so the status should be checked and followed up.

CODE
{
  "packageId": "string",            // Unique id for the package
  "packageType": "string",          // Indicates the type of package (e.g., AIP)
  "pluginReports": [                // The overview of plugins executed on the package and their result. Populated when the pluginreports enricher is enabled in the header.
    {
      "executionDate": "string",    // Date/time of plugin execution in ISO 8601 format
      "pluginName": "string",       // Name of the executed plugin
      "message": "string",          // Message returned by the plugin
      "status": "SUCCESS | WARNING | FAILURE" // Status of the plugin execution
    }
  ],
  "permissions": [
    "string"                        // The permissions the current user has on the package. Populated when the permissions enricher is enabled in the header.
  ],
  "name": "string",                 // Name of the SIP
  "status": "string",               // Current status of the SIP (e.g., 'INGESTED', 'PENDING')
  "submissionDate": "string"        // Date/time the SIP was submitted in ISO 8601 format
}
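
Since pluginReports is only populated when the pluginreports enricher is requested, client code should tolerate its absence. A minimal sketch of extracting failed plugin executions from a parsed IngestResponse follows; the helper name and the plugin names in the sample are made up for illustration.

```python
def failed_plugins(ingest_response):
    """Return (pluginName, message) pairs for reports with status FAILURE."""
    # The key may be missing or empty when the enricher was not requested.
    reports = ingest_response.get("pluginReports") or []
    return [(r["pluginName"], r["message"]) for r in reports
            if r["status"] == "FAILURE"]


sample = {
    "packageId": "abc123",
    "status": "PENDING",
    "pluginReports": [
        {"pluginName": "plugin-a", "message": "ok", "status": "SUCCESS"},
        {"pluginName": "plugin-b", "message": "failed check", "status": "FAILURE"},
    ],
}
print(failed_plugins(sample))  # prints [('plugin-b', 'failed check')]
```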

ErrorResponse Schema

Returned when an error occurs.

CODE
{
  "code": "ERROR_CODE",
  "message": "Description of the error"
}
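
One way a client might surface this payload is to wrap it in an exception. The class below is a hypothetical sketch, not part of any Docbyte library; the UNKNOWN fallback code is also an assumption.

```python
class IngestError(Exception):
    """Hypothetical exception wrapping an ErrorResponse body."""

    def __init__(self, code, message):
        super().__init__(f"{code}: {message}")
        self.code = code
        self.message = message

    @classmethod
    def from_body(cls, body):
        # Fall back to placeholders if a field is missing from the payload.
        return cls(body.get("code", "UNKNOWN"), body.get("message", ""))


err = IngestError.from_body(
    {"code": "ERROR_CODE", "message": "Description of the error"}
)
print(err)  # prints "ERROR_CODE: Description of the error"
```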

Content Schema

CODE
{
  "type": "BINARY | URL | S3",      // How the content is provided (binary data, direct URL, or S3 reference)
  "content": "string",              // Actual content (base64-encoded data) or a reference (URL or S3 key)
  "name": "string"                  // Name of the file being ingested
}

MetadataObject Schema

Represents a single piece of metadata. You can either specify metadata fields directly using the MetadataFields schema or attach a metadata file using the MetadataFile schema.

MetadataFields

| Name | Description | Type | Required |
| --- | --- | --- | --- |
| schema | The URI of the metadata schema of this object. The metadata schema should be registered in the archive. VALUES: Provided by Docbyte | string | Yes |
| metadata | A JSON object containing the metadata values adhering to the defined schema | object | Yes |

CODE
{
  "schema": "string",              
  "metadata": {
    // Key-value pairs describing the metadata
    // Example:
    "exampleKey": {
      "subKey": "someValue"         // Structure can be nested objects
    }
  }
}

MetadataFile

CODE
{
  "schema": "string",
  "file": {                          // Same as Content Schema
    "type": "BINARY | URL | S3",
    "content": "base64-encoded data or URL or S3 key",
    "name": "filename"
  }
}
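
The two MetadataObject variants can be built with small helpers, sketched below. The helper names are hypothetical, the schema value and URL are placeholders, and the file object is assumed to use the same lowercase keys (type, content, name) as the Content Schema.

```python
def metadata_fields(schema, metadata):
    """MetadataFields variant: schema URI plus inline key-value metadata."""
    if not isinstance(metadata, dict):
        raise TypeError("metadata must be a JSON object (dict)")
    return {"schema": schema, "metadata": metadata}


def metadata_file(schema, content):
    """MetadataFile variant: schema URI plus a Content-shaped file object."""
    missing = {"type", "content", "name"} - set(content)
    if missing:
        raise ValueError(f"file is missing keys: {sorted(missing)}")
    return {"schema": schema, "file": content}


# Both variants can appear in the same metadatas array.
metadatas = [
    metadata_fields("string", {"title": "Contract", "department": "HR"}),
    metadata_file("string", {
        "type": "URL",
        "content": "https://example.invalid/meta.xml",  # placeholder URL
        "name": "meta.xml",
    }),
]
```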
