Schema for Onform Question-Answer (QA) Parser Settings

From QAParserSettings.schema.yaml (onform/QAParserSettings.schema)

---
$id: https://skeleton.botmd.io/onform/QAParserSettings.schema
$schema: http://json-schema.org/draft-07/schema#

title: &title Question-Answer (QA) Parser Settings
description: &description >-
  `QAParser.settings` describes how fields can be parsed and extracted out of `QAPair`s.
  The basis of `QAParser` is the `QAField`, which describes an individual normalized value to be extracted.

  During extraction, the `QAField`s are tried in sequence against each `QAPair`.
  For non-array fields (i.e., `is_array = false`), values extracted by later `QAField`s therefore override those extracted by earlier ones.

  For array `QAField`s (i.e., `is_array = true`), the extracted values are concatenated together.

type: object
required: [fields]
additionalProperties: false

properties:
  fields:
    title: Fields
    $ref: "#/definitions/Field Array"
  #end fields
#end properties

definitions:
  Field Array:
    title: Field Array
    description: >-
      An array of `Field`s describing what to extract from a `QAPair` and how to extract it.

    type: array
    uniqueItems: true
    items:
      $ref: "#/definitions/Field"
    #end items
  #end Field Array

  Field:
    title: Field
    description: >-
      A field describes how to identify and extract the value from a Question-Answer pair.

    type: object
    required: [regex]
    additionalProperties: false

    properties:
      key:
        title: Key
        description: The extracted value will be associated with this key.
        type: string
        minLength: 1

      regex:
        title: Regex Matches
        description: Regular expression pattern that will be used to search the question text. We use [Python regular expressions](https://pythex.org/) and `re.search` for matching.
        type: string
        minLength: 1

      type:
        title: Type
        description: >-
          Type of the extracted value. Defaults to `string`.

          - For `number`, the value is converted to `float`.
          - For `boolean`, non-zero numbers, `yes`, and `true` are treated as `true`. All other values are treated as `false`.

        enum: [string, number, boolean, datetime, time, file]

      is_array:
        title: Is Array
        description: >-
          If `is_array` is `true` and multiple questions match this field, the extracted values are grouped together into a `list`.

          If `is_array` is `false` and the answer value is already a list, only its first element is used.

          Defaults to `false`.

        type: boolean

      case_sensitive:
        title: Case Sensitive
        description: Set this to `true` to use case-sensitive regex search. Defaults to `false`.
        type: boolean
      #end case_sensitive

      parse_error:
        title: Parser Error
        description: Action to take when a parse error occurs. Defaults to `raise`.

        enum: [ignore, raise]
      #end parse_error

      default:
        title: Default
        description: Default value to use when this field is not set.

        anyOf:
          - title: String
            type: string

          - title: Number
            type: number

          - title: Boolean
            type: boolean

          - title: Array of Strings
            type: array
            items:
              type: string

          - title: "Null"
            const: null
      #end default

      normalize_to_choices:
        title: Normalize (For `string` only)
        description: Normalize the value to one of a fixed set of choices using regular expressions. The `choice` of the first entry in this array whose `regex` matches is used as the normalized value. If no entry matches, the original string value is kept. Applies only to the `string` type.
        type: array
        minItems: 0
        items:
          type: object

          required: [regex, choice]
          additionalProperties: false
          properties:
            regex:
              title: Regex
              description: Regular expression to match for normalization.
              type: string
              minLength: 1

            choice:
              title: Choice
              description: Choice to be returned by the parser.
              type: string
          #end properties
      #end normalize_to_choices

      datetime_format:
        title: Datetime Format
        description: Use the given [format string](https://strftime.org/) for parsing date/datetime. Defaults to [`dateutil.parser.parse(value, dayfirst=True, default=today_midnight)`](https://dateutil.readthedocs.io/en/stable/parser.html#dateutil.parser.parse).
        type: string
        minLength: 1
      #end datetime_format

      time_format:
        title: Time Format
        description: Use the given [format string](https://strftime.org/) for parsing time. Defaults to [`dateutil.parser.parse(value, dayfirst=True, default=today_midnight)`](https://dateutil.readthedocs.io/en/stable/parser.html#dateutil.parser.parse).
        type: string
        minLength: 1
      #end time_format

      default_time:
        title: Default Time
        description: For `datetime` fields whose parsed value contains only a date, this is the time added to it. Defaults to `08:00`.
        type: string
        format: time
      #end default_time

      timezone:
        title: Timezone
        description: Localizes any parsed `datetime` or `time` field to this timezone. Defaults to `UTC`.

        type: string
        minLength: 1
      #end timezone
    #end properties
  #end Field
#end definitions
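
As an illustration of how the schema above might be filled in, here is a minimal sketch of a settings document. All keys, regexes, and choices are hypothetical and chosen only for the example. It uses only properties defined in the schema, and the comments restate the precedence rules from the description rather than documenting verified parser behaviour.

```yaml
# Hypothetical QAParser.settings instance (illustrative values only).
fields:
  # Two non-array fields share the key `name`. Per the description,
  # a value extracted by the later field overrides the earlier one.
  - key: name
    regex: "nickname|preferred name"
  - key: name
    regex: "full name"

  # Array field: values from all matching questions are concatenated.
  - key: symptoms
    regex: "symptom"
    is_array: true
    default: []

  # Boolean field: non-zero numbers, `yes`, and `true` count as true.
  - key: smoker
    regex: "do you smoke"
    type: boolean
    default: false
    parse_error: ignore

  # String field normalized to the first matching choice.
  - key: severity
    regex: "how severe"
    type: string
    normalize_to_choices:
      - regex: "mild|slight"
        choice: mild
      - regex: "severe|bad"
        choice: severe

  # Datetime field parsed with an explicit strftime-style format.
  - key: appointment
    regex: "appointment (date|time)"
    type: datetime
    datetime_format: "%d/%m/%Y %H:%M"
```

Because `uniqueItems` is `true`, the two `name` entries are only valid together because their `regex` values differ.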
