From QAParserSettings.schema.yaml (onform/QAParserSettings.schema)
---
$id: https://skeleton.botmd.io/onform/QAParserSettings.schema
$schema: http://json-schema.org/draft-07/schema#
title: &title Question-Answer (QA) Parser Settings
description: &description >-
`QAParser.settings` describes how fields can be parsed and extracted from `QAPair`s.
`QAParser` is built on `QAField`s, each of which describes one normalized value to extract.
During extraction, the `QAField`s are tried in sequence against each `QAPair`.
As a result, for non-array fields (i.e., `is_array = false`), values extracted by later `QAField`s override those extracted by earlier ones.
For array `QAField`s (i.e., `is_array = true`), the extracted values are concatenated.
type: object
required: [fields]
additionalProperties: false
properties:
fields:
title: Fields
$ref: "#/definitions/Field Array"
#end fields
#end properties
definitions:
Field Array:
title: Field Array
description: >-
An array of `Field`s describing what to extract from a `QAPair` and how.
type: array
uniqueItems: true
items:
$ref: "#/definitions/Field"
#end items
#end Field Array
Field:
title: Field
description: >-
A field describes how to identify and extract the value from a Question-Answer pair.
type: object
required: [regex]
additionalProperties: false
properties:
key:
title: Key
description: The extracted value will be associated with this key.
type: string
minLength: 1
regex:
title: Regex Matches
description: Regular expression pattern used to search the question text. Patterns use [Python regular expression](https://pythex.org/) syntax and are matched with `re.search`, so a match may occur anywhere in the question.
type: string
minLength: 1
type:
title: Type
description: >-
Type of the extracted value. Defaults to `string`.
- For `number`, the value is converted to a `float`.
- For `boolean`, non-zero numbers, `yes`, and `true` are treated as `true`; all other values are treated as `false`.
enum: [string, number, boolean, datetime, time, file]
is_array:
title: Is Array
description: >-
If `is_array` is `true` and multiple questions match this field, the extracted values are collected into a `list`.
If `is_array` is `false` and the answer value is already a list, only its first element is used.
Defaults to `false`.
type: boolean
case_sensitive:
title: Case Sensitive
description: Set this to `true` to use case-sensitive regex search. Defaults to `false`.
type: boolean
#end case_sensitive
parse_error:
title: Parse Error
description: Action to take when a parse error occurs (`ignore` or `raise`). Defaults to `raise`.
enum: [ignore, raise]
#end parse_error
default:
title: Default
description: Default value to use when this field is not set.
anyOf:
- title: String
type: string
- title: Number
type: number
- title: Boolean
type: boolean
- title: Array of Strings
type: array
items:
type: string
- title: "Null"
const: null
#end default
normalize_to_choices:
title: Normalize (For `string` only)
description: Normalizes the value to one of a fixed set of choices using regular expressions. The first entry whose regex matches determines the normalized choice; if none match, the original string value is used. Only applies to the `string` type.
type: array
minItems: 0
items:
type: object
required: [regex, choice]
additionalProperties: false
properties:
regex:
title: Regex
description: Regular expression to match for normalization.
type: string
minLength: 1
choice:
title: Choice
description: Choice to be returned by the parser.
type: string
#end properties
#end normalize_to_choices
datetime_format:
title: Datetime Format
description: Format string ([strftime syntax](https://strftime.org/)) for parsing dates and datetimes. If omitted, values are parsed with [`dateutil.parser.parse(value, dayfirst=True, default=today_midnight)`](https://dateutil.readthedocs.io/en/stable/parser.html#dateutil.parser.parse).
type: string
minLength: 1
#end datetime_format
time_format:
title: Time Format
description: Format string ([strftime syntax](https://strftime.org/)) for parsing times. If omitted, values are parsed with [`dateutil.parser.parse(value, dayfirst=True, default=today_midnight)`](https://dateutil.readthedocs.io/en/stable/parser.html#dateutil.parser.parse).
type: string
minLength: 1
#end time_format
default_time:
title: Default Time
description: For `datetime` fields whose value contains only a _date_, this time is added to it. Defaults to `08:00`.
type: string
format: time
#end default_time
timezone:
title: Timezone
description: Parsed `datetime` and `time` values are localized to this timezone. Defaults to `UTC`.
type: string
minLength: 1
#end timezone
#end properties
#end Field
#end definitions
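The top-level description sets out three precedence rules. Fields are tried in order, later non-array values override earlier ones, and array values are concatenated. These rules interact with `regex`, `case_sensitive`, `is_array`, and `default`, and can be sketched in Python as below.

This is a minimal illustration under stated assumptions, not the actual `QAParser` implementation. The following are all hypothetical:

- the `extract` function;
- modeling a `QAPair` as a `(question, answer)` tuple;
- requiring every field to have a `key`;
- applying `default` only to keys that were never extracted.

```python
import re
from typing import Any


def extract(fields: list[dict], qa_pairs: list[tuple[str, Any]]) -> dict[str, Any]:
    """Sketch of the documented precedence rules; not the real QAParser code.

    Hypothetical assumptions: a QAPair is a (question, answer) tuple, each
    field is a dict shaped like the `Field` schema, and every field has a `key`.
    """
    result: dict[str, Any] = {}
    for question, answer in qa_pairs:
        for field in fields:  # fields are tried in the order they are listed
            flags = 0 if field.get("case_sensitive", False) else re.IGNORECASE
            if not re.search(field["regex"], question, flags):
                continue
            key = field["key"]
            if field.get("is_array", False):
                # Array fields: collect every matching answer into one list.
                values = answer if isinstance(answer, list) else [answer]
                result.setdefault(key, []).extend(values)
            else:
                # Non-array fields: keep only the first element of a list answer.
                if isinstance(answer, list):
                    answer = answer[0] if answer else None
                result[key] = answer  # later matches override earlier ones
    # Assumption: `default` fills in keys that were never extracted.
    for field in fields:
        if "default" in field:
            result.setdefault(field["key"], field["default"])
    return result
```

For example, if two fields share the key `age`, the value from whichever question matches last wins. A field with `is_array: true` keeps every matching answer.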
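The `type` conversion rules for `number` and `boolean` in the `Field` schema can be sketched as follows. The `coerce` helper is hypothetical. Treating `yes` and `true` case-insensitively after trimming whitespace is an assumption, as is accepting numeric strings such as `"2"`. The `datetime`, `time`, and `file` types are not covered.

```python
from typing import Any


def coerce(value: Any, type_: str = "string") -> Any:
    """Hypothetical helper mirroring the documented `type` rules."""
    if type_ == "number":
        return float(value)
    if type_ == "boolean":
        if isinstance(value, bool):  # check bool first: bool is a subclass of int
            return value
        if isinstance(value, (int, float)):
            return value != 0  # non-zero numbers are true
        text = str(value).strip().lower()  # assumption: case/whitespace-insensitive
        if text in ("yes", "true"):
            return True
        try:
            return float(text) != 0  # numeric strings such as "1" or "-2.5"
        except ValueError:
            return False  # every other value is treated as false
    if type_ == "string":
        return value  # strings pass through unchanged in this sketch
    raise NotImplementedError(f"type {type_!r} is not covered by this sketch")
```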
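`normalize_to_choices` uses first-match semantics, falling back to the original value, as in this minimal sketch. The `normalize` function is hypothetical. Using `re.search` for these regexes, and letting `case_sensitive` govern them as it does the field `regex`, are assumptions not stated by the schema.

```python
import re


def normalize(value: str, choices: list[dict], case_sensitive: bool = False) -> str:
    """Return the `choice` of the first entry whose `regex` matches, else `value`."""
    flags = 0 if case_sensitive else re.IGNORECASE  # assumption: mirrors `case_sensitive`
    for entry in choices:  # order matters: the first matching entry wins
        if re.search(entry["regex"], value, flags):
            return entry["choice"]
    return value  # no match: keep the original string
```

Anchoring the patterns (`^...$`) avoids surprising partial matches. Without anchors, a pattern like `m` would also match `Female`.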
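One reading of how `datetime_format`, `default_time`, and `timezone` combine is sketched below. It covers only the explicit-format path; the `dateutil` fallback is not shown. The following are all assumptions:

- the `parse_datetime` function and its `tz_name` parameter;
- treating a format with no time-of-day directives as "date only";
- reading "localized" as attaching the zone to naive values but converting aware ones.

```python
from datetime import datetime, time, timezone, tzinfo
from zoneinfo import ZoneInfo  # Python 3.9+

# strptime directives that carry time-of-day information (assumption: a format
# without any of these is treated as "date only").
TIME_DIRECTIVES = ("%H", "%I", "%M", "%S")


def _zone(name: str) -> tzinfo:
    # Use the built-in UTC object so the default needs no tz database.
    return timezone.utc if name.upper() == "UTC" else ZoneInfo(name)


def parse_datetime(value: str, datetime_format: str,
                   default_time: str = "08:00", tz_name: str = "UTC") -> datetime:
    """Hypothetical sketch of `datetime_format`, `default_time` and `timezone`."""
    parsed = datetime.strptime(value, datetime_format)
    if not any(d in datetime_format for d in TIME_DIRECTIVES):
        # Date-only value: attach `default_time` (HH:MM).
        hour, minute = (int(part) for part in default_time.split(":")[:2])
        parsed = datetime.combine(parsed.date(), time(hour, minute), parsed.tzinfo)
    zone = _zone(tz_name)
    if parsed.tzinfo is None:
        return parsed.replace(tzinfo=zone)  # naive: interpret as local to `zone`
    return parsed.astimezone(zone)  # aware: convert into `zone`
```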