Test suite containing test cases and associated metadata.
The Jarvis endpoint for running the test. Defaults to the runner's environment.
Describes the endpoint for Jarvis V1.
The HTTPS endpoint for Jarvis.
Must be at least 1 character long
Must be at most 2083 characters long
The timeout in seconds for HTTP requests.
The maximum number of concurrent queries to run. Defaults to 10.
"v1"
Describes the endpoint for Jarvis V2.
The timeout in seconds for async queries. Only applicable when use_async is True.
The HTTPS endpoint for Jarvis.
Must be at least 1 character long
Must be at most 2083 characters long
The timeout in seconds for HTTP requests.
The name of the index to run the query against.
The maximum number of concurrent queries to run. Defaults to 10.
The query engine to use.
"v2"
Whether to use the async query flow. This is only applicable when querying against a remote server. Local queries are async by default. Defaults to true.
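Putting the two endpoint variants together, a hypothetical pair of configurations might look like the following. The field names here (url, timeout, max_concurrency, index, engine, use_async, async_timeout) are assumptions inferred from the descriptions above, not the schema's actual keys.

```python
# Hypothetical Jarvis endpoint configs assembled from the fields described
# above; all key names are illustrative assumptions.
jarvis_v1 = {
    "version": "v1",
    "url": "https://jarvis.example.com",  # 1..2083 characters, HTTPS
    "timeout": 30,                         # HTTP request timeout, seconds
    "max_concurrency": 10,                 # default per the schema
}

jarvis_v2 = {
    "version": "v2",
    "url": "https://jarvis.example.com",
    "timeout": 30,
    "async_timeout": 120,                  # only used when use_async is True
    "index": "my-index",                   # index to run the query against
    "engine": "default",                   # query engine to use
    "max_concurrency": 10,
    "use_async": True,                     # default per the schema
}

def check_url(url: str) -> bool:
    """Mirror the schema's URL constraint: 1..2083 characters, HTTPS."""
    return 1 <= len(url) <= 2083 and url.startswith("https://")

assert check_url(jarvis_v1["url"]) and check_url(jarvis_v2["url"])
```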
The metadata for the root model
The description of the test suite.
The name of the test suite.
Must be at least 1 character long
Version of this test suite based on Kondo.
The list of test cases
Must contain a minimum of 1 item
Test cases are a single unit of test within a test suite.
The asserts for this test case.
Must contain a minimum of 1 item
Asserts that a response metadata has a certain value.
The value that this response metadata should equal.
The maximum value (inclusive) that this response metadata should have.
The minimum value (inclusive) that this response metadata should have.
The JMESPath expression pointing to the metadata value that we want to assert.
Must be at least 1 character long
"response_metadata"
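The response_metadata assert described above can be sketched as follows. A real JMESPath evaluator supports a much richer grammar; this stand-in handles only dotted keys, and the parameter names (equals, minimum, maximum) are assumptions based on the descriptions.

```python
from functools import reduce

def lookup(path: str, data: dict):
    """Resolve a dotted path like 'usage.total_tokens' -- a tiny stand-in
    for a full JMESPath evaluator."""
    return reduce(lambda d, key: d[key], path.split("."), data)

def assert_response_metadata(metadata: dict, path: str,
                             equals=None, minimum=None, maximum=None) -> bool:
    """Fetch the value at `path` and compare it against equals / minimum /
    maximum (both bounds inclusive, per the schema)."""
    value = lookup(path, metadata)
    if equals is not None and value != equals:
        return False
    if minimum is not None and value < minimum:
        return False
    if maximum is not None and value > maximum:
        return False
    return True

meta = {"usage": {"total_tokens": 512}, "model": "jarvis-v2"}
assert assert_response_metadata(meta, "usage.total_tokens", minimum=1, maximum=1024)
assert assert_response_metadata(meta, "model", equals="jarvis-v2")
```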
Assert that a query took less than a certain time.
The maximum amount of time that a query took. Defaults to None, meaning there is no upper bound.
Value must be greater than or equal to 0
The minimum amount of time that a query took. Defaults to 0.
Value must be greater than or equal to 0
"query_elapsed_time"
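The query_elapsed_time assert reduces to a simple inclusive range check; a minimal sketch, with parameter names assumed from the descriptions above:

```python
def assert_query_elapsed_time(elapsed, minimum=0.0, maximum=None):
    """Sketch of the query_elapsed_time assert: elapsed seconds must be
    >= minimum (default 0) and, when maximum is set, <= maximum
    (None means no upper bound)."""
    if elapsed < minimum:
        return False
    return maximum is None or elapsed <= maximum

assert assert_query_elapsed_time(1.2, maximum=5.0)
assert not assert_query_elapsed_time(7.5, maximum=5.0)
assert assert_query_elapsed_time(7.5)  # no upper bound by default
```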
Assert that a messages response has a certain type.
If True, the message types can be in any order.
If True, all message types should be present.
The message types that we want to assert.
Must contain a minimum of 1 item
Must be at least 1 character long
"message_type"
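A sketch of how the two flags above might combine. The parameter names (any_order, all_present) and the ordered-case semantics (expected types must appear as a subsequence of the actual types) are assumptions based on the descriptions, not the runner's actual behavior.

```python
def assert_message_types(actual, expected, any_order=False, all_present=True):
    """Sketch of the message_type assert."""
    if any_order:
        # Order is ignored: require all (or, if all_present is False,
        # at least one) of the expected types to be present.
        if all_present:
            return all(t in actual for t in expected)
        return any(t in actual for t in expected)
    # Ordered case: expected types must appear as a subsequence of actual.
    it = iter(actual)
    return all(t in it for t in expected)

assert assert_message_types(["text", "card", "text"], ["text", "card"])
assert assert_message_types(["card", "text"], ["text", "card"], any_order=True)
assert not assert_message_types(["card"], ["text", "card"], any_order=True)
```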
Assert that one or more regular expressions match.
Whether the regexes should be case sensitive.
Whether the regexes should be matched in order within the message text. We assume regexes come first, followed by substrings. Defaults to true.
The minimum number of regexes or substrings that should match the message text. If set to all, it must match all the regexes. Defaults to all.
Value must be strictly greater than 0
"all"
The list of regexes for the assert.
No Additional Items
The list of substrings for the assert.
No Additional Items
"regexes"
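The regexes assert can be sketched as counting how many of the given patterns match and comparing against the minimum, where "all" means every pattern must match. This is a minimal sketch, assuming unordered matching; the parameter names mirror the descriptions above.

```python
import re

def assert_regexes(text, regexes=(), substrings=(), minimum="all",
                   case_sensitive=True):
    """Sketch of the regexes assert: count matching regexes and occurring
    substrings, then compare the count against `minimum` ('all' means
    every pattern must match)."""
    flags = 0 if case_sensitive else re.IGNORECASE
    hits = sum(bool(re.search(p, text, flags)) for p in regexes)
    haystack = text if case_sensitive else text.lower()
    hits += sum((s if case_sensitive else s.lower()) in haystack
                for s in substrings)
    total = len(regexes) + len(substrings)
    required = total if minimum == "all" else minimum
    return hits >= required

assert assert_regexes("Take 2 tablets daily", regexes=[r"\d+ tablets"])
assert assert_regexes("Take 2 tablets daily",
                      regexes=[r"\d+ tablets", r"capsules"], minimum=1)
assert not assert_regexes("Take 2 tablets daily",
                          regexes=[r"\d+ tablets", r"capsules"])
```

The inverse_regexes assert below is the complement: the same counting logic, but patterns are required *not* to match.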
Assert that one or more regular expressions don't match.
Whether the regexes should be case sensitive.
The minimum number of regexes or substrings that should not match the message text. If set to all, it must not match any of the regexes. Defaults to all.
Value must be strictly greater than 0
"all"
The list of regexes for the assert.
No Additional Items
The list of substrings for the assert.
No Additional Items
"inverse_regexes"
Assert that the occurrences of regular expressions that match are in a certain range.
Whether the regexes should be case sensitive.
The number of occurrences of regexes or substrings that match the message text should be equal to this. If this is set, minimum and maximum will be ignored. Defaults to None.
Value must be greater than or equal to 0
The maximum number of occurrences of regexes or substrings that match the message text. Defaults to None, meaning there is no upper bound.
Value must be greater than or equal to 0
The minimum number of occurrences of regexes or substrings that match the message text. Defaults to 1.
Value must be greater than or equal to 0
The list of regexes for the assert.
No Additional Items
The list of substrings for the assert.
No Additional Items
"regexes_count"
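Unlike the regexes assert, regexes_count totals *occurrences* rather than which patterns matched. A minimal sketch, with parameter names assumed from the descriptions above and case-sensitive substring counting assumed:

```python
import re

def assert_regexes_count(text, regexes=(), substrings=(), equals=None,
                         minimum=1, maximum=None, case_sensitive=True):
    """Sketch of the regexes_count assert: total occurrences of all regexes
    and substrings must equal `equals` (if set, minimum and maximum are
    ignored) or fall within [minimum, maximum]."""
    flags = 0 if case_sensitive else re.IGNORECASE
    count = sum(len(re.findall(p, text, flags)) for p in regexes)
    count += sum(text.count(s) for s in substrings)  # assumed case-sensitive
    if equals is not None:
        return count == equals
    if count < minimum:
        return False
    return maximum is None or count <= maximum

text = "Dose: 5 mg in the morning, 10 mg at night."
assert assert_regexes_count(text, regexes=[r"\d+ mg"], equals=2)
assert assert_regexes_count(text, substrings=["mg"], minimum=1, maximum=2)
```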
Implements AnswerRelevancy from Deepeval: https://docs.confident-ai.com/docs/metrics-answer-relevancy
Settings for the evaluation LLM to process LLM metrics.
Context size, a.k.a. context length
Template for the context
Name of AWS guardrail if applicable. Should start with arn:aws:bedrock:us-west-2:.
AWS Guardrail version (string) if applicable
Max tokens returned by LLM
Name of LLM as per HuggingFace
Must be at least 1 character long
Similarity score cutoff for node postprocessing based on reranker score
Similarity score cutoff for node postprocessing based on embedding score
Number of nodes to return after retrieval
Restricts the bot to answering only these languages. Provide a list of 2-letter language codes, and double-check that Amazon Comprehend / Translate supports them.
No Additional Items
Prompt for the LLM
Temperature
The maximum metric score required. Defaults to None, meaning there is no upper bound.
Value must be greater than or equal to 0
Additional metric keyword arguments that you supply to the metric class
Additional Properties of any type are allowed.
Type: object
The minimum metric score required. Defaults to 0.
Value must be greater than or equal to 0
"answer_relevancy"
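All the LLM metrics in this schema (answer_relevancy through hallucination) share the same minimum/maximum score fields. A minimal sketch of that shared check, assuming the measured score is a plain float (Deepeval metrics typically report scores in [0, 1]):

```python
def check_metric_score(score, minimum=0.0, maximum=None):
    """Shared score-bounds check for the LLM metric asserts: score must be
    >= minimum (default 0) and, when maximum is set, <= maximum
    (None means no upper bound)."""
    if score < minimum:
        return False
    return maximum is None or score <= maximum

assert check_metric_score(0.83, minimum=0.7)
assert not check_metric_score(0.42, minimum=0.7)
assert check_metric_score(0.95)  # no upper bound by default
```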
Implements ContextualPrecision from Deepeval: https://docs.confident-ai.com/docs/metrics-contextual-precision
The expected output for the query which we will evaluate against.
Must be at least 1 character long
Settings for the evaluation LLM to process LLM metrics.
Context size, a.k.a. context length
Template for the context
Name of AWS guardrail if applicable. Should start with arn:aws:bedrock:us-west-2:.
AWS Guardrail version (string) if applicable
Max tokens returned by LLM
Name of LLM as per HuggingFace
Must be at least 1 character long
Similarity score cutoff for node postprocessing based on reranker score
Similarity score cutoff for node postprocessing based on embedding score
Number of nodes to return after retrieval
Restricts the bot to answering only these languages. Provide a list of 2-letter language codes, and double-check that Amazon Comprehend / Translate supports them.
No Additional Items
Prompt for the LLM
Temperature
The maximum metric score required. Defaults to None, meaning there is no upper bound.
Value must be greater than or equal to 0
Additional metric keyword arguments that you supply to the metric class
Additional Properties of any type are allowed.
Type: object
The minimum metric score required. Defaults to 0.
Value must be greater than or equal to 0
"contextual_precision"
Implements ContextualRecall from Deepeval: https://docs.confident-ai.com/docs/metrics-contextual-recall
The expected output for the query which we will evaluate against.
Must be at least 1 character long
Settings for the evaluation LLM to process LLM metrics.
Context size, a.k.a. context length
Template for the context
Name of AWS guardrail if applicable. Should start with arn:aws:bedrock:us-west-2:.
AWS Guardrail version (string) if applicable
Max tokens returned by LLM
Name of LLM as per HuggingFace
Must be at least 1 character long
Similarity score cutoff for node postprocessing based on reranker score
Similarity score cutoff for node postprocessing based on embedding score
Number of nodes to return after retrieval
Restricts the bot to answering only these languages. Provide a list of 2-letter language codes, and double-check that Amazon Comprehend / Translate supports them.
No Additional Items
Prompt for the LLM
Temperature
The maximum metric score required. Defaults to None, meaning there is no upper bound.
Value must be greater than or equal to 0
Additional metric keyword arguments that you supply to the metric class
Additional Properties of any type are allowed.
Type: object
The minimum metric score required. Defaults to 0.
Value must be greater than or equal to 0
"contextual_recall"
Implements ContextualRelevancy from Deepeval: https://docs.confident-ai.com/docs/metrics-contextual-relevancy
Settings for the evaluation LLM to process LLM metrics.
Context size, a.k.a. context length
Template for the context
Name of AWS guardrail if applicable. Should start with arn:aws:bedrock:us-west-2:.
AWS Guardrail version (string) if applicable
Max tokens returned by LLM
Name of LLM as per HuggingFace
Must be at least 1 character long
Similarity score cutoff for node postprocessing based on reranker score
Similarity score cutoff for node postprocessing based on embedding score
Number of nodes to return after retrieval
Restricts the bot to answering only these languages. Provide a list of 2-letter language codes, and double-check that Amazon Comprehend / Translate supports them.
No Additional Items
Prompt for the LLM
Temperature
The maximum metric score required. Defaults to None, meaning there is no upper bound.
Value must be greater than or equal to 0
Additional metric keyword arguments that you supply to the metric class
Additional Properties of any type are allowed.
Type: object
The minimum metric score required. Defaults to 0.
Value must be greater than or equal to 0
"contextual_relevancy"
Implements Correctness from Deepeval: https://docs.confident-ai.com/docs/guides-answer-correctness-metric#:~:text=Answer%20Correctness%20(or%20Correctness)%20is,0%20indicating%20an%20incorrect%20one.
The expected output for the query which we will evaluate against.
Must be at least 1 character long
Settings for the evaluation LLM to process LLM metrics.
Context size, a.k.a. context length
Template for the context
Name of AWS guardrail if applicable. Should start with arn:aws:bedrock:us-west-2:.
AWS Guardrail version (string) if applicable
Max tokens returned by LLM
Name of LLM as per HuggingFace
Must be at least 1 character long
Similarity score cutoff for node postprocessing based on reranker score
Similarity score cutoff for node postprocessing based on embedding score
Number of nodes to return after retrieval
Restricts the bot to answering only these languages. Provide a list of 2-letter language codes, and double-check that Amazon Comprehend / Translate supports them.
No Additional Items
Prompt for the LLM
Temperature
The maximum metric score required. Defaults to None, meaning there is no upper bound.
Value must be greater than or equal to 0
Additional metric keyword arguments that you supply to the metric class
Additional Properties of any type are allowed.
Type: object
The minimum metric score required. Defaults to 0.
Value must be greater than or equal to 0
"correctness"
Implements Faithfulness from Deepeval: https://docs.confident-ai.com/docs/metrics-faithfulness
Settings for the evaluation LLM to process LLM metrics.
Context size, a.k.a. context length
Template for the context
Name of AWS guardrail if applicable. Should start with arn:aws:bedrock:us-west-2:.
AWS Guardrail version (string) if applicable
Max tokens returned by LLM
Name of LLM as per HuggingFace
Must be at least 1 character long
Similarity score cutoff for node postprocessing based on reranker score
Similarity score cutoff for node postprocessing based on embedding score
Number of nodes to return after retrieval
Restricts the bot to answering only these languages. Provide a list of 2-letter language codes, and double-check that Amazon Comprehend / Translate supports them.
No Additional Items
Prompt for the LLM
Temperature
The maximum metric score required. Defaults to None, meaning there is no upper bound.
Value must be greater than or equal to 0
Additional metric keyword arguments that you supply to the metric class
Additional Properties of any type are allowed.
Type: object
The minimum metric score required. Defaults to 0.
Value must be greater than or equal to 0
"faithfulness"
Implements Hallucination from Deepeval: https://docs.confident-ai.com/docs/metrics-hallucination
The context to evaluate hallucination against.
Must contain a minimum of 1 item
Must be at least 1 character long
Settings for the evaluation LLM to process LLM metrics.
Context size, a.k.a. context length
Template for the context
Name of AWS guardrail if applicable. Should start with arn:aws:bedrock:us-west-2:.
AWS Guardrail version (string) if applicable
Max tokens returned by LLM
Name of LLM as per HuggingFace
Must be at least 1 character long
Similarity score cutoff for node postprocessing based on reranker score
Similarity score cutoff for node postprocessing based on embedding score
Number of nodes to return after retrieval
Restricts the bot to answering only these languages. Provide a list of 2-letter language codes, and double-check that Amazon Comprehend / Translate supports them.
No Additional Items
Prompt for the LLM
Temperature
The maximum metric score required. Defaults to None, meaning there is no upper bound.
Value must be greater than or equal to 0
Additional metric keyword arguments that you supply to the metric class
Additional Properties of any type are allowed.
Type: object
The minimum metric score required. Defaults to 0.
Value must be greater than or equal to 0
"hallucination"
The description of the test case. This can be used to describe the motivation for the test case.
The name of the test case. Defaults to the query.
The query for the test case, which includes actual message and user profile.
The profile of the user executing the query. If specified as a string, the loader will attempt to resolve it against the predefined profiles.
Profile for Jarvis V1.
The department of the profile user
The designation of the profile user
The organization key associated with the hospital profile.
Must be at least 1 character long
The unique identifier for the profile.
Must be at least 1 character long
Profile for Jarvis V2.
Profile metadata associated with a hospital user.
Profile metadata associated with a hospital user.
The institution that the user belongs to. This is usually similar to the tenant.
The tenant of this profile.
The cleo app tenant applicable to this object.
^cleo\:[a-zA-Z0-9][\w\-\_]*$
The hospital app tenant applicable to this object.
^hospital\:[a-zA-Z0-9][\w\-\_]*$
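The two tenant patterns quoted above can be checked directly with Python's re module; since the patterns are anchored with ^ and $, fullmatch and search behave the same here.

```python
import re

# Tenant patterns exactly as given in the schema above.
CLEO = re.compile(r"^cleo\:[a-zA-Z0-9][\w\-\_]*$")
HOSPITAL = re.compile(r"^hospital\:[a-zA-Z0-9][\w\-\_]*$")

assert CLEO.fullmatch("cleo:acme_health-01")
assert HOSPITAL.fullmatch("hospital:general")
assert not CLEO.fullmatch("cleo:-dash-first")  # first char must be alphanumeric
assert not HOSPITAL.fullmatch("hospital:")     # at least one char after the colon
```

Note that the character class `[\w\-\_]` allows word characters, hyphens, and underscores for everything after the first character, so tenants like `cleo:acme_health-01` are valid.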
Additional Options regarding the session related changes, like overriding certain values, etc.
The text of the query.
Must be at least 1 character long
The test case ID. Note that this is usually set during loading of the test suite.
The tenant that the documents come from and for which queries are to be answered.