Pilot Talk Eval

Unclassified

Instructions

This app evaluates an LLM's ability to produce correct pilot jargon
The brevity codes are from SD Brevity 2025 (approved for public release)
The model is Claude Haiku 4.5 with brevity codes in its context window
Use this app in one of two ways:

Either provide a plain-text example scenario to the model (without brevity codes), then rate the model's ability to produce a correct brevity code response
Or provide an example scenario written exclusively in brevity codes, then rate the model's ability to produce a correct plain-text description

Rate the model's response from 1 to 5 stars, then use the feedback box to explain what the correct response should be
Do not enter Controlled Unclassified Information (CUI)
Click Record Data to save your evaluation to the SQL database for analysis
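
For anyone analyzing the saved evaluations, the sketch below shows one way a record like this could be stored. It is an illustration only, not the app's actual code: the table name `evaluations`, the column names, the function `record_evaluation`, and the use of SQLite are all assumptions, since these instructions only say that evaluations go to a SQL database.

```python
import sqlite3


def record_evaluation(conn, scenario, response, stars, feedback):
    """Store one evaluation and return its row id.

    Hypothetical helper: the schema below is assumed for illustration
    and is not taken from the app itself.
    """
    # Reject ratings outside the 1 to 5 star range described in the instructions.
    if not isinstance(stars, int) or not 1 <= stars <= 5:
        raise ValueError("stars must be an integer from 1 to 5")

    # Create the (assumed) table on first use.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS evaluations ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "scenario TEXT NOT NULL, "
        "response TEXT NOT NULL, "
        "stars INTEGER NOT NULL CHECK (stars BETWEEN 1 AND 5), "
        "feedback TEXT NOT NULL)"
    )

    # Parameterized insert, so user-entered text is never spliced into the SQL.
    cur = conn.execute(
        "INSERT INTO evaluations (scenario, response, stars, feedback) "
        "VALUES (?, ?, ?, ?)",
        (scenario, response, stars, feedback),
    )
    conn.commit()
    return cur.lastrowid


if __name__ == "__main__":
    # In-memory database so the demo leaves nothing on disk.
    demo_conn = sqlite3.connect(":memory:")
    row_id = record_evaluation(
        demo_conn,
        scenario="example scenario text",
        response="example model response",
        stars=4,
        feedback="example of what the correct response should be",
    )
    print("saved evaluation with id", row_id)
```

Validating the star rating in both the function and the table constraint is a deliberate redundancy in this sketch, so that an out-of-range rating is rejected even if a row is written by some other path.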

Unclassified