ATC Pilot QA Dataset for LLM Evaluation
As AI is adopted across more industries, the need for safety checks grows. LLMs, while extremely powerful, can produce unreliable outputs, especially in safety-critical domains such as aviation, where accuracy and consistency are paramount. This dataset seeks to provide a standardized benchmark for assessing an LLM's ability to interpret and respond to aviation-related queries and to participate in aviation-related applications. In doing so, it aims to help mitigate the risk of deploying unreliable and unsafe AI.