Performance wise what are trade-offs when running an MLX-backed model on-device compared to using the system's AFM Core model? Also semiconnected: How do I use the 'model judge evaluator' to compare the accuracy of a custom LoRA adapter against the system's private cloud compute models?
The ModelJudgeEvalutor is used to evaluate a response where the score is subjective - e.g. "is this a good explanation".
There are some good examples here:
https://developer.apple.com/documentation/evaluations/scoring-with-model-as-judge-evaluators
And in the sample code project:
You can use it to evaluate responses from the PrivateCloudComputeLanguageModel. You just need to set up your LanguageModelSession correctly:
let session = LanguageModelSession(model: PrivateCloudComputeLanguageModel())
let response = try await session.respond(to: "Analyze this document...")