I’m pretty familiar with automated tests that compare a received value to an expected value (e.g. basically all unit/integration tests). In a CI/CD workflow, you handle test failures by failing the whole pipeline, and that commit/PR/etc. then shows a failed pipeline next to it.
However, what if I want to track some kind of “performance” measure instead? Something that isn’t pass/fail, but rather a set of experimental results over time (e.g. response times from an API, win/draw/loss rates for a chess bot, confusion matrix scores for a classifier, etc.). Is there a tool that can show that kind of “automated experiment” results ordered by git commit, pull request, etc.?
I thought about sending the data to some kind of data store with a Grafana front-end, but I was hoping there might be some less “diy” method for creating such a display.
I use Datadog for this specific use case. You can submit your own metrics through their API, then set up dashboards and alerting based on specific parameters and thresholds. I mainly use it to track web vitals over time to pinpoint problematic releases or assets, but it can be used for any numeric values you wish to track.
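For illustration, here is a minimal sketch of what submitting one such measurement from a CI job might look like. It assumes Datadog's v1 metrics endpoint (`/api/v1/series`) on the `datadoghq.com` site and an API key supplied by the CI environment. The metric name `ci.api.response_ms`, the `git_commit` tag, and the helper names are hypothetical, chosen only for this example, so check the current Datadog docs before relying on it.

```python
import json
import subprocess
import time
import urllib.request

# Assumption: the US1 site and the v1 series endpoint; other sites use a different host.
DATADOG_SERIES_URL = "https://api.datadoghq.com/api/v1/series"


def current_commit() -> str:
    """Return the short hash of HEAD, or "unknown" outside a git checkout."""
    try:
        out = subprocess.run(
            ["git", "rev-parse", "--short", "HEAD"],
            capture_output=True, text=True, check=True,
        )
        return out.stdout.strip()
    except (OSError, subprocess.CalledProcessError):
        return "unknown"


def build_series(metric, value, commit, timestamp=None):
    """Build the JSON body for a single gauge point tagged with the commit."""
    ts = int(time.time()) if timestamp is None else int(timestamp)
    return {
        "series": [{
            "metric": metric,                    # hypothetical metric name
            "type": "gauge",
            "points": [[ts, float(value)]],      # [unix_seconds, value]
            "tags": [f"git_commit:{commit}"],    # hypothetical tag key
        }]
    }


def send_series(payload, api_key):
    """POST the payload to Datadog and return the HTTP status code."""
    req = urllib.request.Request(
        DATADOG_SERIES_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "DD-API-KEY": api_key},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.status
```

In a pipeline step you would call something like `send_series(build_series("ci.api.response_ms", measured_ms, current_commit()), os.environ["DD_API_KEY"])` after the benchmark finishes, then graph the metric on a dashboard and filter or group by the commit tag.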