Create "agents" to deal with bugs and alert people about them
Rationale:
We have lots and lots of bugs filed on a daily basis, bugs that could, given a set of heuristics, be automatically triaged. By triaged we mean flagged as important/
Goal:
Have a script, or a set of scripts, that automatically does the first round of triage: filtering out bug reports that are useless or unlikely to be useful, and raising the priority of potential issues as they appear. This removes the need to triage every bug by hand and lets us focus on fixing the important ones.
Blueprint information
- Status: Started
- Approver: Rick Spencer
- Priority: Essential
- Drafter: Ursula Junque
- Direction: Approved
- Assignee: Ursula Junque
- Definition: Approved
- Series goal: Accepted for quantal
- Implementation: Started
- Milestone target: quantal-alpha-3
- Started by: Kate Stewart
- Completed by:
Whiteboard
UDS Discussion Points:
- What is the expected action of the agents?
Rick suggests that bugs considered potential non-issues be automatically triaged and set to Low, and that bugs that are potential problems be assigned to the team responsible for the affected package. We need to find consensus on a policy that makes people aware of issues that should be looked at, without anyone having to triage "unimportant" bugs. On this subject: how are people going to be made aware of issues? Rick also suggests sending email with summaries of issues to be looked at, instead of using bug mail.
Summarizing, we want agents to:
- Mark importance of bugs, given the gravity value;
- Change status of bugs, if we agree that New/Undecided applies only to brand new issues that not even the agents checked yet;
- Assign bugs with a certain gravity to their respective teams, like a first step triage to grab people's attention to issues that should be dealt with.
- Send email about ongoing crises:
* Packages with bug spikes in the last hour/day;
* Bugs with lots of new dupes/affects me too in the last hour/day;
* Spikes in "duplicate" crashes without bugs in the last hour/day.
For that to work, we need a way to mark a bug as looked at, to "quiet" the agents' alerts about it.
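The spike alerts and the "quiet" marker described above could be sketched roughly as follows. This is only an illustration, not the agents' actual code: the `find_spikes` name, the window and threshold values, and the `quieted` set are all assumptions, and the package names are sample data.

```python
from collections import Counter
from datetime import datetime, timedelta


def find_spikes(events, now, window=timedelta(hours=1), threshold=5,
                quieted=frozenset()):
    """Return the keys with at least `threshold` events inside the window.

    `events` is an iterable of (key, timestamp) pairs, for example one pair
    per new bug, new duplicate or "affects me too" click. A key can be a
    package name or a bug id. Keys in `quieted` have already been looked at,
    so no alert is raised for them.
    """
    cutoff = now - window
    counts = Counter(key for key, when in events if cutoff <= when <= now)
    return sorted(key for key, n in counts.items()
                  if n >= threshold and key not in quieted)


# Sample data: six events for compiz, one for unity, five for nautilus.
now = datetime(2012, 6, 1, 12, 0)
events = [("compiz", now - timedelta(minutes=m)) for m in range(6)]
events += [("unity", now - timedelta(minutes=10))]
events += [("nautilus", now - timedelta(minutes=m)) for m in range(5)]

print(find_spikes(events, now, quieted={"nautilus"}))  # ['compiz']
```

The same function covers the hourly and daily summaries by changing `window`; what counts as a spike (the threshold) is a policy decision still to be agreed.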
- What are the criteria to consider a bug important?
I've been working on two concepts to be applied to bugs: gravity (or graveness) and "fixableness". For the agents to be able to tell which bugs are more likely to be real problems, we need to define the criteria that make a bug a must-look. We're running an experiment right now that calculates gravity from the bug reporter, the number of affected people, tags (regression, current release, etc.), the number of duplicates and so on. This needs to be discussed to understand each team's criteria.
Gravity criteria heuristics list:
- Number of affected people;
- Number of duplicates;
- Number of subscribed people;
- Whether the reporter is an Ubuntu Dev or Bug Control team member, or has a Canonical email address (?)
* this doesn't take into account whether the reporter of a duplicate is in one of these teams
- If it has some tags (cumulative):
* apport-bug
* apport-package
* apport-crash
* apport-kerneloops
* regression-release
* regression-proposed
* regression-update
* current dev release (precise, quantal, etc)
* iso-tracker
- Degrade gravity over time
* if the bug is not receiving recent duplicates
- Whoopsie-daisy crashes (number of crashes per bug raises gravity)
- Current importance: Critical and High get more points, so they appear at the top of the bug list;
- IDEA: bugs reported against packages that are part of a "core" list (image packages, maybe?), because they affect more people (in theory)
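To make the heuristics list above concrete, here is a minimal scoring sketch. It is not the formula used in the experiment: every weight, the `Bug` fields, the `gravity` function name and the 30-day half-life are assumptions chosen only to show how the criteria could combine.

```python
from dataclasses import dataclass, field, replace

# Hypothetical weights: the list above names the criteria, not their values.
TAG_POINTS = {
    "apport-bug": 1, "apport-package": 1,
    "apport-crash": 2, "apport-kerneloops": 2,
    "regression-release": 5, "regression-proposed": 5, "regression-update": 5,
    "iso-tracker": 3,
}
IMPORTANCE_POINTS = {"Critical": 10, "High": 5}


@dataclass
class Bug:
    affected: int = 0
    duplicates: int = 0
    subscribers: int = 0
    trusted_reporter: bool = False   # Ubuntu Dev, Bug Control, etc.
    tags: set = field(default_factory=set)
    importance: str = "Undecided"
    crash_reports: int = 0           # crashes reported against this bug
    days_since_last_duplicate: int = 0


def gravity(bug, current_release="quantal", half_life_days=30):
    """Sum the points for each criterion, then decay the total with age."""
    score = bug.affected + 2 * bug.duplicates + bug.subscribers + bug.crash_reports
    if bug.trusted_reporter:
        score += 5
    score += sum(TAG_POINTS.get(tag, 0) for tag in bug.tags)  # tags are cumulative
    if current_release in bug.tags:
        score += 3
    score += IMPORTANCE_POINTS.get(bug.importance, 0)
    # Halve the score for every half-life that passes without a new duplicate.
    return score * 0.5 ** (bug.days_since_last_duplicate / half_life_days)


fresh = Bug(affected=4, duplicates=3, subscribers=2, importance="High",
            tags={"apport-crash", "regression-release", "quantal"})
stale = replace(fresh, days_since_last_duplicate=60)
print(gravity(fresh), gravity(stale))  # 27.0 6.75
```

An exponential decay is just one way to "degrade gravity over time"; a step function or a linear penalty would fit the list equally well.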
User Stories:
Ursula does daily triage of desktop and server bugs, and this takes a considerable time. She needs a tool to do initial filtering on issues, so she can spend time going through a list of bugs with potential.
Ursula is responsible for looking at bugs across several packages. She needs to be notified whenever a potential issue shows up, like a bug that's been affecting lots of people in the last hour and needs action, instead of having to manually look for such issues or wait for someone to raise it.
Kate needs to see when there are release-critical bugs, and to see what has shown up over a given time range.
A developer is notified about a bug. Launchpad knows who has accessed the bug, so the bot should not renotify a person once that person has accessed it.
Assumptions:
- For the agents to be implemented, we need to:
* Agree on the criteria used to calculate the gravity.
* Agree on and finalize the mapping between packages and teams, leaving no gaps for important packages.
- There are other scripts that currently change bug status and importance. We need to figure out which ones they are and decide what to do so that changes won't collide. Either we make the agents leave alone anything that other scripts touch (at the risk of having to adapt to the current workflow instead of creating one that's "ideal"), or we agree to disable those scripts and have all changes come from a single bot.
Current "bots":
* Launchpad - it changes a bug from New to Confirmed whenever the bug receives a duplicate or a person marks it as "Affecting me too". Are we keeping this behavior, or should we ask Launchpad to disable it?
* Apport-retracer marks a bug as Medium importance if it is a valid apport-crash.
* Apport-retracer marks a bug as Invalid if the retrace fails
* Apport-retracer does not set the importance of Python crashes
* Bugs with 10 or more duplicates get tagged bugpattern-needed
* Bugs are tagged "patch" if patch found
* Bugs are tagged needs-packaging and marked Importance:wishlist if they have [needs-packaging] in the subject
* Kernel Team Bots:
The kernel team bots use a very lightweight plugin architecture that allows new conditions to be added easily.
* "New" bot - looks at all new bugs that have come in since its last run and checks whether all the required log files have been attached. If they have, the bug is marked "Confirmed"; if not, a request for the log files is added to the bug and the bug is marked "Incomplete". If a new bug is of the "Package" BugType, the bot looks at one of the log files, filters out much of the noise, and adds a comment about what it thinks the error message is.
* "Nag" bot - looks at all "Open" bugs and sees which ones are filed against the current development kernel. If a bug is, and a more recent kernel has been released than the one it was filed against, the user is asked to update to the newest kernel and indicate whether the bug still exists.
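A lightweight plugin architecture of the kind mentioned for the kernel team bots could look something like the sketch below. It is not the kernel team's code: the `plugin` decorator, the dictionary-based bug, the action format and the log file names are all assumptions. The two checks mimic rules from the list above (the "New" bot's log check and the bugpattern-needed tagging rule).

```python
PLUGINS = []


def plugin(check):
    """Register a check; adding a new condition only needs this decorator."""
    PLUGINS.append(check)
    return check


# Hypothetical file names: the notes do not say which logs are required.
REQUIRED_LOGS = {"dmesg.txt", "lspci.txt"}


@plugin
def required_logs(bug):
    """Mimic the "New" bot: confirm the bug or ask for the missing logs."""
    if bug["status"] != "New":
        return None
    missing = sorted(REQUIRED_LOGS - set(bug["attachments"]))
    if missing:
        return {"status": "Incomplete",
                "comment": "Please attach: " + ", ".join(missing)}
    return {"status": "Confirmed"}


@plugin
def many_duplicates(bug):
    """Mimic the tagging rule: 10 or more duplicates get bugpattern-needed."""
    if bug["duplicates"] >= 10 and "bugpattern-needed" not in bug["tags"]:
        return {"add_tag": "bugpattern-needed"}
    return None


def run_plugins(bug):
    """Collect the actions proposed by every registered plugin, in order."""
    actions = (check(bug) for check in PLUGINS)
    return [action for action in actions if action is not None]


bug = {"status": "New", "attachments": ["dmesg.txt"], "duplicates": 12, "tags": []}
print(run_plugins(bug))
```

Having plugins return proposed actions, rather than change the bug directly, would also give a single place to detect two bots trying to change the same field.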
Other ideas:
* add filter on specific releases
* alert developers if a patch appears on a bug
* bugs already get tagged patch if they have patch added to them
* alert developers if the bug has a bug watch that is fixed upstream
* 2 kinds of agents
* there is an issue here that needs to be fixed
* there is a fix for an existing issue
* need a way to quiet bugs so that alerts aren't sent about them
* bryce thinks that adding comments is a good way to do it
* laney thinks implicit action should quiet
* an agent should only tell you about a bug once
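The "tell you about a bug once" and "quiet" ideas above could be combined along these lines. This is a sketch under assumptions: the `NotificationLog` class and its method names are invented for illustration, the bug ids are made up, and a real agent would need to persist this state between runs.

```python
class NotificationLog:
    """Remember who was told about which bug, and which bugs are quieted."""

    def __init__(self):
        self._notified = set()  # (person, bug_id) pairs already told or aware
        self._quieted = set()   # bug ids silenced for everyone

    def quiet(self, bug_id):
        """Explicit quieting, e.g. triggered by a comment on the bug."""
        self._quieted.add(bug_id)

    def mark_accessed(self, person, bug_id):
        """Implicit quieting: the person has already looked at the bug."""
        self._notified.add((person, bug_id))

    def should_notify(self, person, bug_id):
        """Return True at most once per (person, bug), never for quieted bugs."""
        if bug_id in self._quieted or (person, bug_id) in self._notified:
            return False
        self._notified.add((person, bug_id))
        return True


log = NotificationLog()
print(log.should_notify("ursula", 1001))  # True
print(log.should_notify("ursula", 1001))  # False, already told
log.mark_accessed("kate", 1001)
print(log.should_notify("kate", 1001))    # False, she has seen the bug
log.quiet(1002)
print(log.should_notify("ursula", 1002))  # False, bug is quieted
```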
Agent 1
1. developer uploads a package
2. they are automatically enlisted as an agent's target user
3. the agent watches bugs filed against the version of the package and the release they uploaded to
4. whenever one of those bugs exceeds a certain gravity, they get pinged
* certain uploads, such as no-change rebuilds should probably be ignored
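The Agent 1 steps above can be sketched as follows. The `Upload` record, the `enlist` and `pings` helpers, the gravity threshold and all sample values are assumptions made for illustration; how uploads and gravity values are actually obtained is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Upload:
    uploader: str
    package: str
    version: str
    release: str
    no_change_rebuild: bool = False


def enlist(uploads):
    """Map (package, version, release) to the uploader the agent should ping.

    No-change rebuilds are skipped, as suggested in the notes.
    """
    return {(u.package, u.version, u.release): u.uploader
            for u in uploads if not u.no_change_rebuild}


def pings(watch, bugs, threshold=20):
    """Return (uploader, bug id) pairs for watched bugs above the threshold."""
    result = []
    for bug in bugs:
        target = watch.get((bug["package"], bug["version"], bug["release"]))
        if target is not None and bug["gravity"] > threshold:
            result.append((target, bug["id"]))
    return result


uploads = [
    Upload("alice", "unity", "1.0-1", "quantal"),
    Upload("bob", "compiz", "1.0-1", "quantal", no_change_rebuild=True),
]
bugs = [
    {"id": 1, "package": "unity", "version": "1.0-1", "release": "quantal", "gravity": 35},
    {"id": 2, "package": "unity", "version": "1.0-1", "release": "quantal", "gravity": 5},
    {"id": 3, "package": "compiz", "version": "1.0-1", "release": "quantal", "gravity": 50},
]
print(pings(enlist(uploads), bugs))  # [('alice', 1)]
```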
Work Items
Work items:
[ursinha] Change gravity so that it takes age (and the other factors listed in the gravity heuristics list) into account (old bugs don't have high gravity, fresh bugs do): DONE
[all] Follow the list of bugs "selected" by gravity for ~ a month, and check whether the heuristics are working and whether we're missing other important issues: DONE
[brian-murray] Hack in the retracers to point to the newest release bug if it's a duplicate of an old bug (add the latest release to the tags of the master bug report): DONE
[brian-murray] copy latest release tags from duplicate apport-crash bugs to the master bug: DONE
[brian-murray] make the apport-retracer set the importance of python crashes to medium: DONE
[brian-murray] set existing python tracebacks in Launchpad to Medium: DONE
[komputes] Create "agents" to deal with bugs and alert people about them: POSTPONED