The time I spent in Paris with Pietro Abate was very fruitful. I have to thank him and Roberto di Cosmo for inviting me and even covering my travel expenses.
The things we actually managed to implement during my visit:
- removed huge chunks of code that were not needed anymore, making everything more concise and pretty: ended up removing over 1600 lines
- basebuildsystem.ml can now fill add-cross-sources.list with sources for debhelper (as debhelper is quite build-essential)
- started evaluating Gentoo as a source for reduced build dependencies
- added a unit test skeleton and test material
- made the code compile with dose3 master
- added GraphML output
- fed graphs into analysis tools for visualization
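To give an idea of what the GraphML output looks like, here is a minimal sketch in Python (used purely for illustration, the actual tool is written in OCaml). The function name `to_graphml` and the node names are made up for this example; only the GraphML namespace and the `graph`/`node`/`edge` elements come from the format itself.

```python
import xml.etree.ElementTree as ET

# Official GraphML namespace; everything else below is illustrative.
GRAPHML_NS = "http://graphml.graphdrawing.org/xmlns"


def to_graphml(edges):
    """Serialize a list of (source, target) pairs as a directed GraphML graph."""
    root = ET.Element("graphml", xmlns=GRAPHML_NS)
    graph = ET.SubElement(root, "graph", id="G", edgedefault="directed")
    # Collect every node that appears on either end of an edge.
    for name in sorted({n for edge in edges for n in edge}):
        ET.SubElement(graph, "node", id=name)
    for src, dst in edges:
        ET.SubElement(graph, "edge", source=src, target=dst)
    return ET.tostring(root, encoding="unicode")


# Hypothetical example: two source packages build-depending on one binary package.
edges = [("src:a", "bin:c"), ("src:b", "bin:c")]
doc = to_graphml(edges)
print(doc)
```

A file written this way can be loaded by graph analysis and visualization tools that read GraphML.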
The two most important things (in my opinion) that we came up with for future implementation were the idea to harvest reduced build dependency information from Gentoo and the discovery of a flaw in how the current dependency graph handles binary packages and their installation sets.
I am still busy evaluating the output of my trials with Gentoo, so I will cover this topic in a later blog post once I have measured the actual impact on the dependency graph. The current status is that Gentoo USE flags allow me to find possibly droppable build dependencies for 250 of the 350 interesting Debian source packages that are part of the main strongly connected component (scc).
Pietro also found a flaw in how the dependency graph is currently generated. While source packages A and B might both depend on binary package C (and its installation set), it is wrong to add a dependency on C from both source packages without further verification. Due to virtual packages and disjunctive dependencies, C might have many possible installation sets, but the current code chooses only one of them. The problem is that this one chosen set might conflict with the other build dependencies of a source package depending on C. Therefore, there must be multiple binary package nodes for C, each with a different installation set, generated dynamically as they are needed. Each source package must point to the node for C whose installation set does not conflict with its own build dependencies.
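The installation set problem can be sketched in a few lines of Python (again only for illustration, not the code we will actually write). Everything here is an assumption made for the example: the function names, the representation of conflicts as unordered pairs, and the package names `c`, `mta-a` and `mta-b`. In the real tool, the candidate installation sets would come from the solver.

```python
from typing import FrozenSet, Iterable, Optional, Set

# Assumed data model: a conflict is an unordered pair of package names.
Conflict = FrozenSet[str]


def conflicts_with(install_set: FrozenSet[str],
                   build_deps: Set[str],
                   conflicts: Set[Conflict]) -> bool:
    """True if some package in install_set conflicts with a build dependency."""
    return any(frozenset((p, q)) in conflicts
               for p in install_set for q in build_deps if p != q)


def pick_installation_set(candidates: Iterable[FrozenSet[str]],
                          build_deps: Set[str],
                          conflicts: Set[Conflict]) -> Optional[FrozenSet[str]]:
    """Return the first candidate installation set compatible with build_deps."""
    for install_set in candidates:
        if not conflicts_with(install_set, build_deps, conflicts):
            return install_set
    return None  # no compatible installation set exists


# Hypothetical scenario: C needs a virtual package provided by either
# mta-a or mta-b, and those two providers conflict with each other.
candidates = [frozenset({"c", "mta-a"}), frozenset({"c", "mta-b"})]
conflicts = {frozenset({"mta-a", "mta-b"})}

# Source package A also build-depends on mta-b, source package B on mta-a.
node_for_a = pick_installation_set(candidates, {"c", "mta-b"}, conflicts)
node_for_b = pick_installation_set(candidates, {"c", "mta-a"}, conflicts)
print(sorted(node_for_a), sorted(node_for_b))
```

In this example A and B end up pointing to two different nodes for C. Always picking the first installation set would have given A a set that conflicts with its own build dependencies.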
Other TODO notes that we came up with and that I will be implementing are:
- integrating dose3 as a git submodule
- create a proper build system
- try using a different cudf solver
- unit tests
- finally coming up with a name (suggestions welcome - I'm bad at name-finding)
- building a Debian package (depends on having a name first)
- formalize/visualize/document the current algorithms
- to break the main scc, use additional heuristics like:
  - the order induced by reduced_dist to classify nodes in the graph
  - centrality/distance in graph
- comparing scc of different Debian snapshots with each other
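To make the scc-related items a bit more concrete, here is a small Python sketch (illustrative only). It finds the strongly connected components with Tarjan's algorithm and then ranks the nodes of the largest one by the product of their in-degree and out-degree inside that component. That degree product is just a crude stand-in for a centrality measure that I picked for the example. It is not reduced_dist and not necessarily the heuristic that will end up in the tool. The graph and all function names are made up.

```python
from typing import Dict, List, Set

Graph = Dict[str, Set[str]]  # node -> set of successors


def strongly_connected_components(graph: Graph) -> List[Set[str]]:
    """Tarjan's algorithm (recursive, so only suitable for small graphs)."""
    index: Dict[str, int] = {}
    lowlink: Dict[str, int] = {}
    stack: List[str] = []
    on_stack: Set[str] = set()
    result: List[Set[str]] = []

    def visit(v: str) -> None:
        index[v] = lowlink[v] = len(index)
        stack.append(v)
        on_stack.add(v)
        for w in graph.get(v, ()):
            if w not in index:
                visit(w)
                lowlink[v] = min(lowlink[v], lowlink[w])
            elif w in on_stack:
                lowlink[v] = min(lowlink[v], index[w])
        if lowlink[v] == index[v]:
            # v is the root of a component: pop the stack down to v.
            component: Set[str] = set()
            while True:
                w = stack.pop()
                on_stack.discard(w)
                component.add(w)
                if w == v:
                    break
            result.append(component)

    for v in graph:
        if v not in index:
            visit(v)
    return result


def rank_by_degree(graph: Graph, component: Set[str]) -> List[str]:
    """Order nodes by in-degree * out-degree within the component, highest first."""
    def score(v: str) -> int:
        out_deg = len(graph.get(v, set()) & component)
        in_deg = sum(1 for u in component if v in graph.get(u, ()))
        return in_deg * out_deg
    return sorted(component, key=lambda v: (-score(v), v))


# Made-up graph: a, b and c form a cycle, d only points into it.
graph = {"a": {"b"}, "b": {"a", "c"}, "c": {"a", "b"}, "d": {"a"}}
main_scc = max(strongly_connected_components(graph), key=len)
ranking = rank_by_degree(graph, main_scc)
print(sorted(main_scc), ranking)
```

Nodes at the front of such a ranking would be the first candidates to look at when searching for build dependencies whose removal breaks up the scc.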
As I cannot always depend on new dose3 versions being pushed to Debian Sid right after their release, I will do the dose3 git submodule integration over the weekend. This will allow me to evaluate the Gentoo USE flag results that I gathered over the past week.