A Brief History of NetKernel

On extracting release data over a nine-year period

I’ve just finished exposing a complete change history for NetKernel, covering the period since the release of NetKernel 4.0 nine years ago. Previously our only mechanism for publicly exposing this information was the weekly newsletter. However, that medium is unstructured, which made it hard to get an overview of changes or to check for specific releases of any particular package.

With the release of NetKernel 4 we introduced a repository approach to releasing updates, which we called Apposite. It is modelled on the Debian Linux Apt repository. The idea is that each distribution connects to its own specific repository, which contains packages that can be installed or updated. That way not everything needs to be installed for everybody, which keeps distribution sizes down. It also allows a seamless update cycle without needing to download and re-install the whole of NetKernel. Of course, periodically we roll up all the cumulative changes and ship them as a new distribution.

I plan further work to integrate this complete change history into the Apposite repository manager UI too. At the moment I see the lack of timestamps on packages, and of a full and easily accessible change history for each package, as a hindrance. As a user all you can really do is trust us and accept all changes. In order to aid this future work I’ve exposed a REST API as well as a formatted HTML page.

Change History Page

Extracting this data proved more difficult than I first anticipated. Our internal build system Bob, which of course is implemented on NetKernel(!), keeps a database of everything in Postgres. Peter had the skills to generate me a big join SQL query that was a good starting point. However, a certain amount of data munging was needed, as only module builds have a timestamp. So it was necessary to follow those modules through into the packages in which they were released and group them by release date.
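To make that grouping step concrete, here is a minimal sketch in Python. It is not the actual Bob query or process. The row layout, the rule that a package is dated by its newest module build, and the choice of one release per UTC calendar day are all assumptions made for illustration, and the third row is invented.

```python
from collections import defaultdict
from datetime import datetime, timezone

# Hypothetical rows, shaped as a join query might return them:
# (package name, package version, module identity, module build time in ms).
ROWS = [
    ("nkee-layer0", "1.6.1", "urn:com:ten60:core:ee:layer0", 1534519939037),
    ("nkse-dev-tools", "1.77.1", "urn:org:netkernel:ext:introspect", 1534519778288),
    ("example-package", "0.0.1", "urn:example:module", 1534000000000),  # invented
]


def group_into_releases(rows):
    """Group packages into releases keyed by UTC calendar day."""
    # Assumption: a package is dated by the newest module build it contains.
    package_time = {}
    for name, version, _module, built_ms in rows:
        key = (name, version)
        package_time[key] = max(package_time.get(key, 0), built_ms)

    # Assumption: packages dated on the same UTC day form one release.
    releases = defaultdict(list)
    for key, built_ms in package_time.items():
        day = datetime.fromtimestamp(built_ms / 1000, tz=timezone.utc).date()
        releases[day].append(key)
    return dict(releases)


RELEASES = group_into_releases(ROWS)
```

With these rows, the two real packages share a release day and the invented one lands in a release of its own.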

After getting a good list of all releases with their associated released packages and modules, I observed that a lot of packages didn’t have any comments associated with them. There are no good excuses for this, but if I were looking for one I’d say the UI on the backend of the release process is not the most refined. Anyway, not accepting excuses, I pushed on with a tricky data integration exercise and attempted to extract all the Subversion commit comments (yes, we still use Subversion internally [1]). After a lot more munging I finally had a process that generated the raw data.

Structure of the raw data

The XML exposed by the REST API is actually serialised HDS, hence the HDS namespace. The hds:type attributes allow values to be parsed back into primitive datatypes rather than being treated as plain strings, as they would be in ordinary XML.

The structure is a tree of releases at the root, with nested packages inside. Each package then contains a list of modules that were updated along with all the comments associated with that module.

Raw dates use the standard convention of milliseconds since 1 January 1970 (UTC), as returned by Java’s System.currentTimeMillis().

<?xml version="1.0" encoding="UTF-8"?>
<releases xmlns:hds="http://netkernel.org/hds">
	<release>
		<dateRaw hds:type="LONG">1534519939037</dateRaw>
		<date>Fri, Aug 17, 2018</date>
		<packages>
			<package>
				<id hds:type="INTEGER">3107</id>
				<name>nkee-layer0</name>
				<version>1.6.1</version>
				<dateRaw hds:type="LONG">1534519939037</dateRaw>
				<date>Fri, Aug 17, 2018</date>
				<modules>
					<module>
						<identity>urn:com:ten60:core:ee:layer0</identity>
						<version>1.7.1</version>
						<comment>Fixed memory leak in External Request Profiler</comment>
					</module>
				</modules>
			</package>
			<package>
				<id hds:type="INTEGER">3106</id>
				<name>nkse-dev-tools</name>
				<version>1.77.1</version>
				<dateRaw hds:type="LONG">1534519778288</dateRaw>
				<date>Fri, Aug 17, 2018</date>
				<modules>
					<module>
						<identity>urn:org:netkernel:ext:introspect</identity>
						<version>1.64.29</version>
						<comment>Added link to directly execute a save scripting playpen script</comment>
					</module>
				</modules>
			</package>
			<package>
				<id hds:type="INTEGER">3105</id>
				<name>nkse-control-panel</name>
				<version>1.50.1</version>
				<dateRaw hds:type="LONG">1534519583345</dateRaw>
				<date>Fri, Aug 17, 2018</date>
				<modules>
					<module>
						<identity>urn:org:netkernel:nkse:style</identity>
						<version>1.41.15</version>
						<comment>Fix to exception formatting to better distinguish implicit transrept resolution failures (rather than showing causing request)</comment>
					</module>
				</modules>
			</package>
		</packages>
	</release>
</releases>
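As an illustration of consuming this structure, the following is a minimal Python sketch that reads the XML, honours the hds:type attributes, and converts dateRaw from milliseconds into a UTC datetime. The converter table covers only the two types visible in the sample (LONG and INTEGER). The function names and the dictionary layout are my own choices for the example and are not part of any NetKernel API.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

HDS_NS = "http://netkernel.org/hds"
TYPE_ATTR = f"{{{HDS_NS}}}type"

# Assumed mapping from hds:type names to Python converters; only the
# types that appear in the sample are listed.
CONVERTERS = {"LONG": int, "INTEGER": int}

# A trimmed copy of the sample, with a single package.
SAMPLE = """
<releases xmlns:hds="http://netkernel.org/hds">
  <release>
    <dateRaw hds:type="LONG">1534519939037</dateRaw>
    <date>Fri, Aug 17, 2018</date>
    <packages>
      <package>
        <id hds:type="INTEGER">3107</id>
        <name>nkee-layer0</name>
        <version>1.6.1</version>
        <modules>
          <module>
            <identity>urn:com:ten60:core:ee:layer0</identity>
            <version>1.7.1</version>
            <comment>Fixed memory leak in External Request Profiler</comment>
          </module>
        </modules>
      </package>
    </packages>
  </release>
</releases>
"""


def typed_value(element):
    """Convert element text according to its hds:type, defaulting to str."""
    convert = CONVERTERS.get(element.get(TYPE_ATTR), str)
    return convert(element.text)


def parse_releases(xml_text):
    """Turn the releases tree into a list of plain dictionaries."""
    releases = []
    for release in ET.fromstring(xml_text).findall("release"):
        millis = typed_value(release.find("dateRaw"))
        packages = []
        for package in release.findall("packages/package"):
            modules = [
                {
                    "identity": module.findtext("identity"),
                    "version": module.findtext("version"),
                    "comment": module.findtext("comment"),
                }
                for module in package.findall("modules/module")
            ]
            packages.append({
                "id": typed_value(package.find("id")),
                "name": package.findtext("name"),
                "version": package.findtext("version"),
                "modules": modules,
            })
        releases.append({
            # Milliseconds since the epoch, converted to an aware UTC datetime.
            "date": datetime.fromtimestamp(millis / 1000, tz=timezone.utc),
            "packages": packages,
        })
    return releases


PARSED = parse_releases(SAMPLE)
```

Because the id and dateRaw elements carry a type, they come back as integers, while untyped elements such as name and version remain strings.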

  1. Subversion actually fits the structure of NetKernel much better than modern alternatives such as Git. The reason is simple: SVN allows you to branch sub-trees of your repository, and when working with large numbers of individually versionable modules, it makes more sense for them to stay in one repository, rather than one repository per module.