Tuesday 6 November 2012

minutes from OctConf2012: pkg - package system and structure

There's an Octave code sprint planned for the weekend of 17-18 November with the purpose of improving the functionality of pkg(). Some of the improvements were discussed at OctConf2012, but more have since been discussed on the mailing list. At the time, I started writing a long report (in the style of meeting minutes) about pkg() and Agora, the two things with the biggest impact on Octave Forge, but only finished the part about pkg(). The comments I received were that the text was too long and detailed, so I ended up writing a shorter text that covered both items.

But with the code sprint coming up soon, we do need a proper document stating what pkg() should be doing. When we looked at it during OctConf, things were intertwined in such a way that any change required fixes everywhere else. My guess is that if a bunch of people start coding away on it at the same time, even if on different problems, we will keep stepping on each other's toes. And if we do create one branch for each sprinter, merging them back together might not be so easy. The ideal would be to have something like Python's PEPs.

In the meantime, I'll post here the minutes of the pkg() discussion during OctConf2012.



These were first discussed between Carnë Draug (CD), Juan Carbajal (JC) and Carlo de Falco (CF) before being presented to the community present at OctConf2012 on the morning of July 19 for further discussion. During the rest of the event CD, JC and CF continued discussing the plans, and their conclusions are reported here.

It was the opinion that the current problems with the pkg system are caused by the code complexity of pkg(), itself a result of its slow development path: new features were added as they were needed, one at a time, on top of the previous ones. Also, the nature of the problem, mostly string processing and handling of directory contents, does not lend itself to clean code with Octave functions. As such, a list of problems with the current system and of desired new features was made, to have a clear design of what the system should support.

CD proposed rewriting pkg() in Perl. Despite the language's fame for being hard to read, it would allow for shorter and faster code. It would be much easier to maintain by someone familiar with Perl than the current code is for Octave programmers. Even for someone unfamiliar with Perl, it should take less effort to fix bugs. Plus, Perl is already an Octave dependency, so Octave users would already have it installed on their systems. CF pointed out that it is only a build dependency and therefore not necessarily present on the user's system. While it is true that pretty much all Linux distributions require Perl, the same does not hold for Windows. pkg() is currently faulty on Windows, so this would not make things worse there, but the hope is to make it work on Windows too. The idea of using Perl was therefore rejected.

CD, CF and JC were of the opinion that the autoload option was not good and that pkg() should not support it. Packages can shadow core Octave functions, and even functions from other packages. In the latter case, no warning is given. Code meant to run with specific packages, or even with no packages at all, may then catch others by surprise. Also, some users are not aware that some of the functions they use come from packages. Forcing them to load packages as needed will make them aware of what they are doing. It was argued that no other programming language has packages, modules or libraries loaded by default (with the exception of special cases such as some Python implementations). JC gave the example of a practical class where the teacher gives commands to students sitting in front of pre-installed Octave systems. The first command they run should be pkg load, and the professor should not have installed the package with autoloading by default. Any user would still be free to configure their .octaverc file to load packages at startup. Configuring the startup of Octave sessions is the purpose of .octaverc, not of a package system.

CD pointed out that loading of packages is also not completely safe at the moment. When loading a package, its dependencies are loaded at the same time. However, these dependencies can later be unloaded, leaving the packages that depend on them loaded, and no warning is issued. The options discussed were: unload all the dependent packages at the same time, refuse to unload the package, or keep the current behaviour. The verbosity level when attempting to unload such a package was also discussed, but no conclusion was reached.
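
As an illustration of the "refuse to unload" option, here is a minimal sketch in Python (not Octave, and not code from pkg()). The function names and the dependency table are made up for the example, and only direct dependencies are checked.

```python
def blocking_dependents(target, loaded, depends):
    """Return the loaded packages that directly depend on `target`."""
    return sorted(
        name for name in loaded
        if name != target and target in depends.get(name, ())
    )


def unload(target, loaded, depends):
    """Unload `target` unless a loaded package still needs it."""
    blockers = blocking_dependents(target, loaded, depends)
    if blockers:
        raise RuntimeError(
            f"cannot unload {target}: still needed by {', '.join(blockers)}"
        )
    loaded.discard(target)


# Hypothetical dependency table: package -> packages it depends on.
depends = {"signal": {"control"}, "control": set(), "image": {"signal"}}
loaded = {"signal", "control"}

try:
    unload("control", loaded, depends)
except RuntimeError as err:
    print(err)  # control stays loaded because signal needs it

unload("signal", loaded, depends)   # nothing loaded depends on signal
unload("control", loaded, depends)  # now safe to unload
```

The "unload everything that depends on it" option would replace the error with a loop over the blockers, and the verbosity question is simply whether that loop prints anything.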

A frequently requested option is to automatically identify, download and install all package dependencies. CD, CF and JC all agreed that this should be implemented. It shouldn't be too much work, since the dependencies are already being identified and packages can already be downloaded with the -forge flag. Most of the code is already in place, so it should only require a few extra lines. This is obviously only possible for dependencies on other packages. A check on the availability of external libraries and software can be performed by the Makefiles, but pkg() can't resolve those.
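
The core of automatic dependency solving is to compute an install order in which every package comes after the packages it depends on. Below is a small sketch of that idea in Python; it is an assumption about how it could be done, not the pkg() implementation, and the dependency table is hypothetical.

```python
def install_order(package, depends, installed=()):
    """Return the packages to install, dependencies first (depth-first)."""
    order = []
    done = set(installed)   # already present, nothing to do
    visiting = set()        # packages on the current path, to catch cycles

    def visit(name):
        if name in done:
            return
        if name in visiting:
            raise ValueError(f"circular dependency involving {name}")
        visiting.add(name)
        for dep in sorted(depends.get(name, ())):
            visit(dep)
        visiting.discard(name)
        done.add(name)
        order.append(name)

    visit(package)
    return order


# Hypothetical dependency data, as it might be read from DESCRIPTION files.
depends = {
    "image": {"signal"},
    "signal": {"control", "general"},
    "control": set(),
    "general": set(),
}

print(install_order("image", depends))
# ['control', 'general', 'signal', 'image']
print(install_order("image", depends, installed={"control"}))
# ['general', 'signal', 'image']
```

Each name in the returned list would then be downloaded and installed in turn, the same way a single package is today.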

CF suggested adding two more options to the install action: one to install a package given a URL, and another to install the development version of a package. As with the option to automatically solve dependencies, and for the same reasons, the URL option should be easy to implement. CD argued that the dev option should not be implemented because it would stop packages from being released, as users would get used to it and start installing possibly broken packages. CF said it would still be very useful for package maintainers preparing a new release. JC suggested using releasePKG() from the admin section, which already does this. It requires a local clone of the repository, which should already be available to a developer preparing and testing a new package release. It was agreed that the url option, but not the dev option, would be added to pkg().

CF and JC were of the opinion that the package system should not support both local and global installs and that all installations performed by pkg() should be local. CF reported that on Mac systems global installs were made local even when running Octave with sudo. CD mentioned that on Windows systems the opposite happens and all installs are global (this is checked with ispc() in the code). Having the two types of installation is exclusive to Linux systems. CF and JC said that global installs should be left to the distribution maintainers and pkg() should deal with local installs only. CD argued that this would mean that system administrators, even the ones compiling the latest Octave version, would depend on their distro maintainers to obtain the latest version of packages. CF and JC replied that supporting both types complicates the system and that packages are more user specific. It was then agreed that the option would be removed. However, after discussing this with Jordi Hermoso, it was discovered that at least the Debian maintainers actually use pkg() to prepare their packages. It was therefore decided that pkg() would keep dealing with both installation types.

CD, CF and JC were all of the opinion that the -local and -global install flags were nonetheless useless and should be removed: the type of installation is already decided based on whether the user is root, so the flags are only useful to force the other type. CD proposed changing the default to a global installation whenever there are write permissions, rather than when the user is root, so as to permit an Octave system user to make global installs. This also allows a local installation of Octave (a regular user compiling and installing Octave in their home directory, for example) to make a global package install: global relative to that Octave installation, with the packages in the Octave tree rather than in a separate directory. This should allow for a cleaner organization. These two changes were made and committed before the end of OctConf2012.
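
The proposed default can be summed up as "global if I can write there, local otherwise". Here is a sketch of that decision in Python; the function name and the paths are made up for illustration, and this is not the code that was committed.

```python
import os
import tempfile


def default_install_type(global_prefix):
    """Pick "global" when the global package directory is writable."""
    # os.access checks the permissions of the current user, so a regular
    # user who compiled Octave in their home directory also gets "global".
    if os.path.isdir(global_prefix) and os.access(global_prefix, os.W_OK):
        return "global"
    return "local"


# Stand-in for the package directory inside an Octave installation tree.
with tempfile.TemporaryDirectory() as prefix:
    print(default_install_type(prefix))              # writable directory
print(default_install_type("/no/such/octave/tree"))  # falls back to local
```

Testing for write permission rather than for root is what lets both a dedicated Octave system user and a home-directory build behave sensibly without any flag.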

The current list of installed packages, local and global, is a simple struct saved in the -text format. CD was of the opinion this should be made a binary format to discourage users from manually editing the file and accidentally breaking the system. CF argued the opposite, that such editing may be necessary. It was decided to simply leave a warning on the file header.

CD noticed that it is not possible to use packages on a system that has more than one Octave version installed. While .m functions will work normally, .oct files compiled at installation time are version specific and will therefore fail. These are placed in API specific directories to avoid loading them incorrectly, but reinstalling the package removes them, forcing a reinstallation of the package every time a different Octave version is to be used. CF also pointed out that a system to perform reinstalls should be made, and the package sources kept, so that packages can be reinstalled with new Octave versions. CD noted that this would also allow the use of %!test blocks from C++ functions after install. Similarly, it was noted that it is currently not possible to have more than one version of the same package installed.

List of conclusions:
  • dependencies on other packages should be automatically solved
  • pkg() will not load packages automatically
  • an option to install packages given a URL will be added
  • the source of installed packages will be kept on disk for later reinstallations
  • it will be possible to have multiple package lists that can be merged or replaced
  • support for different package versions and different Octave versions will be added
  • pkg() will stay written in the Octave language
  • the -local and -global options will be removed
  • a header will be added to the octave list files warning that they should not be manually edited

Saturday 27 October 2012

less holidays to code for Octave

Octave Forge has many unmaintained packages. Way more than the official list shows. Actually, most unmaintained packages are listed as maintained, since no one has even bothered to update their status.

Still, unmaintained packages receive the occasional bug report, sometimes with a patch. The latest was for holidays() from the financial package. These are the best. It means that not only someone is using the code, but also that they care enough to try and fix it.

On the opposite side of the spectrum there are things such as this commit (I'm actually a bit ashamed of it). It introduced a huge bug that almost anyone using xcorr2() should have noticed immediately. But no one did. I mean, it was not returning an incorrect result or anything difficult to notice, it was giving a very noticeable "error: invalid type of scale none". It made xcorr2() almost useless. But it was released with signal-1.1.2 (2012-01-18) and fixed only 8 months later without anyone ever complaining.

Anyway, back to the bug in holidays(). I applied the second patch from the reporter. Even though I don't care about this function at all and the patch did fix this problem, I spent some time looking at the code. Not that it was complicated, quite the opposite, but I had never dealt with dates in Octave before. And I learned something new.

This function kind of returns the dates of the holidays that close the NYSE (New York Stock Exchange). What I did not know was that when a holiday falls on a Saturday or Sunday, it is shifted to Friday or Monday respectively. And so I definitely did not know the exception to this rule: when the shift would move the holiday to another year, there's no holiday at all (the only case of this is when New Year's Day falls on a Saturday). And that was exactly the origin of the problem.
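
For the curious, the rule as described above fits in a few lines. This is a sketch in Python of that simplified rule only, not the code of holidays() from the financial package, and the function name is made up.

```python
from datetime import date, timedelta


def observed_holiday(holiday):
    """Return the date a holiday is observed, or None if it is dropped."""
    weekday = holiday.weekday()  # Monday is 0, Sunday is 6
    if weekday == 5:             # Saturday: observe on Friday
        shifted = holiday - timedelta(days=1)
    elif weekday == 6:           # Sunday: observe on Monday
        shifted = holiday + timedelta(days=1)
    else:
        shifted = holiday
    # A shift into another year cancels the holiday altogether.
    return shifted if shifted.year == holiday.year else None


print(observed_holiday(date(2015, 7, 4)))    # Saturday -> 2015-07-03
print(observed_holiday(date(2011, 12, 25)))  # Sunday   -> 2011-12-26
print(observed_holiday(date(2011, 1, 1)))    # Saturday -> None
```

The last call is the troublesome case: New Year's Day 2011 was a Saturday, and the Friday before belongs to 2010, so there is no holiday.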

A quite esoteric issue for someone like me. It is fixed now. Matlab compatibility, documentation and performance were improved, new tests were added, some of my hours were lost, and new useless knowledge was gained. Unfortunately, according to the fixed holidays() people working at the NYSE now have less holidays to code for Octave. I'm sorry. And I should probably be writing my thesis instead.

Tuesday 21 August 2012

Octave and underage drinking

Octave is just like good wine. It's in constant active development and has a good ageing potential. Using Octave today is a better experience than using it 2 years ago, and from what I have seen, it will be an even better experience later.

However, unlike wine, you can be of any age to drink it. It's interesting to notice how, with the help of Google Summer of Code and ESA Summer of Code in Space, Octave got new students involved in it, some of them under the legal drinking age. If you are following planet Octave you have probably read some of their posts already. Does this mean that Google and ESA support underage drinking? Of course this varies between countries, and the students in question are all legal drinkers in their own countries. But the nature of Octave is different: it can be tried on the internet, where it's hard to decide what rules apply. And so, Octave also brings up curious sociological questions, just as the new world brings down the old concepts of geographical limitations.

If you hang around #octave on irc.freenode.net, you will discover that it can be quite a social activity too. And OctConf2012, a physical congregation of Octave connoisseurs, was just a month ago in Montreal, Canada. Others have shown up too, people new to the flavour and learning how to get the best taste out of it.

But it is not an exclusively social activity. Octave respects your comfort zone. Just like a bottle of wine or whisky, you are free to enjoy it on your own, lying in your best chair with Pink Floyd in the background.

Finally, Octave is also completely odourless and your breath won't give away what you have been up to. You'll get no angry or disappointed words or looks from your parents, partner, children or pets.

It doesn't matter how young you are. Stop giving grey hairs to your family, join the Octave developers in the ageing process. Come drink with us and enjoy Octave responsibly.

Thursday 2 August 2012

OctConf2012 - Agora and pkg()

Two subjects were discussed at OctConf 2012 with direct impact on the Octave-Forge project: Octave's package system, and Agora. While very little code was written for either of the two, there was plenty of design and discussion as different ideas and visions clashed.

Most of the Agora design, current package system and possible package structures were presented on the morning of July 19. However, the intended changes to pkg() were not, and neither was how they would affect Agora. These were discussed after, mostly between Carnë Draug, Carlo de Falco and Juan Carbajal, and until the very last minute of OctConf.

Agora

Agora is a project meant for rapid collaboration on Octave code, which has been under very slow development for the last 2 years. Its name refers to the ancient Greek Agora, a cross between a place for social gathering and a marketplace. Would that we had more web developers in our community. It is currently available at agora.octave.org but is still only a pastebin with syntax highlighting for Octave.

The presented design would split Agora into 3 different sections: single, bundle and Forge, the latter absorbing Octave-Forge, which would cease to exist. Single and bundle (these are development titles) are very similar in nature, a cross between MATLAB's File Exchange and arXiv. The Forge section, not unlike what is currently Octave Forge, would hopefully become smaller and easier to maintain as some of its code moves into the 2 other sections.

Single and Bundle

These can be used by anyone to make their code available to others, their only difference being the upload method. While single is meant for single function files, and will present the user with a text box to paste the code, bundle will accept the upload of an archive file.

Each of them will have its own page with a download count, where users can rate it, leave comments, and contact the author. They can also be organized with multiple tags (e.g., statistics, bioinformatics). To avoid spam and copyright infringement, there will be a flag button to bring an entry to the attention of moderators. Other than that, there should be no moderator interaction needed.

They will be associated with a specific user, the uploader, who is able to release new versions. Versioning will be automatic and the simplest possible: a single number incrementing with each new upload. Old versions will be made available for download in the same style as arXiv (see the submission history on an entry for an example).

Bundles can either be a simple collection of files or a properly structured Octave package. If a package is meant to be uploaded, a simple structure check can be optionally requested by the uploader. This would be made by a script and there will be no guarantee that it actually installs, only that it looks correct. There will be no moderator interaction.
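
To give an idea of how little such a check would do, here is a sketch in Python. It is not Agora code: it assumes the package is a tarball with a single top-level directory, that DESCRIPTION and COPYING are the files every package must have, and all names are made up for the example.

```python
import io
import tarfile

REQUIRED = ("DESCRIPTION", "COPYING")  # assumed mandatory files


def missing_files(tarball):
    """Return required files absent from the package's top-level directory."""
    with tarfile.open(fileobj=tarball, mode="r:*") as tar:
        names = tar.getnames()
    # Path of each member relative to the single top-level directory.
    relative = {name.split("/", 1)[1] for name in names if "/" in name}
    return [req for req in REQUIRED if req not in relative]


def make_tarball(paths):
    """Build an in-memory .tar.gz of empty files, for demonstration only."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as tar:
        for path in paths:
            tar.addfile(tarfile.TarInfo(path), io.BytesIO(b""))
    buffer.seek(0)
    return buffer


good = make_tarball(["mypkg/DESCRIPTION", "mypkg/COPYING", "mypkg/inst/foo.m"])
bad = make_tarball(["mypkg/DESCRIPTION", "mypkg/inst/foo.m"])
print(missing_files(good))  # nothing missing
print(missing_files(bad))   # COPYING is absent
```

A check like this only looks at file names. It says nothing about whether the package installs or works, which is exactly the limitation stated above.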

Users are encouraged to submit problems, bugs and comments on the single and bundle sections to the uploader, not to the Forge or Octave help mailing lists.

Forge

This section would be what is currently Octave-Forge. The hope is that by dropping the Octave name there will be less confusion between the Octave and Octave-Forge projects. This section will aggregate packages that are actively maintained and developed by the community.

There will be a single bug tracker, each package being a bug category, a single mailing list, but a mercurial repository for each package. The Forge repository will be another mercurial repository where each package is a subrepository.

Packages in this section will comply with the following:
  • have at least one package maintainer;
  • install and work with the latest Octave release;
  • released under a GPL compatible license;
  • not dependent on non-GPL compatible libraries or applications;
  • all functions (except private) must be documented in Texinfo;
  • if a doc section exists it must be written in Texinfo;
  • a NEWS file must exist listing changes for each release.
It is also recommended that they comply with:
  • no shadowing of Octave core functions;
  • no direct inclusion of external dependencies.
Once this system is in place, new code submissions will be directed to the single and bundle sections. As these are rated and improved over time, a Forge maintainer who wishes to include one of them in their own package can do so.

Snippets

The current function of Agora as a pastebin will also be kept, as it's actually pretty useful.

pkg()

Some problems with the current pkg system were discussed, as well as desired new features. Other features were judged more harmful than useful and will be removed. The planned changes are:
  • removal of the autoload option. No package will be able to automatically load itself and its value in the DESCRIPTION file will be ignored. This prevents users from inadvertently shadowing functions (even from other packages) and will increase awareness of the role of packages.
  • implementation of a new flag, -url, to specify URLs for a package tarball.
  • automatic download and install of dependencies if those are part of Forge.
  • keep the source of installed packages. This will allow to reinstall a package when Octave is updated as well as run the tests on C++ code.
  • implementing an option to run the integrated function tests when installing a new package.
  • a new organization for the installed packages on the system. This will include the removal of the -global and -local flags (the choice will be handled automatically) and is meant to allow:
    • different versions of the same package
    • different versions of Octave using the same packages
    • global package installs in relation to the Octave installation, not to the system.
  • automatic build of a package's documentation in HTML, PDF and info formats from its Texinfo sources, similarly to what happens when building Octave.
Implementation of all of these will involve a major overhaul of the whole pkg() code, as many of these options are connected to each other. It is not possible to implement all of them independently, and each change is likely to break pkg(). As such, it was decided that their development would happen in a remote repository and be merged into default once ready.