So what does open source mean, really? In this post I would like to skip the legal side and look at the community side. What defines a successful open source project, and what makes it successful? Why have projects such as Linux, Firefox, Apache, MySQL, Postgres and Plone changed the world so much that 20th century IT giants such as Microsoft now have to change strategy and claim to be committed to openness?
There are a number of great reads that go into these questions in depth. The classic is of course Eric Raymond's essay The Cathedral and the Bazaar. I also like the article The Transformation of Open Source Software by Brian Fitzgerald et al., published in MIS Quarterly in 2006, and this study on Plone. Fitzgerald distinguishes between FOSS ('Free and Open Source Software') and what he calls OSS 2.0, 'a more mainstream and commercially viable form' of open source software. He argues that, in contrast to grassroots FOSS efforts, OSS 2.0 uses open source as a deliberate product placement strategy by commercial vendors. If you have a good open source product and you can activate a community of developers around it, chances are you get more back from the community than you put in as a company in the first place. Best of all, you can make money by providing commercial services around it, such as Service Level Agreements. The community gains too, because you 'commit' improvements back into the community project. By now this is a proven business model, as I pointed out in my last blog post.
So what can we learn from all this for bioinformatics? First of all, if you want to create something lasting for the community, for heaven's sake, make it open from the start! I really don't get projects that, sometimes even with public funding, fence off their development 'because we might make money with it later' or 'because we have to finish it first'. That practically guarantees failure from the outset. First, because you work in your own little corner, the world simply doesn't know about your project and will just ignore you when you suddenly claim to have the perfect solution. Building a community and making a name takes time, and if you don't start on that from day one, forget it. Second, you miss out on interaction with the community during the most valuable time: initial development. That is exactly when you want to find out whether part of what you need to build already exists, whether there are best practices for something you don't have experience with yet, or whether someone has the same needs and is very motivated to join your efforts! Finally, as for 'making money': business is not that easy. If you want to make money on your code, you had better have a detailed agenda, business plan and earnings model in place upfront, or you are bound to overlook some very obvious things, such as forgetting to patent some crucial IP, or discovering that there is actually no market at all for your great service.
I am not saying that 'valorization' of software or services developed in public-private collaborations is a bad idea. I think it is great. If you get public money to do science and business development, you should think about how to give something back to the taxpayer and the companies involved. That can mean starting up and supporting a spin-off company that develops a great service. Improving public health or generating wealth: these are the kinds of things taxpayers would want out of such projects. But you can also help science and biobusiness in a more indirect way, by creating a great open source community that allows scientists to concentrate on science instead of IT, and allows entrepreneurs (the people who see opportunities and are willing to take risks) to build services on top of it.
So let's consider for a moment how we are doing with the projects that The Hyve is involved in, such as the Phenotype Database. Can outside developers find and re-use the open source software that we built? I am afraid we have much to improve. The source code, documentation and manuals are scattered across different websites (NBIC Trac, DbNP, NuGO, etc.), which is horrible. There is also no obvious way to join and interact with the community. And on the subject of APIs: yes, we have some, but they are not fully RESTful and their functionality is still meager. At least we 'eat our own dogfood' (use our own APIs), in the sense that the Phenotype Database modules communicate with each other exclusively through the documented HTTP JSON messages.
The good news, however, is that at a recent meeting in Wageningen, with scientific directors from NBIC, NMC, NuGO and NNC, representatives from NTC and a number of other parties, we decided to start a Phenotype Foundation. One of the first things the Phenotype Foundation will do is create a clear, one-stop place on the web where we host, store and communicate about the open source projects involved. The Phenotype Foundation will resemble other legal entities for open source software, such as the Apache Software Foundation and the Plone Foundation. Interestingly, though, it will also include a user community of biologists, because unlike products such as Apache and Plone, our products are used not by developers but by biologists. It is always very important to involve the community of end users in development. This is an experiment, and a major difference from projects like Apache, which are 'from developers for developers'. But I'm confident we will have impact, and we will certainly learn a lot in the process! If you would like to get involved or stay informed, please drop me a line (kees at thehyve.nl). Unfortunately, until the website goes live in early 2012, e-mail or phone will be the primary way to reach the community.