Before we begin, Cameron, I’d like to give readers a bit of background as to how I heard of you and why I think it is so important for people interested in topics such as Medicine 2.0, Health 2.0, Science 2.0 and important societal trends and Web issues in general to know who you are and what Open Science and Open Notebook Science are.
I work in the healthcare industry, and one of my tasks is keeping up with developments in how clinical research is funded and conducted and how its results are disseminated. I have worked in a medical library, for instance, and medical librarians know about the Open Access movement. But they may not know about all of the vitality and tool-building that is happening in the world of Open Science, activity that leads to the publication of what becomes the final product as far as medical librarians and their patrons (physicians, nurses, pharmacists) are concerned: a paragraph or two in a piece of medical literature.
I envision as the readers of this interview a wide range of people including medical librarians, medical students, undergraduates and graduate students in the sciences and engineering, technologists, healthcare industry analysts and anyone interested in what appears to be a revolution in scientific communication, a revolution that we can all welcome given that it promises to streamline many aspects of the scientific process and to facilitate communication between scientists to the ultimate benefit of everyone who will ever become ill or love someone who is. As you can see, you are a very important person in my book!
Here is how I came to hear of you. I began to write about search engines and then about Science 2.0 matters in 2008. I came to visit the Life Scientists room of FriendFeed.
Being new to the subject, I noticed that certain names kept being bandied about as the go-to experts on Open Science (e.g., Jean-Claude Bradley, Michael Nielsen, Bora Zivkovic, Bill Hooker). These people were cogent thinkers and superb writers who appeared to be made of nothing but stamina and brilliance, so ubiquitous was their presence and so obvious their influence among the scientists in that community. And I kept seeing references to a certain Cameron, who seemed to be universally regarded as the man who would astutely assess, to the benefit of all, any significant development in the world of online science. I wondered who this Cameron was. I know much more now, having discovered your blog, Science in the Open, and your postings in FriendFeed, and having viewed many of your slideshows.
Could you please tell us a bit of your background? What kind of scientist are you, for instance?
I started off in what was at the time fairly conventional metabolic biochemistry, doing an undergraduate project looking at which food molecules platelets selected from plasma when given the choice. Then I moved more towards biophysics and biotechnology during my PhD, looking at ways to manipulate DNA to make what were then large libraries of variants of the gene for a specific protein, trying to figure out how to make protein copies of all of those genes, and then selecting the one or two out of the billions that did what we wanted. The theme since then has really been developing new ways of applying physical techniques from physics and chemistry to look at protein structure and function.
My current job at the Science and Technology Facilities Council in the UK is an interesting mix of developing new techniques, using these to tackle specific structural problems, and working with the scientists who come in to use our facilities to help them solve problems. I enjoy working with other people, and this job gives me a good opportunity to do that and to have that work valued, something that is often missing in university settings.
Let’s talk terminology for a moment. I noticed that Open Science people tend not to use the term Science 2.0. Do you think Science 2.0 is a valid term with staying power, or do you prefer the terms Open Science and/or online science?
Both “Open Science” (I actually prefer “Open Research” as it is more inclusive) and “Science 2.0” are good rallying points and give a broad impression of an idea, or even a sense of a movement. I guess as a scientist, though, I find them lacking precision, and I’m conscious of the ability of imprecision to lead to problems. Science 2.0, like Web 2.0, is a fairly vague term with more risks than most. Do we mean that it is version 2.0, in which case we would be safer talking about currently being at 1.5? Is it purely about Web 2.0 tools, and is it therefore set up in opposition to Semantic Web technologies (sometimes called Web 3.0), which I think are crucial to taking the agenda forward?
I guess, overall, I am comfortable with using Open Science/Research to refer to a movement of people who are essentially heading in the same direction. Science (or Research) 2.0 doesn’t for me capture a clear enough image to be helpful. I prefer “web native” or “using what we’ve learnt from the web for science.”
Please tell us about Open Notebook Science. Is it possible that there could arise related movements such as Open Engineering in engineering education? Are there any such programs?
Open Notebook Science is two things: a process and a commitment. The commitment is that you make your best effort to make the full record of your research available as you record it, i.e. as close as possible to when it happens. In a sense this is an ideal rather than something that is practically achievable. There are always variables that you don’t record, indeed don’t think to record: the “unknown unknowns” of research. But the point is that you do the best that is feasible with the resources you have, at a minimum making sure that the record you use and make is the one that is available to the rest of the world. The process is how you go about making this happen. It involves some use of web-based tools to make your record and put it online, but a lot of it is just about raising the standards of your record keeping.
There is a growing movement across education to provide more of the underlying materials (lecture notes, videos, etc.), best demonstrated by the MIT OpenCourseWare initiative. These tend to be about making available materials that already exist. In the Maker and DIY communities there is a lot of interest in sharing designs and experiences of building objects and tools, along with much of the sense of playfulness that also characterizes the record of science. For me this makes a closer analogy.
Is there much interaction between the uber geeks of Open Source and the more basic science lab guys of Open Notebook Science?
Relatively little. It is also important to make a distinction between the Open Source and the Free Software communities, which have different aims and philosophical attitudes. Inasmuch as Open Source software makes it easier to be an open researcher through standards and code sharing, there is a logical connection. There are also philosophical parallels: the idea that the most effective way of working is to allow others to be involved. There are logistical connections as well, because the challenges involved in getting Open Source projects to work in practice are similar to the issues you start to face when people, skilled or unskilled, want to contribute to (or in some cases wreck) a scientific project.
There are connections, and people like Egon Willighagen are strong proponents of both approaches. John Wilbanks has recently written some nice pieces on where the analogy between Open Code and Open Research breaks down. In many ways we have more to learn from the people who look closely at developing best practice in code development: people who work hard on understanding how to document code, how to most effectively get it written and reviewed, and how to educate people to do the writing, recording, and documentation well. Greg Wilson is a standout contributor in this area.
What led you to decide to take up a leading role as an advocate for Open Science? Was there some epiphanic moment? Or did you just come over a period of years to realize that there had to be a better way to do science given the rise of Web technologies and ever cheaper forms of computing?
Hah. Funny story, which I’ve written about. Basically it was all down to an irritating corporate firewall that led me, in a fit of pique, to say we were just going to make our online notebooks completely available. I guess I was primed to think that way by the online reading I’d started to do, but really my own ideas only developed after I suddenly thought “actually that’s a bit radical, I wonder if anyone else has thought of this…”. Of course they had, and that led me into the writing of and Peter Murray-Rust amongst others that developed my own thinking.
Could you tell us how you go about doing the following, “I largely focus my research work on methodology development and enabling others. I can potentially have a bigger impact by building systems and capabilities that help others do their research than I can by doing that research myself…”
I figure when it comes to the science there are people who can do it better than I can. Where I seem to be able to make the biggest difference is in helping others to do their science. So at STFC we are working towards developing new approaches to look at the structure of “difficult” proteins, approaches that we hope will be useful to others. On a more prosaic level we just build up our own expertise so that when users come in we can ensure that their experiment works properly and that they can get their data analysed and published (that’s the aim, anyway).
I’m pretty sure it was this kind of thinking that primed me to see how making small changes to the way a lot of researchers work, and/or big changes to the way a small number work, could have a much bigger impact. I see the work on ONS in a number of ways. One is as publicity: a small number of people may see what we do and think “actually I want to do that as well.” Another is as a push on policy: by showing that this can be done and taking an extreme position, we shift the centre of gravity of the community more towards openness. I’m not sure that funders would be moving as much (in a small way) towards more open approaches if the radicals of the OA and OR movement hadn’t been out there on the edges.
Finally, by doing this we show examples of little things that people can do. Maybe they don’t want to put everything online in a public way straight away, but giving a working example of an online notebook means that people see it can be done, see the advantages and disadvantages, and might choose to do that for themselves. Similarly, talking about collaborative literature filtering and online services like CiteULike, Mendeley, and Zotero gives people a push towards using those services, which makes them better for everyone. This makes it important that all of these pieces of process, of practice, are useful in their own right and that they don’t have to be combined together to work.
You have been working a lot on Google Wave, for instance. Could you discuss that? How does it play into your statement here, for example, “…we need to build tools that make it easy to take those unstructured or semi-structured records and mold them into a specific structured narrative as part of a reporting process that the researcher has to do anyway. Writing a report, writing a paper. These things need to be done anyway and if we could build tools so that the easiest way to write the report or paper is to bring elements of the original record together and push those onto the web in agreed formats through easy to use filters and aggregators then we will have taken an enormous leap forward.”
Wave is suffering through a very accelerated hype cycle at the moment. What got me excited about it in the first place was really two specific things: first, the ability to automate the collection and capture of information in an environment that can feel to the user like simple text entry; and second, the use of history as a way of dealing with the problem of provenance (who did what and when). We haven’t seen much technical development of the latter yet, but there is a lot of promise there for the future. It is the capture of information that I’ve mainly been working on.
I am a great believer in Tim Berners-Lee’s vision of a linked open data web. The fundamental issue is that we have a chicken-and-egg problem. People aren’t developing great tools to use linked open data (LOD) because there isn’t very much of it out there. And people aren’t putting it out there because there aren’t great tools. My hope with Wave is that we can start down the road of structuring data by having the user, the author, collaborate directly with computational systems that help them structure their data and description as they are collecting the data, as they are writing the report, and as they are writing the paper.
The idea is that you encourage them to put in a little effort, tagging or marking up their record, because they get a big return for it: ease of marking up next time, and a complete searchable index of their work for free. This doesn’t have to be built in Wave; you can imagine doing something in Word, Excel, OpenOffice, Google Docs, wikis, whatever. The difference for me is that Wave for the first time made it feasible for me to directly start building things that could be useful for other users. And that was what really excited me.
One of the things that struck me as I viewed Jean-Claude Bradley’s fascinating slideshow, “Leveraging Transparency and Crowdsourcing in Chemistry Using Open Notebook Science” is that the process delineated in it seemed to both facilitate the process of scientific discovery and perhaps at times to impede it given the many tools (wikis, Google Spreadsheets, YouTube, Second Life, etc.) that scientists now have to keep track of. Could you discuss some of your favorite tools and what tools are under development that could leverage the particular advantages of each? Is there some mega-platform under development that would address certain problems and what are those problems? Is Google Wave the answer? Is an answer really possible given that some tools are Open Source and some commercial?
It’s a real problem in two ways. First, with lots of different packages you inevitably hit cases where you can’t get information out of one and easily into another. This is why data portability and standards for import and export are so important. It’s a bit dull, but in some ways data portability is more important than getting the data out. But we need a critical mass of data to get the tools built, and around we go in circles. I personally have no problem with a mixture of commercial and Open Source tools; I use whatever is most helpful to me at the time. But data portability is crucial. What Open Source gives me is some sort of guarantee that I can get my data back out again.
The second big problem, as you say, is just keeping track of all of these services. Services like iGoogle, FriendFeed, and indeed Facebook in some ways try to create a sense of a dashboard that brings all of that together for people, but in my view there aren’t good examples that really work in science. Web service designers are still largely too obsessed with keeping people on their own site. This is a real practical problem that I face in my own group. I’ve tried to use multiple services (our blog, Google Reader, Twitter, OpenWetWare and DabbleDB), but people seem unable to keep track of more than one service unless they are really personally invested in the process. And that is rare.
My hope is that Wave or something like it (some sort of aggregated inbox that can talk to all of these systems, or rather many such inboxes that work in different ways to serve different people’s needs) will arrive and help us solve the problem. Sometimes it feels like we are at the stage of rubbing sticks together and hoping something will happen, though.
You met a few months ago with Elsevier. Now, to many who are passionate about Open Access and belong to the “information wants to be free” school, Elsevier is the bloated, multinational, science-impeding devil incarnate. And yet here it was talking to you, Mr. Open Science. That was intriguing and a credit to Elsevier. Could you please tell us how and why they approached you and what your interactions were?
I was invited to what they called an “Ideation Workshop.” Essentially they got together a group of internal people from different groups (sales, IT, journals), many of whom had never met, to talk about the long-term strategy for their web services. I won’t talk about the details of what was discussed because I didn’t ask for permission to do so, but in general terms they were engaged, interested, and looking at how to contribute to the long-term future of both scholarly communication and its management, and the commercial health of the company.
At core I am a brutal pragmatist. I do what I do, and argue the cases I do, because I want to make scholarly communication better. My belief in Open Access publication and Open Research more generally is driven by a belief that this is a practical approach to making things better. Elsevier remain one of the biggest publishers, and engaging with them, particularly where there is a discussion that can be sparked internally, seems potentially productive. They’re not going to shift to OA overnight, but the opportunity to have some influence on discussions at senior management level, and perhaps more importantly amongst the people who are the next generation of senior managers, seemed too good to pass up.
What I said to them was essentially that they needed to ask themselves who their customers really were and where the opportunities for viable long-term business models are. I started with the assumption that there will be a move towards more Open Access publication, driven by funders, and that there were simultaneously big commercial opportunities to be found in providing new types of services to research institutions, funders, and government. And that these opportunities depend on them getting Open Access to the literature…
Did you, for example, discuss with them their Article of the Future project and do you agree with John Wilbanks that there is very little that is transformative at all in the project?
Actually we didn’t talk about the “Article of the Future”; I’m not absolutely sure whether it was even out at that point. However, I do share John’s assessment. I think I was a bit more brutal at the time. Essentially all I saw was the article of the past…but on a web page. There were some nice presentational aspects, and I liked the way “supplementary” information was promoted to having a place in the paper, but my overriding response was that what they’d done should have been trivial if the information had been organized properly in the first place. If it wasn’t trivial, there were more important problems to solve than presentational ones.
And speaking of the article of the future, do you think that perhaps the scholarly article in a way has no future, or that in only a few years’ time it will become a perfunctory afterthought for a young scientist instead of the crucial, career-making measure of accomplishment that it still is?
I think the role of the journal article in making a claim, or expressing an idea in human-readable form, will remain important. But as the literature grows we need more effective ways to manage it, and we have to ask ourselves as a community whether we can afford it. More often than not when I read a paper I am looking for a single number, a link to a data file, or a single statement. My personal view is that we could lose maybe 90% of all published papers and simply put the data online with some methodological description. This would save billions, money that could then be spent on research.
The problem with this thinking is that these different costs are rarely tensioned together. No researcher ever has to make the choice of “do I spend $50k on a Nature paper or do I employ the postdoc for another year?” If people were having to make that choice on a regular basis I think pre-print archives would flourish.
After all, in the world that you and Jean-Claude Bradley seem to be actually creating, publication as we now know it is almost irrelevant, given that much of what is of interest in the sciences is now becoming available in many venues, such as preprint sites like Nature Precedings.
For me the question is really about how we do this in the most cost-effective manner while making sure the outputs of research are as available as possible. This means we need different types of “publication” for different types of output and different types of peer review at different stages. Again, until we are tensioning all the costs of research against each other I don’t really see this happening, and we will continue to pour taxpayers’ money down the drain.
Could the rise of Open Science actually benefit the mainstream publishers in that as processes and communications become ever more efficient and scientific networks expand, thereby pooling resources, the publishers may reap the benefit of ever greater amounts of science to publish? Or will they have to contend with the fact that the coming generation of researchers will become so used to such seamless science-making that they will scoff at the idea that they need to publish in a journal that only the richest of research libraries can afford? How do you see Open Science affecting the career paths, say, of young biochemists or neuroscientists? Are there some fields that are simply not suited to the world of Open Science such as many of those in the health sciences?
There are massive commercial opportunities in providing repositories and platforms to enable the publication and archival of research in its widest sense. Someone is going to make a killing on this by having the right platform to deliver what researchers and their funders need, when they realize they need it, and conventional publishers have a lot of expertise in building the tools and frameworks, and managing the infrastructure that can support these kinds of services.
The younger scientists I see are still driven by the need to obtain conventional markers of success. And they are absolutely right that they need them. It is more people of my age, who have just got tenure or are in reasonably safe positions, who can ask whether publication is worthwhile. Most PIs ask this on a regular basis. Most have more data than is ever published. I have maybe four papers’ worth of work that is not online and isn’t published, and it isn’t worth my while to put in the work to do that because I have more interesting things to get out. Again, this is another opportunity to provide services that make it easy (for a reasonable price) to get those not-quite-properly-organized sets of data out into the open effectively.
I don’t think Open Science as a movement coming from inside the research community will change career paths. The community doesn’t have the will or interest to change itself. Change will come from outside and will be imposed by funders responding to government pressure, their own motives in the case of charities and commercial funders, and in response to public outrage. “ClimateGate” may have been utter nonsense, but it gave the public their clearest view to date of the attitudes of scientists to data and publication. And they didn’t like what they saw.
I think there may be communities that are not suited to Open Science. I think we need to ask difficult questions about privacy and the rights of test subjects. But issues of privacy are much bigger than just those around Science; perhaps they are the major social issue we need to face up to in the Western World over the next decade. So there are special cases where data cannot be made open and probably should not be made open, but they are special cases, and not the default.
Could you please compare and contrast the attitudes and successes of Elsevier, Springer and Nature Publishing Group vis-à-vis adapting to the pressure from researchers and even now the general public for greater employment of the principles of Open Access?
At a corporate level I don’t think any of these organizations are adapting or making opportunities for themselves. I think they are responding and trying to manage those changing expectations without undergoing radical change that would be required to respond positively. That said, within all of these organizations there are people who understand the promise and opportunity of OA, but large organizations change slowly. BMC seems to be doing well inside of Springer, but it remains to be shown that this is more than a side experiment for the parent company. NPG are probably the most productively engaged with the whole agenda of improving scholarly communication but even there, they will only accede to non-commercial licences for their so-called OA offering. A case of offering just enough to keep the crowd happy rather than taking the opportunity of stealing a march on the competition.
You say, “The journal used to play an important role in publication. The publisher still has an important role but we need to step outside the notion of the journal and present different types of content and objects in the best way for that set of objects.” Will it become the case that scientists could simply start purchasing what we might call nano-content? To wit, just the conclusion of a full article or a single graph or chart from one? Do you see the publishers going for such a model on the premise that some revenue out of an article is better than none or will they balk at the idea of making peanuts on piecemeal distribution of bits and pieces of what they can, as things stand now, potentially sell for a hefty sum as a whole, intact object?
There is an alternative to the OA model, and that is the micro-payments model, in which the researcher has their own budget and pays small amounts for what they need. An iTunes for science, if you like. But by analogy with the music industry, if this were the way forward then you can imagine peer-to-peer sharing of purchased content being very popular. And DRM for science is bad news. It’s bad news anyway, but the whole point of getting research content is so you can build on it. Making that difficult would defeat the purpose of the exercise.
Now I am going to put you on the spot and ask you to help readers gain a grasp of who is who in Open Science and what the particular strengths and accomplishments of each are.
Let’s start with John Wilbanks. Could you please delineate for us the relationship of Science Commons to Open Science? Wilbanks seems to be an interesting bridging figure between the world of Open Science, Open Access, the whole idea of an information commons, big data and Web 2.0. Do you interact with him much, and where do your interests and views intersect and diverge?
John is the figure who perhaps more than anyone else has driven forward the discussion and ideas around high-level issues of policy and practice, as well as actively pushing forward with specific projects. The success of Science Commons in getting exemplar projects moving forward, such as the SAGE project, is largely due to John’s boundless energy and enthusiasm. Mostly we intersect when we happen to be at the same meeting. I’m in strong agreement with John about most things. Perhaps where we diverge somewhat is that he is a stronger proponent of “core” semantic web approaches, whereas I would characterize myself as feeling that we will be able to work with a more flexible range of technologies. But it’s a very minor difference.
Michael Nielsen. Is he more of a theorist and explicator and less of a hands-on scientist than you and Jean-Claude Bradley?
Michael definitely comes from a theoretical background, which brings another perspective. It isn’t clear to me, for instance, that the idea of an Open Notebook even makes sense for a theoretician. I don’t really have a clear enough idea of how they work on a day-to-day basis. At the same time, Michael is far and away the most conventionally successful scientist who is an active proponent of Open Notebook style approaches. The fact that he has largely moved away from active research to focus on trying to change the practice of science sends a very strong message. I sometimes wish I had the courage to follow that example.
Jean-Claude Bradley. How do you two resemble one another? I would say that you blog more extensively in the scientific essay mode, à la Michael Nielsen, and that Bradley tends to be a bit more utilitarian in his approach. Am I close here?
Jean-Claude is the real originator and driving force behind the Open Notebook movement. I guess we are similar in our approximate career stages, and we have some crossover in terms of research interests. Jean-Claude takes a much more direct and immediate approach to his work, using tools and services that are available and taking a very strong approach to requiring that those are freely available. He also seems to have a lot more energy than I do!
I think if you were to characterize the difference between us, it would be that Jean-Claude is much more focused on outcomes and immediate returns, whereas I am more interested in process. This means that he gets on and uses what is available to make things happen. I tend to be more interested in (and get more frustrated about) building tools for further down the line. We have some slight philosophical differences, in that I think I am more gung-ho about favouring policy changes that push researchers harder in the open direction, but these are shaded differences of emphasis rather than big differences.
Bora Zivkovic
Bora is the uber-science blogger. He is one of the longest-serving science bloggers, with the biggest following. He has been a long-term ideological supporter of Open Access and open movements. I think he has a much more political and policy orientation than most of the rest of us do.
Rich Apodaca
Rich I know mostly through his writing on Depth-First and his development of various tools, most recently ChemPedia and its associated Stack Overflow-based site, ChemPedia Labs. Mostly my interactions with Rich involve one of his tightly written blog posts that crystallize an idea, often around practice or scholarly publication. Two posts that stand out for me were one on the Seven Deadly Sins of scientific publication (practice?) and the pieces he wrote about micropublication.
Bill Hooker
When I first started reading stuff online, Bill was one of the very early people I came across (along with Jean-Claude, Deepak, and Neil Saunders). His three-part series on 3 Quarks Daily was a very strong early influence on my thinking. He is one of the most active and vociferous advocates of open practice around. I once jokingly characterized him as “Bradley’s Bulldog.” Probably more than anyone else, Bill has taken the fight to the online trenches, arguing the case in forums and blogs.
Antony Williams of ChemSpider
Antony again is one of these people with apparently boundless energy who just keeps getting on and doing things. He spent years pushing his personal vision of an online resource for chemists forward with little or no funding, spending vast amounts of time on it. Now that ChemSpider has been purchased by the Royal Society of Chemistry and he has more resources and stability, you might think he would take a rest and sit back a bit, but he seems to be pushing forward and spending even more time in the air than he was before.
Antony is a doer. Again, a bit like Jean-Claude, he will take what is available and make things happen. He is less interested in the technical and sometimes ideological issues that rage around licences, databases, and information structures, and more focused on building an end product that people can use. This can and does sometimes lead him into conflict with people who are more worried about other aspects of the problem, but we need that diversity of approaches and services if we want jam today as well as tomorrow. Bottom line, Tony makes stuff happen.
Deepak Singh
Deepak was another of the people I came across online very early who had a strong influence on my thinking, particularly on tools, technical issues, and systems design, and also simply in the way he presents himself online. He has one of the most effective and coherent online personas I’ve come across. His experience and approach are more commercially oriented than mine, so I pay very close attention when he writes and speaks about issues around intellectual property and commercial practice.
We probably differ in our stance on patent law and intellectual property, not because we fundamentally disagree about any principles (we both want to see the most effective conversion of research into valuable innovation), but because of our different backgrounds and experiences.
Andrew Lang
I first came across Andy as Hiro Sheridan in Second Life, and it took me a while to make the connection. He has been really instrumental in creating a lot of the imagery that has supported a range of projects. He is a dab hand at pulling together rapid and lightweight visualizations of data using a wide range of tools. Again, there is a focus on making stuff happen. His background in maths and computing has made it possible to turn a lot of things around very fast. Particularly in combination with Rajarshi Guha, who did a lot of the development of the chemoinformatics services that support every aspect of the Open Notebook Solubility Project, he is one of the team who have turned what might otherwise be rather dry lists of numbers into compelling visualizations.
and the up-and-comer Steve Koch
Koch has been a real breath of fresh air and has brought new energy into the whole community. As a new tenure-track academic he is really putting everything on the line and pushing the envelope on Open Notebook approaches. I had thought for a while that ONS approaches would be riskiest for new academics and that they would therefore be unlikely to adopt them. I am delighted that he is proving me wrong on a daily basis.
I should also mention that he has a young and extremely energetic research group who are really backing him up and are also gung-ho about Open Science. Anthony Salvagno, Andy Maloney, Larry Herskowitz, Brian Josey, and the others are all enthusiastic, positive, and getting out there to make more of their science available.
and Shirley Wu who, given her job at 23andMe, is a fascinating example of how the coming cohort of scientists and technologists interested in Open Science are becoming figures of note in related areas such as Health 2.0 and in the commercial sector.
Shirley first got in touch with me via my blog when she was a PhD student in Russ Altman’s lab at Stanford, with the suggestion of running an Open Science session at the Pacific Symposium on Biocomputing. She did most of the running on this: writing up the report, being far more successful than I was at getting sponsorship, and generally pulling everything together to make it happen.
She has been both a great thinker and a great articulator of ideas on her blog, and although her writing has recently focused more on specific genetic issues for the Spittoon at 23andMe, I think she has a great future writing more generally about science and how it is carried out. I certainly hope that she continues to write about her ideas and opinions in that area.
Have I missed any major figures?
Other people I should mention as both major players and influences include Peter Murray-Rust, who is one of the most sustained and energetic fighters for Open Access and Open Research around. He has been a bit less active in the blogosphere recently because of the large number of big projects he is involved in. In his group at Cambridge, both Jim Downing and Nico Adams have been pushing on the tools front for a long time.
Duncan Hull, who is now at the EBI in Cambridge, is another person who, through his writing and personal example, exemplifies the effective use of technology in supporting a research and development career. Duncan consistently punches well above his weight, both because he is very smart and because he applies that intelligence to the effective use of web technologies.
In terms of influence there are far too many people to mention. I often think I have no ideas of my own; I merely synthesize the ideas of others. So all the past and current people on FriendFeed, Twitter, and online more generally make a big difference, sometimes in small pieces and sometimes in larger ones.
Could you comment on the role that search might play in Open Science? For instance, does Wolfram|Alpha hold any particular interest for you, or do you think it has been overhyped?
Search is crucial for the success of Open Science. As we put more stuff online, it has to be possible to find it. This means developments in semantic search, as well as improved computational engines like Wolfram Alpha. WA is an impressive piece of technology, but from my perspective it has a fatal flaw: it only works on the curated information that is in its database. This causes two problems. The first is that a lot of the chemistry in there is simply wrong or badly misleading. Various of us have reported problems, but I’ve seen little progress on this. More fundamentally, if we don’t know what the root data is or where it came from, then it is of little use to the wider research community, where provenance is everything.
The second problem is related but slightly different. Because it is a curated database, there is absolutely no way it can keep up with what we are generating. If WA could work over my data, then I would be interested in formatting it in the correct way, but since it can’t, there is little point. Until WA can be connected to a data aggregator that spiders the web, I think it will be a useful tool for a few things but mainly a toy. If it can use the whole web as a database, then things become very, very interesting.
Where are we on the purchase of FriendFeed by Facebook? Many of the members of FriendFeed (including yourself) seemed concerned that much of the accumulated knowledge in the science-focused rooms therein would be lost to the community. What do you see as the future of FriendFeed under Facebook?
I can’t see that FriendFeed has a long-term future in its current form, as essentially the ex-project of the Facebook development team. We are already seeing things take a long time to fix, or problems (like the broken search) simply not being fixed. I think we’re on the long slide down to oblivion. The question is, how long?
My hope is that the team might open-source FriendFeed, which would give us something to build out from as a community. How we would finance it is an open question, but there are some interesting ideas floating around. It is also possible that we might be able to build something “next generation” with distributed commenting and streams. I now have a lifestream on my new website that is quite FriendFeed-like (although it doesn’t allow comments), and Bosco Ho has recently mashed up PubMed with Disqus to make a distributed commenting system for journal articles.
There are a lot of good ideas out there. I’m not worried about whether we could build something if we needed to. What I am worried about is how we could fund that work.
Speaking of Facebook, do you have any comment on the supposed showdown over the future of the Internet vis-à-vis the more gated-community orientation of Facebook versus the freer orientation of Google? Any bets on who will win? Is such a duel genuinely in the offing? Implications for Open Science?
Apparently there is a whole generation who think Facebook is the internet. I think in the end open wins out: the bigger system, with greater diversity and a greater ability to create new systems, wins. That’s true when there is real diversity and competition. A good question is whether it remains true when one system captures the majority of the market. Internet Explorer is losing market share even though it had enormous penetration. But equally, Chrome is struggling somewhat even with the might of Google behind it. And if Google weren’t supporting Mozilla, could Firefox survive?
I want to believe that open always wins in the end. Sometimes I fear that my conclusions are being driven by my wishes rather than the evidence, though.
You use Twitter, but you don’t seem to have written much about it. Do you see it as having much of a role to play in Open Science? Jean-Claude Bradley does not seem to have mentioned it as one of the many tools employed by the scientists engaged in the project he discusses in the slideshow mentioned above, “Leveraging Transparency and Crowdsourcing in Chemistry Using Open Notebook Science.” Whom do you regard as must-follows on Twitter? What Twitter tools do you use?
I use Twitter largely as a way of tracking a somewhat different community of people from the one I find on FriendFeed. I don’t get much “real science” out of Twitter. It is more about staying in contact with the IT and UK information management communities. I think in most cases the tool is close to irrelevant; it’s the community that matters. To the extent that the tool matters, it is because of the way it supports the formation and management of the communities relevant to you, not the functionality itself.
In terms of information gathering the person I get the most from on Twitter is Glyn Moody, but to be honest I could cope if it closed tomorrow. I use Twhirl to follow on my laptop and Seesmic on my phone.
On a related note, what do you see as the future of RSS?
RSS, or rather feeds in general, are key to the future of information management in my view. Atom is more powerful and flexible, but the user doesn’t need to worry about that too much, and AtomPub is a great protocol for pushing information as well: feeds can go in both directions. Going forward we will also see more real-time “feeds” using push protocols like XMPP. But at bottom my entire world view is about manipulating feeds of information and objects: controlling which ones come to me, deciding how to deal with them, and pushing out other feeds of content I’ve touched, used, or created.
Where are we with the matter of Open Science logos and how crucial do you think that matter is?
Clarity is really important when you are saying what you intend to do and how to do it, and in that sense I think logos are really useful. Through the use of trademarks, they also let you define standards and aggregate communities around statements of purpose or ideals. The great success of Creative Commons is that they’ve created both a community and a clear statement wrapped up in a couple of logos that (to some extent at least) have clearly understood intentions. If we can achieve this for Open Notebook Science, then I think that would be a great thing.
Which brings us, I guess, to the question of why I haven’t used them… The simple answer is that I don’t have access to the stylesheets that drive my lab notebook. This is a surprisingly general problem, and it also applies to licences. I want to put up some data, so I do it on some service, but that service doesn’t let me apply a ccZero licence (e.g. FeedBurner doesn’t have the ability to apply ccZero to your feed, although you can use CC-BY). If I don’t have the ability to add it, how can I express my wishes? However, that excuse is wearing a little thin, because I’ve now learnt enough about stylesheets that I could probably do it myself…
How do Tim Berners-Lee’s ideas about the Semantic Web fit into Open Science? Do you take issue with any of his pronouncements and positions?
I absolutely believe in the Semantic Web and Linked Open Data as the way forward for describing the outputs of research in the lab. I guess I have two slight differences with TBL. Firstly, I am not sure that an exclusive focus on RDF is the best way forward, because there is some mileage in other formats that are widely used. They are not as good in the longer term, but they might make a good stepping stone in many cases, and we need to solve the problem of transferring lots of legacy data as well.
Secondly, where I have a different emphasis from many SemWeb people is that I think our biggest problem is providing tools that capture information and build the links. At the moment, generating linked data seems largely to involve typing a lot of angle brackets or mastering complex software libraries. This is nonsense if we’re going to produce large quantities of linked data. We need tools that help the average user, the average scientist, generate and publish data in this form without having to worry about the details. If it’s further away than a right-click, then it isn’t going to happen.
If you were to win a MacArthur fellowship, how would you spend the money?
I have to say I’m not actually sure exactly what you get for a MacArthur fellowship, but I was thinking about putting in for an ERC Senior Fellowship, so let me describe what I had in mind for that.
Firstly, build better tools and example use cases based around my scientific interests. That means building out new versions of the lab notebook systems and trying to create a real Open Source community around them, while solving the immediate problems. Really deploy all of those little tweaks that will make the system a joy to use. Build analysis software that really integrates into the linked data web, creating a provenance trail that gets carried with the data and saves each stage of the analysis along the way.
Secondly, get more effectively involved with the community work that is going on. There are lots of initiatives and exciting developments happening at the moment. I’ve been invited to help with some of them, but I have limited time to dedicate to them, which means I can’t be involved in all of them, and where I am involved I don’t always feel I’m giving my best. I want to contribute as effectively as I can.
Thirdly, get right into the social sciences and look hard at what is happening around us. Really examine the different online tools, figure out what is working and what is not, and try to understand why. Look for historical parallels that will help to predict the best way forward. Get a real understanding of the social sciences that can help me make sense of developments and push them in the right direction. Get real evidence of what open practices are achieving, and where they work best. Find the low-hanging fruit and apply resources where they are most likely to deliver the big wins.
What is your goal for the next year? The next five?
This year it is to make sure I get you these answers before the year is up! Seriously, this year my goals are the same as last year: to make more space and time, and somehow to find the resources to do some of the things above. Over five years, really the same. I would like to have made a significant impact, for the better, on the way scientists work over that period.
Who are your heroes in science, technology, and any other field of endeavor?
The people who make a difference. I’m not sure I really do have heroes per se. I admire people who think clearly and who influence my thinking in positive ways, but that’s a little circular. As a scientist I guess I am a little uncomfortable with the idea of placing too much of an aura around the person rather than the work. To the extent that I have heroes, many of them are the people mentioned above: the ones who have motivated me, through their ideas or their example, to try to make a difference where I can.
You and I both recently attended ScienceOnline2010. Do you consider that a must-attend event for those interested in Open Science? Can you tell us a bit about how you think your own sessions went, which presentations you found most compelling, whom you got a chance to chat with there that you already knew and what new people you met there that interested and impressed you?
I was very happy with the Open Notebook session that I was involved with and less happy with the Wave demo. In retrospect I probably should have shown the canned video demo I had rather than trying to do it live, a lesson I will take to other demos. I was really disappointed to miss sessions by Andy Farke, Pavel Szczesny, Jon Eisen, and many others; there was just far too much good stuff going on. I think Science Online is growing into a must-attend event for those interested in how scientific communication is evolving in its widest sense.
For me, I guess I came away with a couple of key ideas about how to distribute data in new ways and a greater appreciation of the long-term challenges of curation. I also came away with a sense of the potential for connecting things together: a lot of this seems nearly within reach, if we could just figure out how to bridge a few more gaps.
Also, some of the librarians at ScienceOnline2010 came away with the impression that some scientists regard libraries as we know them to be increasingly irrelevant. Do you have any suggestions as to how librarians can play a role in Open Science?
The role of “the library” in the 21st century is a source of angst for many people. The broader question is really how we want to manage information in academia in the future, and what spaces, real or virtual, need to be provided for people to interact with information. A lot of this is virgin territory, so there are massive opportunities for the people and communities who want to get out there and make things happen. Equally, a lot of it is unfamiliar territory for librarians and other information managers. Life is like that.
More specifically, in the area of Open Science I think librarians are well placed to help deliver publication mechanisms through repository-like systems, continuing their role of providing access to material but inverting the usual relationship with local and offsite academics: they will now be providing the work of their own institution to others. In a sense this is a much more logical way of doing it. Linked to this is the role of guardian and preserver of the institution’s outputs, a role that in many cases no one is taking on at the moment.
The other important role is much stronger advocacy in explaining the true costs of current subscription systems. A central problem with our current structure is that academics are blinded to the costs of their publishing decisions. I think librarians need to be radicalized to take on their own academic staff and make them see and feel the costs of their decisions. I can see that this is both dangerous and scary, and also a departure from the tradition of library decision-making being subservient to academic need. But if the current fear is a lack of relevance, then maybe some pain is required to make it clear what the relevance is?
What conferences do you recommend those interested in Open Science attend, and where do you plan to appear in coming months? I am looking forward to hearing you speak at the Science Commons Symposium – Pacific Northwest on February 20, 2010, for instance.
That’s a hard question. Open Science seems to exist on the periphery of mainstream conferences and at the same time, to a certain extent, in a ghetto of its own. The Science Commons Symposium will be great, and I’m really excited to be there. The Science Online London and Science Online 2011 meetings will be important points at which to take stock of progress. The SAGE Congress in April in San Francisco will be an exciting place to talk about what is possible today and into the future. But the place to really progress the discussion about Open Science is in the mainstream: by standing up at a conference, giving a talk… and providing a link to the data. By making the underlying code and analysis available and saying so in the paper. By simply raising the bar for what we expect from the communication of good science. Ask the questions, demand the answers, and live it out in your own communication to the best of your ability. Change Open Science from an activity practiced by fringe lunatics into nothing more than the good practice that we expect.
Thank you for your time, Cameron.