Friday, 26 February 2010

Development and research - the project seesaw

Just a quick blog post to reflect on where I'm up to at the end of week 5 of my Arcadia project.

The project's always seesawed uneasily between being a 'technology' project and being an 'implications of technology' project, between attempting to develop a tool, and thinking about what the implications and needs of this tool might be. This week the seesaw's continued to rock back and forth, but perhaps there's now a more harmonious rhythm to it. Developing a real tool can be a good way to find out what future Library IT projects might need, and to find out the problems and delays that are likely to appear. This week's been a week of delays, in many ways, but I'm starting to see them as productive ones: the delays that I'm facing now (getting student data from MISD, getting contact names from every Department in the University, learning to use the range of different technologies necessary to connect together 3 very separate systems) are probably going to be ones that Library projects face again and again. So if I can document them now, and start working on making the process smoother, then the next Library project will perhaps be better planned, and work more efficiently. After all, Huw Jones in his Arcadia project had to spend days if not weeks wrestling with the widget code, which he was then able to copy straight into my project ready sorted. So things here are going slowly, but perhaps that's not a bad thing.

Duke University Open Access policy

Interesting post by Cathy Davidson, co-chair of Duke Open Access, under the title "Information wants to be sustainable".

At this writing, we propose that the final draft of any article written by a Duke faculty member--not the printed article but the draft before it goes to press--be made available in pdf form and archived in a repository at Duke where it will be available to search engines and therefore to any searcher, yes, for free. This means that the fruits of our collective research can be made available to the world, even if the actual citational final paginated publication copyright will still reside with the publisher. With modifications offered by Duke faculty in the various forums and committees to which we've now presented this policy, our Open Access policy is roughly similar to the ones already accepted at Harvard, Stanford, MIT, the University of Kansas, and a number of other public and private institutions. It is both modest in its scope and important. It allows faculty to use their own work in their classes; a member of our Taskforce in the Medical School reported that she was having to pay $500 for permission to use her own published article in her teaching. Recently, I took my class to visit the lab of a colleague and assigned several of his articles, listed on his website. We clicked. No article. The publisher had made him take the links down. My students were able to go through Duke University library to read these scientific papers but, if they had not been institutional members, they would have had to pay-per-view for each article, all of them written on grants that our tax payer dollars had supported. An Open Access repository at Duke also means that the work of faculty can be included in online searches by topic and that readers can find the work easily and read it in the pdf form even if they do not have an individual or institutional subscription to the journal.
They will still need to go to the actual journal (the printed final copy of the article) for proper citations, but presumably there are many people who will want to read an essay even if they aren't planning on citing it in their own work.

Studies of citations also show that papers previously published in this preprint open access form are more likely to be cited, by a significant margin, than essays that are not available in this form, even when citation requires taking that extra step of going to the actual published journal to cite the paginated essay. In addition, the policy guarantees the future archiving of the article. So there are many benefits to the faculty member. However, if a faculty member has any hesitation at all about this method of open access archiving for any reason, there is also a "no explanation necessary" escape clause available to any faculty member who wants out. If you don't want your article archived in an open access repository, you don't have to have it archived. No questions asked. If you do, however, you can be assured that Duke will preserve it even if your journal collapses or sells its archives to some expensive commercial vendor.

So there seems to be considerable benefit to the scholar and to readers.

Thursday, 25 February 2010

Date for diaries: the second Arcadia Lecture...

... will be given by Dan Cohen of George Mason University (and the Zotero project) on April 30 at 6pm, in the Riley Auditorium, Clare College.

His topic is "The Social Life of Digital Libraries".


The digitization of libraries had a clear initial goal: to permit anyone to read the contents of collections anywhere and anytime. But universal access is only the beginning of what may happen to libraries and researchers in the digital age. Because machines as well as humans have access to the same online collections, a complex web of interactions is emerging. Digital libraries are now engaging in online relationships with other libraries, with scholars, and with software, often without the knowledge of those who maintain the libraries, and in unexpected ways. These digital relationships open new avenues for discovery, analysis, and collaboration.

Daniel Cohen is an Associate Professor in the Department of History and Art History and the Director of the Center for History and New Media at George Mason University. He is coauthor of ‘Digital History: A Guide to Gathering, Preserving, and Presenting the Past on the Web’ (University of Pennsylvania Press, 2005), author of ‘Equations from God: Pure Mathematics and Victorian Faith’ (Johns Hopkins University Press, 2007), and has published articles and book chapters on the history of mathematics and religion, the teaching of history, and the future of history in a digital age in journals such as the Journal of American History, the Chronicle of Higher Education, and Rethinking History. He is an inaugural recipient of the American Council of Learned Societies’ Digital Innovation Fellowship. At the Center for History and New Media he has directed projects ranging from digital collections (September 11 Digital Archive) to scholarly software (the Zotero extension for the Firefox browser that enables users to manage bibliographic data while doing online research).

All welcome.

Cam.Talks page for this lecture.

The Data Deluge

This week's Economist has a special survey (written by Kenneth Cukier) on the explosion in digital data that's already problematic for anyone working in the physical and biological sciences. The Leader that precedes it says, in part:

EIGHTEEN months ago, Li & Fung, a firm that manages supply chains for retailers, saw 100 gigabytes of information flow through its network each day. Now the amount has increased tenfold. During 2009, American drone aircraft flying over Iraq and Afghanistan sent back around 24 years’ worth of video footage. New models being deployed this year will produce ten times as many data streams as their predecessors, and those in 2011 will produce 30 times as many.

Everywhere you look, the quantity of information in the world is soaring. According to one estimate, mankind created 150 exabytes (billion gigabytes) of data in 2005. This year, it will create 1,200 exabytes. Merely keeping up with this flood, and storing the bits that might be useful, is difficult enough. Analysing it, to spot patterns and extract useful information, is harder still. Even so, the data deluge is already starting to transform business, government, science and everyday life (see our special report in this issue). It has great potential for good—as long as consumers, companies and governments make the right choices about when to restrict the flow of data, and when to encourage it.

Laptop rage

Some academics really hate it when students spend the entire lecture updating their Facebook profiles.

Monday, 22 February 2010

Panton Principles on Open Data

The principles are:

  • Where data or collections of data are published it is critical that they be published with a clear and explicit statement of the wishes and expectations of the publishers with respect to re-use and re-purposing of individual data elements, the whole data collection, and subsets of the collection. This statement should be precise, irrevocable, and based on an appropriate and recognized legal statement in the form of a waiver or license. When publishing data make an explicit and robust statement of your wishes.

  • Many widely recognized licenses are not intended for, and are not appropriate for, data or collections of data. A variety of waivers and licenses that are designed for and appropriate for the treatment of data are described here. Creative Commons licenses (apart from CCZero), GFDL, GPL, BSD, etc are NOT appropriate for data and their use is STRONGLY discouraged. Use a recognized waiver or license that is appropriate for data.

  • The use of licenses which limit commercial re-use or limit the production of derivative works by excluding use for particular purposes or by specific persons or organizations is STRONGLY discouraged. These licenses make it impossible to effectively integrate and re-purpose datasets and prevent commercial activities that could be used to support data preservation. If you want your data to be effectively used and added to by others it should be open as defined by the Open Knowledge/Data Definition – in particular non-commercial and other restrictive clauses should not be used.

  • Furthermore, in science it is STRONGLY recommended that data, especially where publicly funded, be explicitly placed in the public domain via the use of the Public Domain Dedication and Licence or Creative Commons Zero Waiver. This is in keeping with the public funding of much scientific research and the general ethos of sharing and re-use within the scientific community.

    From Panton Principles.

    Tuesday, 9 February 2010

    Whither Legal Deposit for Online Publications?

    One of the mini-projects I started out on during my Arcadia Fellowship, and have still to write up (and, indeed, complete!), was the life of a book, describing the travels made and processes applied to a work from the moment it is received via the legal deposit process into the University Library, through cataloguing and shelving, to the point at which it makes it back on to the shelf after its first loan.

    One thing that didn't really cross my mind at all was the way in which born digital and published online content might be subjected to the legal deposit process, nor how legal deposit libraries might secure the long term availability and preservation of those works.

    Anyway, it seems as if the Department for Culture, Media and Sport (DCMS) have been consulting on the subject, and now they've opened up the consultation to a potentially wider audience than might have originally been the case by republishing it, in commentable form, on the WriteToReply consultation platform: Proposal on the Collection and Preservation of UK Offline and Microform Publications and UK Online Publications

    The consultation seeks opinions on several proposals relating to the legal deposit of Offline and Microform Publications as well as Online Publications.

    Which is to say, DCMS have worked out what they want to do (the proposals) and now's your opportunity to comment on it. As well as formal institutional responses, I got the feeling from a meeting with DCMS a week or two ago that they were interested in seeing how things like WriteToReply might help encourage a wider range of contributions that might complement full institutional responses. It's also worth bearing in mind that, from the individual user comment feeds, it's possible to use WriteToReply to help draft a full response...

    As well as the proposals mentioned above, the consultation document is soliciting feedback on several other related matters.

    For example, an impact assessment for agencies likely to support the process is provided in Impact Assessments – Intervention and options, analysis and evidence, which reviews some of the legal constraints around harvesting, as well as the costs of maintaining a legal deposit service for online materials. Have they identified all the major risks, or are there practicalities that have escaped them?

    Defining territoriality is a major consideration on which feedback is requested in Further Details on Territoriality. With intellectual property rights in such a mess, here's an opportunity to contribute your opinions to the process.

    Finally, practical everyday considerations about the actual legal deposit process are raised in Further Details on Harvesting Process. At the end of the day, the techies are going to have to implement this stuff. Here's an opportunity for developers to raise any concerns in an informal way.

    [Disclaimer: I am co-founder of the WriteToReply platform]

    Monday, 8 February 2010

    Bloggers: queue here for bus passes

    Interesting insight from the Pew Project into the way the media ecosystem is evolving.

    Since 2006, blogging has dropped among teens and young adults while simultaneously rising among older adults. As the tools and technology embedded in social networking sites change, and use of the sites continues to grow, youth may be exchanging ‘macro-blogging’ for microblogging with status updates.

    Blogging has declined in popularity among both teens and young adults since 2006. Blog commenting has also dropped among teens.

    * 14% of online teens now say they blog, down from 28% of teen internet users in 2006.

    * This decline is also reflected in the lower incidence of teen commenting on blogs within social networking websites; 52% of teen social network users report commenting on friends’ blogs, down from the 76% who did so in 2006.

    * By comparison, the prevalence of blogging within the overall adult internet population has remained steady in recent years. Pew Internet surveys since 2005 have consistently found that roughly one in ten online adults maintain a personal online journal or blog.

    While blogging among adults as a whole has remained steady, the prevalence of blogging within specific age groups has changed dramatically in recent years. Specifically, a sharp decline in blogging by young adults has been tempered by a corresponding increase in blogging among older adults.

    Clueless on copyright

    We complain that young students are casual or clueless about copyright. But it looks as though the Obama White House may be just as confused.

    New uses for Google #56632

    Thanks to Dave Briggs for spotting it.

    Friday, 5 February 2010

    Cambridge Libraries Widget Launched

    Today sees the official launch of the Cambridge Libraries Widget:

    Developed in partnership between the UL and CARET, the widget is a unified interface for Cambridge library users, drawing together search facilities, library profiles, loans and requests in one easy-to-manage application. It's live in CamTools (Cambridge's VLE), Facebook and iGoogle.

    The Widget is very much a product of the Arcadia Programme. It draws on the ideas of centering services around the user and providing relevance in resources and functionality which have proved such strong themes. At its heart is a cross-institutional collaboration - a very Arcadia idea!

    You'll need to be registered with a Cambridge Library to try it out at the moment ... but we're planning to write up what we did and how we did it at some stage so others can benefit too.

    Thursday, 4 February 2010

    Two weeks into the project... software, pedagogy and research

    So, two weeks into my Arcadia research project, how are things going?

    Building on Huw's work for the library widgets, I've begun creating the prototype exam paper widget. As so often with technology, the bits you expect will be hard are easy, and vice versa! Raven integration turned out to be instantly sorted by a combination of folder settings and ready-made code from the library widget, but getting to grips with jQuery is taking longer than I'd hoped.

    Meanwhile, I've been researching the current workflow for the paper archiving of past exam papers - you can see a first draft here. The coloured boxes indicate potential moments at which digital archiving could take place. I've also been researching the range of forms and media in which exam papers exist - from audio files to data sheets. I've visited a number of libraries and librarians across the University, who have all been extremely helpful.

    I've been conducting a small scale literature review, looking for any research into best practice in using past papers to support exam preparation. Unfortunately, this does not seem to have been well investigated in the past: however, Cambridge itself has carried out a number of small scale projects around this issue, and I'll be talking to the instigators of these in the future.

    Over the next week, I'm planning to continue the process of gathering together current digitised exam papers, and labelling them according to appropriate University schemas. I'll begin research directly with students, finding out more about their needs when revising. It would be interesting to hold a small focus group with supervisors as well, in order to chat about their use of past papers. And of course, I'll continue developing the prototype software, aiming to get it up to the point where the information is working in a web page, though without a determined user interface.