Friday, 17 December 2010

End of Delicious?

Many reports are circulating on the Interweb regarding Yahoo's planned closure of several sites, not least of which is Delicious, the social bookmarking site. Once one of the first Web 2.0 successes, the site has had many problems over the past few years, and some have noted its failure to adapt and evolve to meet changing expectations.

We used Delicious to list useful web resources on the first ever Arcadia project, science@cambridge. Many libraries in Cambridge and beyond have done the same; it's a great tool. Since then, the potential risk of losing third-party infrastructure like this has often popped up in discussion. Now it may be a reality. (Large portions of our site also use Pipes. Let's keep our fingers crossed for that superb service too.)

Thinking on a wider scale, Delicious, like Wikipedia, StackOverflow and many other online resources, fulfils some of the functions of a library in the networked world, namely the classification of units of online information. Many people rely on it daily, and much noise has been made of its community basis as a real alternative to traditional means of classification.

Now, thanks to a corporate reshuffle, it may just disappear as a result of market conditions. I'm left on a Friday afternoon with three things to think about:

  • Why was the site judged a failure? Is tagging a fad that will fade, whilst traditional classification will somehow endure (this I doubt)? Is it because its function was better provided by successor sites, or is there some other reason?
  • If the market cannot sustain these networked library-like services, should libraries (or the non-profit educational sector) start developing services like Delicious? Would we be better placed to provide this vital web infrastructure over a commercial entity? Would it be a better investment than an Institutional Repository?
  • Does anyone care now we have Facebook?

Wednesday, 15 December 2010

Myths about students -- and implications for Web design

Interesting post by Jakob Nielsen which claims that usability research undermines some prevailing myths about students.

Myth 1: Students Are Technology Wizards
Students are indeed comfortable with technology: it doesn't intimidate them the way it does some older users. But, except for computer science and other engineering students, it's dangerous to assume that students are technology experts.

College students avoid Web elements that they perceive as "unknown" for fear of wasting time. Students are busy and grant themselves little time on individual websites. They pass over areas that appear too difficult or cumbersome to use. If they don't perceive an immediate payoff for their efforts, they won't click on a link, fix an error, or read detailed instructions.

In particular, students don't like to learn new user interface styles. They prefer websites that employ well-known interaction patterns. If a site doesn't work in the expected manner, most students lose patience and leave rather than try to decode a difficult design.

Myth 2: Students Crave Multimedia and Fancy Design
Students often appreciate multimedia, and certainly visit sites like YouTube. But they don't want to be blasted with motion and audio at all times.

One website started to play music automatically, but our student user immediately turned it off. She said, "The website is very bad. It skips. It plays over itself. I don't want to hear that anymore."

Students often judge sites on how they look. But they usually prefer sites that look clean and simple rather than flashy and busy. One user said that websites should "stick to simplicity in design, but not be old-fashioned. Clear menus, not too many flashy or moving things because it can be quite confusing."

Students don't go for fancy visuals and they definitely gravitate toward one very plain user interface: the search engine. Students are strongly search dominant and turn to search at the smallest provocation in terms of difficult navigation.

Myth 3: Students Are Enraptured by Social Networking
Yes, virtually all students keep one or more tabs permanently opened to social networking services like Facebook.

But that doesn't mean they want everything to be social. Students associate Facebook and similar sites with private discussions, not with corporate marketing. When students want to learn about a company, university, government agency, or non-profit organization, they turn to search engines to find that organization's official website. They don't look for the organization's Facebook page.

Friday, 3 December 2010

Show me the numbers

One of the offshoots of the Arcadia Project was the joint UL/CARET CULWidgets product, which wangled some JISC funding to "provide users with services appropriate to a networked world" in a widgetty/web services way.

Our two main production interfaces are the Cambridge Library Widget and the CamLib mobile web app, both soft-launched at the start of this term. This, the last day of term, seems a good time to look back and see how they've done.

Overall unique visitor numbers for the Widget are:

And, slightly more erratically, for CamLib:

A combined 4,489 unique visitors across the two. The services are mainly targeted at undergraduates, of which Cambridge has c.12,000. Even assuming some crossover between the interfaces, and the likelihood that not all visitors are undergrads, we're still looking at a significant proportion of our UG population (25-33%?).

Of course, these are just our unique visitors (i.e. distinct people who have visited the site); total visits for the period are 12,284 and 2,559 respectively, which shows that students are coming back to the interfaces again and again, not just taking a crafty peek. Monthly figures across both interfaces average around 3,000 unique users.

Our initial target was for 2,000 unique users in the first term, so we're running at well over double. Well done Widgets! And there are more to come.

Wednesday, 1 December 2010

Disruptive technologies in digitisation

Much of my fellowship has been taken up with examining three tech initiatives, all of which could be used in an on-demand process and could also be classed as disruptive. One is software, two are hardware.

Here is a bit more information ...

1) The Copyright calculator

Public Domain Calculators from Open Knowledge Foundation on Vimeo.

What is it?
A software development that assesses the copyright status of a creative work by looking at associated metadata. I've made an initial attempt to tie the Open Knowledge Foundation calculator into LibrarySearch, our new catalogue interface.

Why is it disruptive?
It can give the reader a useful indication of the copyright status of a book, allowing them to decide how they can re-use it. It's potentially useful as the first stage in a digitisation selection workflow, but also useful on its own. It's also an example of the commoditisation of a basic legal service.

What problems are there?
To be effective, the calculator needs author death-date information. Libraries only record this information when they wish to differentiate a name. Linked data tying a record into other sources of information could help overcome this.
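The core rule is simple to sketch. Assuming the standard UK/EU term for literary works (life of the author plus 70 years, running to the end of that calendar year) and metadata that actually carries a death date, a minimal calculator might look like the following. The function is illustrative only, not the Open Knowledge Foundation's actual code:

```python
from datetime import date

# Term of protection for literary works in the UK/EU: life of the
# author plus 70 years, expiring at the end of the calendar year.
TERM_YEARS = 70

def is_public_domain(author_death_year, today=None):
    """Return True if copyright has expired, False if still in term,
    or None if the death date is unknown -- which, as noted above,
    is the common case in library metadata."""
    if author_death_year is None:
        return None  # cannot calculate without a death date
    today = today or date.today()
    return today.year > author_death_year + TERM_YEARS

# An author who died in 1930: the work entered the public domain
# at the start of 2001.
print(is_public_domain(1930, date(2010, 12, 1)))  # True
print(is_public_domain(1985, date(2010, 12, 1)))  # False
print(is_public_domain(None))                     # None
```

The `None` branch is the interesting one: it is precisely the gap that linked data (pulling death dates from external authority files) would fill.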

2) Kirtas book-scanner

What is it?
An automatic book-scanner. Turns pages using a vacuum-equipped robot arm and images pages with dual high-spec cameras.

Why is it disruptive?
Books can be scanned and turned into PDF or other documents in a matter of hours, with minimal human interaction required.

What problems are there?
It's not cheap and still not 100% accurate. It's also a robot, so should not entirely be trusted.

3) Espresso book machine

What is it?
A photocopier-sized book creation machine that does not require a printing-works to run it. Can print and bind a book in minutes.

Why is it disruptive?
Provides a library or a bookstore with a massive research collection/back catalogue with none of the storage problems or overheads. Could have implications for acquisition, collection development and every other part of library activity.

What problems are there?
As with the Kirtas, it's not cheap, and it's limited in formats and outputs. And it's a robot.

I'll have more to say on my project as I slog through write-up ...

Futurebook 2010

Whilst beginning to wrap up my fellowship (more in another post), I took time out to attend the FutureBook conference yesterday. Organised by the Bookseller, this conference brought together a number of industry leaders to highlight their successes and to raise awareness of issues they have faced in digital publishing. It was a fascinating day. For publishers and booksellers alike, it seems the digital revolution has finally arrived. Some highlights:

  • The Bookseller has conducted a wide survey of the sector to gauge opinion and attitude, with over 2,600 responses. This will be published soon.
  • One statistic was worth noting: when asked who would gain most from a rise in digital sales, respondents ranked readers, authors and publishers highest, with booksellers and libraries rated last.
  • Publishers and booksellers had differing ideas regarding how quickly the change would occur. By the end of 2015, two-thirds of publishers believed digital sales would account for anywhere between 8% and 50% of the market. Only just over half of the booksellers polled believed the same thing.
  • Google will enter the online book retail market soon with Google Editions. Rather than tie themselves to a device, they are aiming for a platform-agnostic, browser- and app-based model, with all content remaining in the cloud rather than on devices (although HTML5-based local storage will be used). It will allow various online retailers and booksellers to build platforms around Google Editions.
  • Tech-startups were suddenly seen as competition by publishers, at least in the app business. In response, much value was placed on publishers' knowledge of markets, talent and trends, as well as the curatorial process of commissioning and editing
  • Richard Mollet of the Publishers Association talked up the Digital Economy Bill. Formerly in the music industry, he noted that 'rights and copyright make the digital world go round', and argued that the bill was vital in explaining the damage illegal copying had on the creative sectors
  • Nick Harkaway, an industry commentator, agreed in principle, but noted that enforcement so far had failed to deter illegal filesharing and that DRM was no serious barrier to rights infringement. He urged publishers to keep people paying by offering serious innovation rather than simple digital recycling of print content
  • The academic book sector was well represented, with Wiley, CUP, OUP, Ingenta Connect and Blackwells Academic presenting. OUP gave an excellent talk on the changes required across an institution
  • We also had displays from Scholastic regarding the cross-media Horrible Histories series and looks at the editorial and creative processes behind booksellers' first steps into the world of mobile app development. Max Whitby from Touchpress showed off the Elements app, the next stage in the evolution of the coffee-table book
  • YouGov have a tablet track scheme looking at customer experiences of iPads and Kindle readers, which produced some interesting facts.
One over-arching trend that libraries can learn from relates to changes in the production and publishing processes. The phrase 'reflowable text' was heard throughout the day, with publishers being urged to ditch PDF- and print-centric workflows in favour of granular, XML-based marked-up text that can easily be re-purposed for the next device or platform.

It seems to me that mainstream publishing is leapfrogging academic publishing on this one. Given the number of online journal vendors that still insist on forcing PDF files down our throats, XML-based delivery of more academic content could be of real use now to the consumer as well as the publisher.
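The appeal of that workflow is easy to illustrate: once content is held as granular marked-up text, every output format becomes just another transform over the same source. A toy sketch, in which the element names are invented rather than taken from any publisher's real schema:

```python
import xml.etree.ElementTree as ET

# A fragment of granular, semantically marked-up content --
# the single source from which all outputs are generated.
SOURCE = """
<article>
  <title>On Reflowable Text</title>
  <para>Text stored as XML can be re-purposed.</para>
  <para>Each device gets its own transform.</para>
</article>
"""

def to_html(xml_text):
    """One transform: render the source as simple HTML for the web."""
    root = ET.fromstring(xml_text)
    parts = ["<h1>%s</h1>" % root.findtext("title")]
    parts += ["<p>%s</p>" % p.text for p in root.findall("para")]
    return "\n".join(parts)

def to_plain(xml_text):
    """Another transform: plain text, e.g. for a basic e-ink reader."""
    root = ET.fromstring(xml_text)
    lines = [root.findtext("title").upper()]
    lines += [p.text for p in root.findall("para")]
    return "\n\n".join(lines)

print(to_html(SOURCE))
print(to_plain(SOURCE))
```

Adding a new device or platform means adding a new transform; the content itself is never touched. A PDF-centric workflow, by contrast, bakes one presentation in at the start.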

One application of this approach was demonstrated: the Blackwells Academic custom textbooks service. This allows course leaders to assemble all the material relating to a course into one bound volume which can then be sold on. The service takes care of rights clearance, and also quite handily passes the cost of printing course-packs on to students!

It's still a great concept. Such a leap is only possible by storing content in a normalised XML form, allowing it to be quickly pulled together to create new outputs.

Of course, we've been doing this in libraries with TEI and other transcription initiatives for some time now, but publishing at least is really taking the concept to heart, especially when faced with multiple devices and platforms to support. Post iPad, library digitisation projects will need to bear this delivery model in mind rather than relying upon image based delivery.

Tuesday, 23 November 2010

Analog tools


Sometimes, you just can’t beat an olde-worlde paper notebook. Highly portable, great screen resolution, excellent, intuitive user interface and infinite battery life.

Only problem: it’s hard to back up. On the other hand, it’ll still be readable in 200 years. Which is more than can be said for any of my digital data.

Saturday, 20 November 2010

Digital Humanities

The New York Times has an interesting piece about the renewal of interest in the Digital Humanities.

The next big idea in language, history and the arts? Data.

Members of a new generation of digitally savvy humanists argue it is time to stop looking for inspiration in the next political or philosophical “ism” and start exploring how technology is changing our understanding of the liberal arts. This latest frontier is about method, they say, using powerful technologies and vast stores of digitized materials that previous humanities scholars did not have.

The article goes on to describe a few interesting projects. For example:

In Europe 10 nations have embarked on a large-scale project, beginning in March, that plans to digitize arts and humanities data. Last summer Google awarded $1 million to professors doing digital humanities research, and last year the National Endowment for the Humanities spent $2 million on digital projects.

One of the endowment’s grantees is Dan Edelstein, an associate professor of French and Italian at Stanford University who is charting the flow of ideas during the Enlightenment. The era’s great thinkers — Locke, Newton, Voltaire — exchanged tens of thousands of letters; Voltaire alone wrote more than 18,000.

“You could form an impressionistic sense of the shape and content of a correspondence, but no one could really know the whole picture,” said Mr. Edelstein, who, along with collaborators at Stanford and Oxford University in England, is using a geographic information system to trace the letters’ journeys.

He continued: “Where were these networks going? Did they actually have the breadth that people would often boast about, or were they functioning in a different way? We’re able to ask new questions.”

One surprising revelation of the Mapping the Republic of Letters project was the paucity of exchanges between Paris and London, Mr. Edelstein said. The common narrative is that the Enlightenment started in England and spread to the rest of Europe. “You would think if England was this fountainhead of freedom and religious tolerance,” he said, “there would have been greater continuing interest there than what our correspondence map shows us.”

Saturday, 13 November 2010

Hacking the Library -- ShelfLife@Harvard

What is Shelflife?
ShelfLife is a web application that uses what libraries know (about books, usage and comments) to allow researchers and scholars to access the riches of Harvard’s collections through a simple search.

Researchers will be able to access, read about, and comment on works using common social network features. ShelfLife will bring Harvard results to the forefront of the research process, allowing users to easily access and explore our vast collections.
What makes it unique?

ShelfLife is designed to help you find the next book. Each search retrieves a unique web page providing key information about the thing searched, including basic information, fluid links to related neighbourhoods, and analytic data about use, all presented in a clean graphical format with intuitive navigation designed with discoverability in mind.

From the Harvard Library Innovation Lab. The site provides no information about ShelfLife beyond the above, but Ethan Zuckerman, who's a Berkman Fellow at the moment, has a useful blog post reporting a presentation by David Weinberger and Kim Dulin, who co-direct the project.
Libraries tend to be very knowledgeable about what they hold in their collections. But they’re much less good about helping people discover that information. There are few systems like Amazon or Netflix recommendations that help scholars and researchers discover the good stuff within libraries. Dulin argues that librarians have been pretty passive in the face of new technology – they’ve purchased fairly primitive systems and had to buy back their content from the companies who build those systems.

Researchers tend to start with Google, Dulin tells us. They might move to Google Books or Amazon to find out more about a specific book. And perhaps a library will come into play if the book can’t be downloaded or purchased inexpensively. Libraries would like to move to the front of that process, rather than sitting passively at the end. And lots of libraries are trying to take on this challenge – new librarians often come out of school with skills in web design and application development.

The Lab hopes to bring fellows into the process, much as Berkman does. It works to build software, often proof of concept software. And innovation happens on open systems and standards, so libraries and other partners can adopt the technology they’re developing.

Two major projects have occupied much of the Lab’s time – Library Cloud and ShelfLife, both of which Weinberger will demo today. There are smaller applications under development as well. Stackview allows the visualization of library stacks. Check Out the Checkouts lets us see what groups of users are borrowing – what are graduate divinity students reading, for instance. And a number of projects are exploring Twitter to share acquisitions, checkouts and returns.

Weinberger explains that ShelfLife is built atop Library Cloud, a server that handles the metadata of multiple libraries and other educational institutions and makes that metadata available via API requests and “data dumps”. Making this data available, Weinberger hopes, will inspire new applications, including ones we can’t even imagine. ShelfLife is one possible application that could live atop Library Cloud. Other applications could include recommendation systems, perhaps customized for different populations (experts, versus average users, for instance.)

It turns out that ShelfLife is in a pre-alpha state of development. The metaphor behind it is the "neighbourhood" -- i.e. the clusters that a given book might sit within.
We see a search for “a pattern language”, referring to Christopher Alexander’s influential book on architecture and urban design. We see a results page that includes a new factor – a score that indicates how appropriate a title is for the search. We can choose any result and we’ll be brought into “stack view”, where we can see virtual books on a shelf as they are actually sequenced on the physical shelf. Paul explains that it’s actually much more powerful than that – many books at Harvard are in a depository and never see the light of a shelf. And many collections have their own special indices – the virtual shelf allows a mix of the Library of Congress categories with other catalogs.

The system uses a metric called “shelfrank” to determine how the community has interacted with a specific book. The score is an aggregate of circulation information for undergraduates, graduates and faculty, information on whether the book has been assigned for a class, placed on reserve, put on recall, etc. That information exists in Library Cloud as a dump from Harvard’s HOLLIS catalog system – in the future, the system might operate using a weekly refresh of circulation data. The algorithm is pretty arbitrary at this point – it’s more a provocation for discussion than a settled algorithm.
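In sketch form, such a score is just a weighted sum over interaction counts. The event names and weights below are invented for illustration; as the post notes, Harvard's own algorithm is "pretty arbitrary" at this point and still a provocation for discussion:

```python
# Hypothetical weights for a ShelfRank-style score. Higher-signal
# interactions (a book assigned for a class, a faculty checkout)
# count for more than a casual undergraduate loan.
WEIGHTS = {
    "undergrad_checkout": 1,
    "grad_checkout": 2,
    "faculty_checkout": 3,
    "reserve": 4,
    "course_assignment": 5,
    "recall": 2,
}

def shelfrank(events):
    """Aggregate a dict of event counts (e.g. from a circulation
    dump out of a catalog like HOLLIS) into a single score.
    Unknown event types are simply ignored."""
    return sum(WEIGHTS.get(kind, 0) * n for kind, n in events.items())

book = {"undergrad_checkout": 40, "faculty_checkout": 5, "course_assignment": 2}
print(shelfrank(book))  # 40*1 + 5*3 + 2*5 = 65
```

The interesting design questions all live in the weights table: change it and the "community's" view of a book changes with it, which is presumably why the team treats the current values as provisional.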

Ethan reports some of the Q&A and generally does a great job of writing up the event. His post is worth reading in full.

A systems view of digital preservation

The longer I've been around, the more concerned I become about long-term data loss -- in the archival sense. What are the chances that the digital record of our current period will still be accessible in 300 years' time? The honest answer is that we don't know. And my guess is that it definitely won't be available unless we take pretty rigorous steps to ensure it. Otherwise it's posterity be damned.

It's a big mistake to think about this as a technical problem -- to regard it as a matter of bit-rot, digital media and formats. If anything, the technical aspects are the trivial aspects of the problem. The really hard questions are institutional: how can we ensure that there are organisations in place in 300 years that will be capable of taking responsibility for keeping the archive intact, safe and accessible?

Aaron Swartz has written a really thoughtful blog post about this in which he addresses both the technical and institutional aspects. About the latter, he has this to say:

Recall that we have at least three sites in three political jurisdictions. Each site should be operated by an independent organization in that political jurisdiction. Each board should be governed by respected community members with an interest in preservation. Each board should have at least five seats and move quickly to fill any vacancies. An engineer would supervise the systems, an executive director would supervise the engineer, the board would supervise the executive director, and the public would supervise the board.

There are some basic fixed costs for operating such a system. One should calculate the high-end estimate for such costs along with high-end estimates of their growth rate and low-end estimates of the riskless interest rate and set up an endowment in that amount. The endowment would be distributed evenly to each board who would invest it in riskless securities (probably in banks whose deposits are ensured by their political systems).

Whenever someone wants to add something to the collection, you use the same procedure to figure out what to charge them, calculating the high-end cost of maintaining that much more data, and add that fee to the endowments (split evenly as before).

What would the rough cost of such a system be? Perhaps the board and other basic administrative functions would cost $100,000 a year, and the same for an executive director and an engineer. That would be $300,000 a year. Assuming a riskless real interest rate of 1%, a perpetuity for that amount would cost $30 million. Thus the cost for three such institutions would be around $100 million. Expensive, but not unmanageable. (For comparison, the Internet Archive has an annual budget of $10-15M, so this whole project could be funded until the end of time for about what 6-10 years of the Archive costs.)

Storage costs are trickier because the cost of storage and so on falls so rapidly, but a very conservative estimate would be around $2000 a gigabyte. Again, expensive but not unmanageable. For the price of a laptop, you could have a gigabyte of data preserved for perpetuity.

These are both very high-end estimates. I imagine that were someone to try operating such a system it would quickly become apparent that it could be done for much less. Indeed, I suspect a Mad Archivist could set up such a system using only hobbyist levels of money. You can recruit board members in your free time, setting up the paperwork would be a little annoying but not too expensive, and to get started you’d just need three servers. (I’ll volunteer to write the Python code.) You could then build up the endowment through the interest money left over after your lower-than-expected annual costs. (If annual interest payments ever got truly excessive, the money could go to reducing the accession costs for new material.)

Any Mad Archivists around?

Worth reading in full.
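Swartz's arithmetic here is the standard perpetuity formula: an endowment that pays out a fixed annual amount forever at riskless real rate r costs annual/r up front. A few lines of Python confirm his figures:

```python
def perpetuity_cost(annual, rate):
    """Up-front endowment needed to pay `annual` forever at
    riskless real interest rate `rate`: annual / rate."""
    return annual / rate

ANNUAL_PER_SITE = 300_000  # board/admin + executive director + engineer
RATE = 0.01                # his assumed 1% riskless real rate

per_site = perpetuity_cost(ANNUAL_PER_SITE, RATE)
print(per_site)      # 30000000.0 -- the $30 million per site
print(3 * per_site)  # 90000000.0 -- "around $100 million" for three sites

# Storage works the same way: endow roughly $2,000 per gigabyte
# (his "very conservative" figure) and a gigabyte is preserved
# in perpetuity for about the price of a laptop.
```

The formula also shows why the scheme is so sensitive to the interest-rate assumption: at 2% rather than 1%, every figure halves.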

LATER: Dan Gillmor has been attending a symposium at the Library of Congress about preserving user-generated content, and has written a thoughtful piece about it.

The reason for libraries and archives like the Library of Congress is simple: We need a record of who we are and what we've said in the public sphere. We build on what we've learned; without understanding the past we can't help but screw up our future.

It was easier for these archiving institutions when media consisted of a relatively small number of publications and, more recently, broadcasts. They've always had to make choices, but the volume of digital material is now so enormous, and expanding at a staggering rate, that it won't be feasible, if it ever really was, for institutions like this to find, much less collect, all the relevant data.

Meanwhile, those of us creating our own media are wondering what will happen to it. We already know we can't fully rely on technology companies to preserve our data when we create it on their sites. Just keeping backups of what we create can be difficult enough. Ensuring that it'll remain in the public sphere -- assuming we want it to remain there -- is practically impossible.

Dan links to another thoughtful piece, this time by Dave Winer. Like Aaron Swartz, Dave is concerned not just with the technological aspects of the problem, but also with the institutional side. Here are his bullet-points:

1. I want my content to be just like most of the rest of the content on the net. That way any tools created to preserve other people's stuff will apply to mine.

2. We need long-lived organizations to take part in a system we create to allow people to future-safe their content. Examples include major universities, the US government, insurance companies. The last place we should turn is the tech industry, where entities are decidedly not long-lived. This is probably not a domain for entrepreneurship.

3. If you can afford to pay to future-safe your content, you should. An endowment is the result, which generates annuities, that keeps the archive running.

4. Rather than converting content, it would be better if it was initially created in future-safe form. That way the professor's archive would already be preserved, from the moment he or she presses Save.

5. The format must be factored for simplicity. Our descendants are going to have to understand it. Let's not embarrass ourselves, or cause them to give up.

6. The format should probably be static HTML.

7. ??

Sunday, 7 November 2010

Put not your faith in cloud services: they may go away

From John Dvorak:
I have complained about the fly-by-night nature of these companies for years, but my concern now seems misplaced. I was concerned about operations that you depend on for deep cloud services. This means complex programs running on the cloud with no real alternative. Over time, I've tended to see these companies as more stable than the "Use our free service. You won't regret it!" model.

I was taken to task by numerous vendors who kept telling me that I was full of crap, because cloud services are professionally managed, and nobody could do the job—whatever the job was—better than a room of pros. With the cloud, the pros would also keep the data safe.

Yeah, until they were all laid off, and the service shut down!

Now here's the problem I am experiencing second-hand. The audio podcast I do with Adam Curry, the No Agenda Show (Google it), has been using to store podcast album cover images for convenience. They will all be destroyed, as well as the accumulation of links, tips, curiosities, and other valuable information, in the next few weeks.

Looking back on the idea of using this service, I didn't fully consider the ramifications of its discontinuance despite my skepticism about cloud services in general. You know, this was just a lot of weird stuff thrown into a bin. But once it was discontinued, it was apparent what you are left with: dead links.

Wednesday, 27 October 2010

Tales of the Unexpected: an alternative history of the computing industry

Bill Thompson and I will be doing a joint gig tomorrow at the Science Museum in London.  All welcome.

The big bad package

Another opinion piece on how technology has changed libraries, this time focusing on the shift to licensed content and its perceived effect on services:

"Libraries are early and enthusiastic adopters of digital innovations. But these innovations bring the values of the marketplace with them. Through innocuous incremental stages, academic libraries have reached a point where they are now guided largely by the mores of commerce, not academe.

Commercialization has impinged on two core facets of university libraries—their collections and their user services. The ownership and provision of research materials, especially academic journals, has been increasingly outsourced to for-profit companies. Library patrons, moreover, are increasingly regarded simply as consumers, transforming user services into customer service. Both developments have distanced libraries from their academic missions."

This fairly damning article comes from Daniel Goldstein, a subject librarian at UCD, writing in the Chronicle of Higher Education. He writes at length about how big-package e-journal licensing has eroded the value of traditional library services, removing the specialist librarian as a vital part of academic life and simplifying the services libraries offer:

"By outsourcing ownership to mega-vendors, libraries have introduced the commercial interests of the journal providers into what had been an internal academic transaction between a library and its patrons. Purveyors of e-journals provide access to their titles on sites that are designed to bolster brand recognition and encourage repeat visits. This practice is good for business but not for scholarship. It is common to hear library patrons say that they found information on "Informaworld" (the platform of publisher Taylor and Francis) or "ScienceDirect" (Elsevier's platform) and not to know the name of the journal in which the article was published. Students especially have become purveyor-dependent, when they should be familiarizing themselves with the best literature, in the best journals, regardless of who sells it."
My first thought on this is: does it matter, as long as they get the content they need? But it does. The journal in which an article is published should indicate the authority of the piece, based upon the journal's editorial credibility. To some extent, journal vendors are, possibly unwittingly, eroding the value of peer review. He also warns against the same problem occurring with ebooks:
"It is time, now, to articulate a plan for e-books that better serves the needs of the academic community. University libraries should opt out of the e-book market until it conforms itself to the values, needs, and wallets of academe."
Pretty radical stuff. It's worth taking the time to read and digest. I've often felt that e-journal vendors have forced us to arrange the digital library by publisher, rather than by subject or author. Goldstein also warns against the 'good enough' data that librarians increasingly fall back on when dealing with digital material. This is also a strong argument, although one I don't always buy. After all, access to full text will surpass even the most well-constructed metadata. That said, getting accurate metadata to run a library link resolver remains a real challenge.

There is an interesting response in Library Journal, focusing on open access as an alternative and the problems faced both in getting material made available and in making readers aware of its existence.

It strikes me that open access for pre-prints, at least in its current institution-centric form, does not have all the answers. As Goldstein notes, scholarly publishing is still a legitimate commercial concern, and peer review is a costly process; I can't personally see how open access publishing on an institutional basis could solve the big package problem. Institutional repositories themselves certainly have other vital roles to play, notably that of digital preservation.

A new business model for licensed content may help, moving away from the big deal packages. A colleague who deals far more with this kind of thing recently suggested an interesting alternative to me, which I have since given some thought to:

Social academic platforms such as Mendeley are transforming the way academics share citations, and also full-text articles (possibly illegally in some cases). Why not simply plug article purchasing from vendors directly into these platforms, but marry that up with shared institutional funds? Academics could purchase articles directly from Mendeley, Pubget, Scopus, Web of Knowledge or a library discovery service such as Summon, using pools of institutional funds, some or all of which was previously spent on packages.

Once an article has been purchased using shared funds, the full text is then made available to everyone from that institution, via the vendor's website and/or stored locally on an institutional licensed-content server, similar to the LOCKSS and Portico initiatives.
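The pooled-funds idea above can be sketched in a few lines of code. Everything here (the class name, the method names, the use of DOIs as identifiers) is a hypothetical illustration of the workflow, not any vendor's actual API:

```python
# A minimal sketch of the shared-funds purchasing model: the first reader's
# purchase draws on the institutional pool, and the article is then unlocked
# for everyone at that institution at no further cost.

class InstitutionalPool:
    def __init__(self, budget):
        self.budget = budget        # shared institutional funds
        self.purchased = set()      # DOIs already unlocked for all members

    def purchase(self, doi, price):
        """Buy an article once; afterwards it is free to all members."""
        if doi in self.purchased:
            return True             # already bought with shared funds
        if price > self.budget:
            return False            # pool exhausted; fall back to ILL etc.
        self.budget -= price
        self.purchased.add(doi)     # now available institution-wide
        return True

pool = InstitutionalPool(budget=100.0)
assert pool.purchase("10.1000/example.1", 30.0)   # first reader pays from the pool
assert pool.purchase("10.1000/example.1", 30.0)   # second reader gets it free
assert pool.budget == 70.0                        # the pool is charged only once
```

The administrative work the post goes on to describe (financial oversight, access management, local storage) sits around exactly this kind of ledger.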

Placing article level selection in the hands of the user is a great idea, allowing collections to grow and diversify according to the academic needs of an institution. The negative issues around big deal purchasing and vendor-exclusive deals could be partly sidestepped. Certain core titles for all disciplines could still be automatically purchased for all by the institution (Nature), and any titles already purchased in perpetuity (such as an archives package) could be added.

We could see two or three purchasing models in operation, rather than just one.

In this 'shared iTunes for papers' model, what role is there for the librarian, now that selection has been devolved? The above scenario would still need administering financially, with access management and article availability issues to be taken care of, as well as any local storage of content. It's not really so different from the current work of our library's ejournals team; all that's changed is the selection model.

Monday, 25 October 2010

Libraries without librarians?

The Wall Street Journal has produced a somewhat bleak article focusing on the news that to cut operational costs, U.S. public library services are turning to automated mechanisms, sometimes over staff.

Faced with layoffs and budget cuts, or simply looking for ways to expand their reach, libraries around the country are replacing traditional, full-service institutions with devices and approaches that may be redefining what it means to have a library.

Later this year Mesa, Ariz., plans to open a new "express" library in a strip-mall, open three days a week, with outdoor kiosks to dispense books and DVDs at all hours of the day. Palm Harbor, Fla., meanwhile, has offset the impact of reduced hours by installing glass-front vending machines that dispense DVDs and popular books.

The wave of innovation is aided by companies that have created new machines designed to help libraries save on labor. For instance, Evanced Solutions, an Indianapolis company that makes library software, this month is starting test trials of a new vending machine it plans to start selling early next year.

"It's real, and the book lockers are great," said Audra Caplan, president of the Public Library Association. "Many of us are having to reduce hours as government budgets get cut, and this enables people to get to us after hours."

Whilst it will be a while before a walking robot can successfully guide a reader around the labyrinthine complexities of South Front 3 within the UL, it is interesting to note that this is seen by some as a negative or retrograde step.

"The basis of the vending machine is to reduce the library to a public-book locker," Mr. Lund said in an interview. "Our real mission is public education and public education can't be done from a vending machine. It takes educators, it takes people, it takes interaction."

I don't personally read it that way. Many libraries in Cambridge and the world over already use self-circulation machines to cut costs and make life easier for the reader.

The article seems to be placing a negative cutback-centric spin on a larger growing trend for automating basic library services.

Academic libraries have been doing this kind of thing for years. A self-issue terminal that works with RFID tags in books is arguably a much nicer experience than a 10-minute queue ending in a grumpy Librarian. Ditto with being able to get requested books from an external locker any time you like.

Freeing up staff time for more productive action or interaction (of the reader-educational type, perhaps?) than scanning a barcode and stamping a book is never a bad thing.

From my perspective as an evening Duty Officer within the UL, it would be really nice to have some way to cater for those readers who insist on turning up five minutes before closing with a really complex query. These lockers would not work, so where is the robot that could possibly help here?

Sunday, 24 October 2010

Data mash-ups and the future of mapping

Interesting JISC report published last month.  The Summary says (in part):

"This TechWatch report describes the context for the changes that are taking place and explains why the education community needs to understand the issues around how to open up data, how to create mash-ups that do not compromise accuracy and quality and how to deal with issues such as privacy and working with commercial and non-profit third parties. It also shows how data mash-ups in education and research are part of an emerging, richer information environment with greater integration of mobile applications, sensor platforms, e-science, mixed reality, and semantic, machine-computable data and speculates on how this is likely to develop in the future."

Full report (in optimised pdf format) from here.

Saturday, 23 October 2010

Arcadia Lecturer honoured by Electronic Frontier Foundation

James Boyle, who was the first Arcadia Lecturer, has been given a Pioneer Award by the EFF.  The awards were established in 1992 to "recognize leaders on the electronic frontier who are extending freedom and innovation in the realm of information technology".  The award will be presented at a ceremony in San Francisco on November 8 hosted by Cory Doctorow, who, you may remember, gave an Arcadia Seminar in 2009.

Tuesday, 19 October 2010

Introduction ...

I'm Ed Chamberlain, the first fellow for Michaelmas 2010. Whilst I've posted on this blog quite frequently, this is my first time doing so as an actual fellow! I've spent the first two weeks of my fellowship investigating digitisation-on-demand services at the University Library, with a view to scoping a potential future service.

The aim would be to offer readers digital delivery of print only material, on demand through the library catalogue.

The project may sound a bit woolly at first; after all, libraries have been doing digitisation en masse for mainstream material and rare stuff for some time. But a reader-driven approach covers a number of areas that have an impact on the future of libraries.

By no means the smallest is that of copyright. In looking for automated solutions to copyright assessment I'm currently taking a look at the wonderful copyright calculator API developed by the Open Knowledge Foundation. I'll also be looking to highlight some of the problems that copyright legislation raises for libraries wanting to innovate with new digital services.
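To give a flavour of what a copyright calculator automates, here is a toy version of one rule only: the UK/EU "life plus 70 years" term for published literary works. This is my own simplification for illustration, not the Open Knowledge Foundation's API; real determinations must handle anonymous works, Crown copyright, unpublished material and much else:

```python
from datetime import date

def in_public_domain(author_death_year, today=None):
    """Crude life-plus-70 test: copyright runs to the end of the
    70th calendar year after the author's death."""
    today = today or date.today()
    return today.year > author_death_year + 70

# An author who died in 1900: term expired at the end of 1970.
assert in_public_domain(1900, date(2010, 12, 17)) is True
# An author who died in 1950: term runs to the end of 2020.
assert in_public_domain(1950, date(2010, 12, 17)) is False
```

Even this single rule shows why automation matters: assessing thousands of candidate works by hand simply doesn't scale.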

Later on I'll be looking at quick wins in non-destructive digitisation of bound material and print-on-demand, hoping to learn from the experiences of those already deploying these services.

What's interesting for me, as a Systems Librarian, is that there are no major technical challenges here; it's arguably as much about changes in workflow and culture as anything. At the heart of it all is the concept of placing choice over the material digitised firmly in the hands of the reader, rather than a Librarian.

More info about my project can be found on the Arcadia project site, as well as a bit of bio blurb.

Monday, 18 October 2010

Project Highlights for the coming academic year

An outline of what's in the pipeline.


We will have eight Fellows over the course of the year. The first, Ed Chamberlain, is already beavering away on "Digitisation-on-Demand in academic research libraries". We will have two Fellows working on a radical re-engineering of the info-skills curriculum to make it fit for purpose in a networked scholarly environment, and two working on using technology to support the work of a busy teaching library. And Isla Kuhn will be working on designing a one-day symposium on health-related information (see below for more information).


We have three interesting speakers booked for the Michaelmas Term.

On November 2 Professor Richard Susskind will talk about "The End of Lawyers?" He is one of the world's leading experts on the impact of information technology on the legal profession and has been an adviser to the Lord Chancellor's Department on IT systems for supporting the administration of justice. Given that librarianship is also an established profession that is being profoundly affected by information technology we thought it would be interesting to hear about his experience with his own profession.

More details at

On November 16 Dr Mark Patterson, Director of Publishing at the Public Library of Science (PLoS) will talk about "Re-engineering the Scholarly Journal". Open Access publishing will be one of the key areas of interest to the academic community in the coming decades and PLoS is a fascinating and successful enterprise in this area. Given that it's on our doorstep, it seemed like a good idea to hear about PLoS's experience so far and their thoughts about the future.

More details at:

On December 7 Simon Andrewes will give a talk on the subject of "Changing BBC News: the cultural, managerial and editorial challenges of adapting to a digital environment". Given that adjusting to the needs of a digital environment poses major problems for any established and successful organisation, we were looking for a speaker who could speak from experience. Simon Andrewes was formerly Head of Newsroom development at BBC News and is currently leading a project to deliver and implement a set of online tools to help BBC Journalism collaborate and share more effectively.

More details at:


The Arcadia Lecture: April 2011

The Arcadia Lecture will be given in April 2011 by Tim O'Reilly, the founder and CEO of O'Reilly Media, and the only publisher who can command the attention of Linux-kernel geeks. Tim has built O'Reilly Media into a leading publisher of computing-related books and a major conference-organising organisation on leading-edge technology issues. He is credited with coining the phrase 'Web 2.0' and is widely recognised as one of the world's most influential commentators on technology issues.

International symposium: health information in a digital world

In 2011 we plan to host a major one-day symposium on how online information and search are impacting on health care. As part of her Arcadia Fellowship, Isla Kuhn will be working on this. The symposium will focus on most (though perhaps not all: a day may be a long time in politics, but it's just a blink in an academic timescale) of the following areas:

  • Patients

  • Practitioners

  • Researchers

  • Policy-makers

  • Mass media

  • Technology

  More details will be available as we have them.


    Specification and development of an iPhone/Android App

    We plan to devote some resources to the development of an original, library-related smartphone App. If you're interested in helping imagine and specify an App, please email me (jjn1 at cam).

    The Arcadia Project Book

    We will be crowdsourcing a book based largely -- but not exclusively -- on reports by Arcadia Fellows, using an approach pioneered by Dan Cohen, last year's Arcadia Lecturer.

    Thursday, 14 October 2010

    Future scenarios for academic librarians

    From Phil Davis, writing on the Scholarly Kitchen blog.

    The report, "Futures Thinking for Academic Librarians: Higher Education in 2025," sponsored by ACRL, provides nine likely, high-impact scenarios for the future of higher education and the supporting role of librarians. Understanding that universities and their academic libraries take time to adapt to change, the purpose of this study was to start preparing for the likely — or inevitable — future. These scenarios involve:

    1. A breaking of the textbook monopoly — creating flexible content that allows pieces to be assembled and allows feedback from users.

    2. “I see what you see” — large touch screens that allow for collaborations.

    3. “Write here with me” — automated mobile devices that allow students to collect and reference work.

    4. Bridging the scholar/practitioner divide — open access publishing and open peer review

    5. Stultifying of scholarship — the antithesis of #4 where the current status quo is maintained and strengthened.

    6. Everyone is a non-traditional student — moving beyond the 4-year resident college experience.

    7. Meet the new freshman class — preparing for the digital divide between in-coming tech-savvy students and students in need of remedial technological training.

    8. Increasing threat of cybercrime — as campuses lock down their technology, online privacy and intellectual freedom are compromised.

    9. “This class brought to you by . . . ” — disaggregated education provided by corporate sponsors.

    Enola Gay and digital preservation

    Interesting blog post by Danny Bradbury, who made a documentary a few years ago about the cultural history of the nuclear weapons programme. One of the most interesting interviews he had was with Martin Harwit who had been director of the National Air and Space museum in Washington, DC, but was rudely ousted in 1995.

    Harwit had attempted to mount an exhibition showcasing the Enola Gay, the B-29 airplane that dropped the first atomic bomb on Hiroshima. As part of that exhibition, he tried to ask whether the bombing was justified. His approach called down a rain of fire. He was blasted for historical revisionism by the politically powerful veteran community, and the board had little option but to let him go.

    The spectre of the Enola Gay's public display caused a tussle before Harwit's moral inquiry even began. The exhibition was mounted because 1995 was the 50-year anniversary of the bombing, and veterans were anxious to see the plane exhibited that year.

    Archivists had other ideas. They wanted the job done properly. They knew that in 500 years, when historians examined the aircraft, they might ask an array of arcane, academic questions. For example, what materials were the alloys in specific engine parts comprised of? Investigating minute details such as these and acquiring or rebuilding complex parts for complete veracity takes a great deal of time and effort. They may not have been interesting for veterans who wanted to see their bird fly one last time, but skimping on such tasks for short-term satisfaction puts the whole archival endeavour at risk.

    The more I think about the quandary facing archivists preparing the Enola Gay exhibit, the more I worry about our digital existence. Increasingly, our lives are articulated digitally. We share our experiences with others online, and carry out more of our transactions in binary form. The amount of information that we create is accelerating exponentially. "Between the birth of the world and 2003, there were five exabytes of information created," said Google CEO Eric Schmidt recently. "We [now] create five exabytes every two days."

    Archiving this stuff is going to be really difficult, in a way that the Enola Gay's archivists couldn't begin to imagine. For one thing, there's the physical media involved. Information may increasingly be stored in the cloud, but it must still be held on physical media, in some data centre somewhere.

    Estimates for the longevity of this physical media vary, but all of them point to instability; eventually, data storage decays. It turns out that tape, which is increasingly becoming an archival medium rather than a backup one, is particularly prone to damage because of the way that robotic tape libraries work. In order to access the information that we store today centuries hence, we'll need media that stands the test of time.

    New Masters course in Knowledge and Networks

    Cathy Davidson at Duke University is designing an intriguing Masters course and has put the outline proposal on the Web for commenting.

    Current topic headings are:

    Attention: What are the new ways that we pay attention in a digital era? How do we need to change our concepts and practices of attention for a new era? How do we learn and practice new forms of attention in a digital age?
    Participation: How do we encourage meaningful interaction and participation? What is its purpose on a cultural, social, or civic level?
    Collaboration: Collaboration can simply reconfirm consensus, acting more as peer pressure than a lever to truly original thinking. HASTAC has cultivated the methodology of “collaboration by difference” to inspire meaningful ways of working together.
    Network awareness: How do we both thrive as creative individuals and understand our contribution within a network of others? How do you gain a sense of what that extended network is and what it can do?
    Global Consciousness: How does the World Wide Web change our responsibilities in and to the world we live in?
    Civic Responsibility: How can we be good citizens of the Internet when we are off line, working towards real goals in our communities and using the community practices of sharing, customizing, and contributing online towards responsible civic action off line?
    Design: How is information conveyed differently, effectively, and beautifully in diverse digital forms? How do we understand and practice the elements of good design as part of our communication and interactive practices?
    Narrative, Storytelling: How do narrative elements shape the information we wish to convey, helping it to have force in a world of competing information?
    Procedural Literacy: What are the new tactics and strategies of interactive games, where the multimedia narrative form changes because of our success or failure?
    Critical consumption of information: Without a filter (editors, experts, and professionals), much information on the Internet can be inaccurate, deceptive, or inadequate. How do we learn to be critical? What are the standards of credibility?
    Digital Divides, Digital Participation: What divisions still remain in digital culture? Who is included and who excluded? How do basic aspects of economics and culture dictate not only who participates in the digital age but how we participate?
    Ethics: What are the new moral imperatives of our interconnected age?
    Advocacy: How do we turn collaborative, procedural thinking on line into activism in the real world?
    Preservation: What are the requirements for preserving the digital world we are creating? Paper lasts. Platforms change.
    Sustainability: What are the metrics for sustainability in a world where we live on more kilowatts than ever before? How do we protect the environment in a plugged-in era?
    Learning, Unlearning, and Relearning: Alvin Toffler has said that, in the rapidly changing world of the twenty-first century, the most important skill anyone can have is the ability to stop in one’s tracks, see what isn’t working, and then find ways to unlearn old patterns and relearn how to learn.

    Thanks to John Connell for the link.

    This is an interesting venture, and it has some useful echoes for us -- especially given that we will have two Arcadia Fellows in the Easter Term working on designing a new curriculum for information skills.

    Wednesday, 22 September 2010

    Concept books ...

    Why should cars get all the fancy concept models? Here are a few for the book:

    The Future of the Book. from IDEO on Vimeo.

    Via the wonderful Gizmodo. As noted in the comments there, most of the hardware to do this already exists, and many iPad apps are not too far away. What is impressive here is that elements of the work have been rethought to work in the digital medium. The non-sequential novel with parallel chapters accessible through motions is a fascinating idea. Interesting to think that a change in delivery format could affect the way literature is written.

    Back to earth, here is a great example of taking something dry like a Library annual report and making it interesting and accessible with digital media.

    Wednesday, 1 September 2010

    How many unique papers in Mendeley?

    Interesting blog post by Duncan Hull (who works on the Genome Campus in Hinxton):

    Mendeley is a handy piece of desktop and web software for managing and sharing research papers [1]. This popular tool has been getting a lot of attention lately, and with some impressive statistics it's not difficult to see why. At the time of writing Mendeley claims to have over 36 million papers, added by just under half a million users working at more than 10,000 research institutions around the world. That's impressive considering the startup company behind it have only been going for a few years. The major established commercial players in the field of bibliographic databases (WoK and Scopus) currently have around 40 million documents, so if Mendeley continues to grow at this rate, they'll be more popular than Jesus (and Elsevier and Thomson) before you can say "bibliography". But to get a real handle on how big Mendeley is we need to know how many of those 36 million documents are unique because if there are lots of duplicated documents then it will affect the overall head count.

    He then does an experiment, with intriguing results.
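The basic idea behind this kind of estimate is easy to sketch: normalise each record to a comparison key and count distinct keys. This is my own minimal illustration of the principle, not Hull's actual method; a real deduplication would also compare DOIs, authors and years:

```python
import re

def normalise(title):
    """Lower-case, strip punctuation and collapse whitespace,
    so trivially different records produce the same key."""
    return re.sub(r"\s+", " ", re.sub(r"[^\w\s]", "", title.lower())).strip()

records = [
    "The Structure of Scientific Revolutions",
    "the structure of scientific revolutions.",
    "As We May Think",
]
unique = {normalise(t) for t in records}
print(len(unique))   # the first two records collapse into one key
```

Title normalisation alone over-merges (different papers can share a title) and under-merges (typos survive), which is exactly why the experimental result is interesting.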

    Thanks to Lorcan for the link.

    Saturday, 21 August 2010

    The psychological disorder of the filing system

    From the title poem of George Szirtes' excellent 2009 collection The Burning of the Books:

    Librarian of the universal library, have you explored
    The shelves in the stockroom where the snipers are sitting,
    The repository of landmines in the parking bay,
    The suspicious white powder at the check-out desk,
    The mysterious rays bombarding you by the photocopier,
    The psychological disorder of the filing system
    That governs the paranoid republic of print
    In the wastes of the world?

    I hope that the "universal library" is not a veiled reference to the UL (we hold a collection of Szirtes' letters, so I imagine he feels kindly towards us). In any case, library "filing systems" are often considered confusing, if not "psychologically disordered", by library users.

    The Wikipedia entry tells us that "Classification systems in libraries generally play two roles. Firstly they facilitate subject access by allowing the user to find out what works or documents the library has on a certain subject. Secondly, they provide a known location for the information source to be located (e.g. where it is shelved)".

    Perhaps these dual roles are at the heart of the confusion. Unsurprisingly they often prove to be incompatible - subject groupings being countermanded by physical factors like size and space. And for many works, subject classification decisions can seem arbitrary to a library (or indeed bookshop) user. In any case, the majority of bibliographic records contain headings which are able to record subject information with far greater complexity than a single call number.

    I have argued elsewhere that users approach our catalogues knowing exactly what they want. If discovery isn't something which normally happens when browsing, then all a classification scheme needs to do is assign a fairly unique identifier to each item and provide a map showing the physical organisation of these identifiers. This is already the case for closed-access collections where browsing is not a factor.

    What about classification for online resources, which don't have a physical location? Many libraries either assign a generic call number for electronic material, or don't assign one at all. Discovery is handled mainly through search, and any subject-driven browse facility is generated by the subject headings within the record itself.

    Which is all very well in the catalogue, but what if we need to group online resources for other reasons? One of the things we're looking into is pushing subject-relevant content to students. In order to do this we need to "classify" these resources by subject. Some options are:
    • Use existing subject headings in record - BUT these can vary enormously in terms of "width" and "depth" (i.e. how many and how specific they are). They are often assigned by libraries with the nature of their collections in mind - if you only have 10 physics books, "physics" is fine, if you have 10,000 you'll need to be more specific. Can we be sure that subject headings have been applied in a consistent manner across our online resources?
    • Tap into existing forms of recommendation such as reading lists/citations and reflect them in the catalogue - seems the ideal solution BUT difficult to get your hands on the data and link it through to bibliographic records - an "expensive" option.
    • Use circulation data to make recommendations (i.e. "other people on your course borrowed this, other people who borrowed this borrowed that") BUT difficult to track for online material which is not "borrowed" as such - can we easily track access in the same way?
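The circulation-data option in the last bullet can be sketched very simply: count how often items co-occur across borrowing histories. All names and data here are hypothetical illustrations; a production recommender would normalise by item popularity and filter for reader privacy:

```python
from collections import Counter

loans = [                      # each set is one reader's borrowing history
    {"phys101", "maths201", "phys305"},
    {"phys101", "phys305"},
    {"phys101", "hist150"},
]

def also_borrowed(item, histories):
    """Rank other items by how often they were borrowed
    alongside `item` ("people who borrowed this also borrowed...")."""
    counts = Counter()
    for history in histories:
        if item in history:
            counts.update(history - {item})
    return [title for title, _ in counts.most_common()]

print(also_borrowed("phys101", loans)[0])   # "phys305" co-occurs twice
```

As the bullet notes, the catch for online material is getting equivalent "borrowing" events at all: access logs would have to stand in for circulation records.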
    There are wider questions - does this kind of recommendation produce a concentration on core texts at the expense of wider exploration of the collection? Or, handled correctly, might it lead people into areas they would not have otherwise considered?

    There are plenty of questions, and I for one don't have any real answers. Perhaps if I hang around by the photocopiers the "mysterious rays" might spark some inspiration ...

    Saturday, 31 July 2010

    Blogging as "autosave for our entire culture"

    Interesting talk by Scott Rosenberg (one of the Salon pioneers), who has written a useful history of blogging.

    Thursday, 29 July 2010

    One way street to the iPad, paywalls and linked data

    Writing in 1925, Walter Benjamin senses a radical change, not just in the physical forms which contain writing, but also in the nature of writing itself:

    "Just as this time is the antithesis of the Renaissance in general, it contrasts in particular to the situation in which the art of printing was discovered. For whether by coincidence or not, its appearance in Germany came at a time when the book in the most eminent sense of the word, the book of books, had through Luther's translation become the people's property. Now everything indicates that the book in this traditional form is nearing its end."

    This from a section entitled "Attested Auditor of Books" in Benjamin's brilliant collection One-Way Street, written in his usual aphoristic (blogging?) style (from which I will, with apologies, quote extensively). The internet has also been credited with allowing cultural output to become "the people's property", and leading to important changes in the forms which that output takes.

    Benjamin isn't talking about the internet, but he is talking about a change in the form of "print", driven and shaped by economic and technological forces:

    "Printing, having found in the book a refuge in which to lead an autonomous existence, is pitilessly dragged out onto the streets by advertisements and subjected to the brutal heteronomies of economic chaos."

    So there are further parallels. The production of text, once controlled by publishers (in the broadest sense), is now subject to different forces - the kind of economic chaos Rupert Murdoch is trying to tame with his paywall? Perhaps.

    There follows a lovely passage about the perpendicularity of text:

    "If centuries ago it began to gradually lie down, passing from the upright inscription to the manuscript resting on sloping desk before finally taking to bed in the printed book, it now begins just as slowly to rise from the ground. The newspaper is read more in the vertical than the horizontal plane, while film and advertisement force the printed word entirely into the dictatorial perpendicular."

    There's no argument that electronic content has, in the past, been mainly consumed on upright monitors in the "dictatorial perpendicular". There has been a lot of toing-and-froing over how effectively e-readers and the like can mimic "real" books (screen brightness, electronic ink) - I wonder how much thought has been given to the angle of reading and how it affects our consumption of print. Do the Kindle and the iPad herald an era when texts will once again cosily recline on their beds?

    (Lacking either an iPad or a Kindle, I attempted to mimic "horizontal reading" by laying my monitor flat on the desk - don't try this at home. And yes, it does seem to make an immediate difference to one's attitude to the text, at least for this reader.)

    If all this seems a bit prophetic - how about this for child/internet anxiety, 1925-style?

    "... before a child of our time finds his way clear to opening a book, his eyes have been exposed to such a blizzard of changing, colourful, conflicting letters that the chances of his penetrating the archaic stillness of the book are slight"

    It might sound like Benjamin is fondly harking back to this "archaic stillness" - but read on:

    "... the book is already, as the present mode of scholarly production demonstrates, an outdated mediation between two different filing systems. For everything that matters is to be found in the card box of the researcher who wrote it, and the scholar studying it assimilates it into his own card index."

    Not content with scholarly research databases, Benjamin goes straight onto linked data:

    "It is quite beyond doubt that the development of writing will not infinitely be bound by the claims to power of a chaotic academic and commercial activity; rather, quantity is approaching the moment of a qualitative leap when writing ... will take sudden possession of an adequate factual content."

    So words will not just carry the "ordinary" meaning that they have in text, but will be imbued with further meaning by the nature of their representation. Anyone who has worked with XML, let alone RDF (a "method for the conceptual description and modelling of information") will find this kind of thing familiar.

    Signing off, Benjamin says that "poets" will be the new masters of language, implying that an understanding of the deep and various meaning of words will once again become of prime importance in a new system of communication. Perhaps we all need to be poets in the modern world. Again, his words seem prophetic:

    "With the foundation of an international moving script they [poets] will renew their authority in the life of peoples, and find a role awaiting them in comparison to which all the innovative aspirations of rhetoric will reveal themselves as antiquated daydreams."

    In the jargon of the poetry workshop, or equally of the linked data practitioner: "don't tell - show!"

    Anyone in search of a real treat could do worse than track this book down - if only for the next section, subtitled "Principles of the Weighty Tome, or How to write Fat Books" - point II as an example:

    "Terms are to be included for conceptions that, except in this definition, appear nowhere in the book"

    Or the wonderful - and thankfully online - Writer's Technique in Thirteen Theses:

    "Consider no work perfect over which you have not once sat from evening to broad daylight"

    So much for Benjamin the blogger!

    Wednesday, 7 July 2010

    Context and meaning in search

    My 18-month-old son has just discovered sentences - or, more specifically, one sentence - "Where's it gone?" (pronounced as one word - "Wezzigonn?"). He throws his ball into the bushes. He looks at me dolefully and says "Wezzigonn?". I retrieve the ball. He throws it into the bushes. He looks at me dolefully and says ... well, you get the picture. Essentially, he thinks "Where's it gone?" is a single word which means "Fetch the ball, Dad". And the interesting thing is that in the context of the "game" I understand exactly what he means.

    In Wittgenstein's Philosophical Investigations he posits a "language game" involving a builder and his assistant. Every time the builder needs another slab he shouts "Slab!" and his assistant duly brings him a slab. So the single word "Slab!" functions as the sentence "Bring me a slab!" in this context.

    (He also discusses a variation where every time the builder needs a slab he calls out "Bring me a slab!". An observer who doesn't speak the same language assumes that "Bring me a slab!" is the word for "slab", and when building a wall himself calls to his assistant "Pass me a 'bring-me-a-slab'!" This kind of misunderstanding is commonly found in place names - such as Bredon Hill, meaning "Hill Hill Hill".)

    I suppose the point of all this is that context gives language meaning. Which is all very well if you're building a house, buying some cabbages, throwing a party etc. But what about when you "speak" into a search box? Where do you get your context from then? Do you have to play around with clever search modifiers so the interface understands that when you search Google Images for "bondage" you're looking for pictures of serfs?

    Not entirely - at least not in Google and not in Amazon, which both use contextual information to give you relevant results. [In]famously, these interfaces give very different results depending on whether you are logged in. People get pretty uncomfortable with the whole idea of this 'contextual information' - how it's gathered, where it's stored. But the use of contextual information to give language meaning is an essential part of communication.

    Libraries have thousands of users and millions of resources crying out to be introduced to each other. But our search mechanisms tend to be context-less. Lorcan Dempsey has said Discovery happens Elsewhere. For users of university libraries, discovery happens in context-heavy environments such as reading lists, citations, seminars, lectures. By the time they get to our interfaces they know exactly what they want. Then they find it (or don't).

    Can we start to build some context into our own systems? And what kind of context would be useful? We can say straight off that for students the most useful context is what course they're doing (something we will soon have access to, and which I've blogged about elsewhere). If we also have access to course materials (i.e. reading lists) we can really start to provide useful context for searches. How about if we have access to the content of books and articles - in particular the citations they contain? Could we start to put our searches in the context of a scholarly network based on citation?
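As a toy illustration of what "building context into the system" might look like - not any existing library system - one could imagine re-ranking catalogue results so that items on the searcher's course reading list float to the top. Everything here (the record data, the relevance scores, the boost value) is invented for the sketch:

```python
# Hypothetical sketch: promote catalogue search results that appear on the
# searcher's course reading list. All data and scoring below are invented.

def rerank(results, reading_list_ids, boost=10.0):
    """Sort search results, boosting items found on the reading list."""
    def score(item):
        base = item.get("relevance", 0.0)
        bonus = boost if item["id"] in reading_list_ids else 0.0
        return base + bonus
    return sorted(results, key=score, reverse=True)

results = [
    {"id": "b1", "title": "A General History", "relevance": 3.2},
    {"id": "b2", "title": "Course Set Text", "relevance": 2.9},
]
reading_list = {"b2"}  # items on this student's reading list

for item in rerank(results, reading_list):
    print(item["title"])
```

The set text outranks the nominally more "relevant" general history purely because of the reading-list context - which is exactly the kind of second-guessing the next paragraph worries about.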

    Or do we run the risk of second-guessing what users are searching for, and getting it wrong? There are endless anecdotes about people changing their relationship status on Facebook and immediately being bombarded with ads for wedding planners/speed dating. If people change course do we start serving up different results? And if discovery happens in context, how far should the library go in providing context, and how much should it leave to others?

    PS Emma has pointed out that little Henry is playing the Fort-Da game and that along with Lorcan and Wittgenstein, Freud could also be added to the list of tags!

    Tuesday, 29 June 2010

    Library Futures - A Surfeit of Future Scenarios

    Earlier today, I came across an ACRLog post entitled Add Cyberwar Contingencies To Your Disaster Plan that includes a couple of links to reports on possible futures/scenarios that can help in planning future library needs and services:

    - Futures Thinking for Academic Librarians, an announcement post for an ACRL report on “Futures Thinking for Academic Librarians: Higher Education in 2025”
    - 2010 top ten trends in academic libraries, a "review of the current literature" by the ACRL Research, Planning and Review Committee.

    These in turn reminded me of a couple of scenarios I'd heard that had been developed for JISC et al's Libraries of the Future project, and a quick dig around turned them up here: Libraries of the Future - Outline Scenarios and Backcasting

    The Libraries of the Future project has identified three scenarios:

    - Wild West, introduced as follows:
    2050 is an era of instability. Governments and international organisations devote much of their time to environmental issues, aging populations and security of food and energy, although technology alleviates some of the problems by allowing ad hoc arrangements to handle resource shortages and trade. In this environment, some international alliances prosper but many are short term and tactical. The state no longer has the resources to tackle inequality, and is, in many cases, subservient to the power of international corporations and private enterprise.

    The challenges of the 21st century have created major disruptions to academic institutions and institutional life. Much that we see as the role of the state in HE today has been taken over by the market and by new organisations and social enterprises, many of them regional.

    - Beehive, introduced as follows:
    The need for the old European Union countries to maintain their position in the world and their standard of living in the face of extensive competition from Brazil, Russia, India and China (BRIC) has led to the creation of the European Federation (EF) under the treaty of Madrid in 2035. The strength of the EF has meant that values in the EF have remained open in the long tradition of western democracy and culture.

    In the years leading up to 2050 the world became increasingly competitive; the continuing economic progress of the BRIC countries and their commitment to developing high quality HE systems means that even high-tech jobs are now moving from the West. On a worldwide scale, and in the US, UK and Europe especially, employer expectations now dictate that virtually all skilled or professional employment requires at least some post-18 education. In the UK these drivers have resulted in a state-sponsored system that retains elements of the traditional university experience for a select few institutions while the majority of young people enter a system where courses are so tightly focused on employability they are near-vocational.

    - Walled Garden, introduced as follows:
    Following the global recession of the early 21st century cuts in investment levels to help reduce the national deficit meant that internationally, the UK’s influence waned and it became ever more isolated. Indeed the UK drifted from the EU, particularly after the Euro collapsed in the century’s second global recession, and the UK itself fragmented as continued devolution turned to separation and independence. Fortunately, the home nations have achieved reasonable self-sufficiency.

    Technological advances, whilst allowing some of the challenges faced earlier in the century to be overcome, have also brought their own problems. The ability for people to connect with like-minded individuals around the world has led to an entrenchment of firmly held beliefs, closed values and the loss of the sense of universal knowledge. This has resulted in a highly fragmented HE system, with a variety of funders, regulators, business models and organisations that are driven by their specific values and market specialisation. However, 'grand challenges' of national importance go some way to galvanising the sector.

    The ACRL 2025 report identifies 26 possible scenarios (?! - I thought the idea of scenario planning was to identify a few that covered the bases between them?!), with a range of probabilities of them occurring, their likely impact, and their "speed of unfolding" (immediate change, short term (1-3 years), medium term (3-10 years), long term (10-20 years)).

    High impact, high probability scenarios include:

    - Increasing threat of cyberwar, cybercrime, and cyberterrorism, introduced as:
    College/university and library IT systems are the targets of hackers, criminals, and rogue states, disrupting operations for days and weeks at a time. Campus IT professionals seek to protect student records/financial data while at the same time divulging personal viewing habits in compliance with new government regulations. Librarians struggle to maintain patron privacy and face increasing scrutiny and criticism as they seek to preserve online intellectual freedom in this climate.

    - Meet the new freshman class, introduced as:
    With laptops in their hands since the age of 18 months, students who are privileged socially and economically are completely fluent in digital media. For many others, the digital divide, parental unemployment, and the disruption of moving about during the foreclosure crisis of their formative years mean they never became tech savvy. "Remedial" computer and information literacy classes are now de rigueur.

    - Scholarship stultifies, introduced as:
    The systems that reward faculty members continue to favor conventionally published research. At the same time, standard dissemination channels – especially the university press – implode. While many academic libraries actively host and support online journals, monographs, and other digital scholarly products, their stature is not great; collegial culture continues to value tradition over anything perceived as risky.

    - This class brought to you by…, introduced as:
    At for profit institutions, education is disaggregated and very competitive. Students no longer graduate
    from one school, but pick and choose like at a progressive dinner party. Schools increasingly specialize by offering online courses that cater to particular professional groups. Certificate courses explode and are sponsored by vendors of products to particular professions.

    The 2010 top trends from the literature review are given, in no particular order of priority, as:

    - Academic library collection growth is driven by patron demand and will include new resource types.
    - Budget challenges will continue and libraries will evolve as a result.
    - Changes in higher education will require that librarians possess diverse skill sets.
    - Demands for accountability and assessment will increase.
    - Digitization of unique library collections will increase and require a larger share of resources.
    - Explosive growth of mobile devices and applications will drive new services.
    - Increased collaboration will expand the role of the library within the institution and beyond.
    - Libraries will continue to lead efforts to develop scholarly communication and intellectual property services.
    - Technology will continue to change services and required skills.
    - The definition of the library will change as physical space is repurposed and virtual space expands.

    What strikes me about all these possible scenarios is that there don't seem to be any helpful tools that let you easily identify and track indicators relating to the emergence of particular aspects of the scenarios - which, I think, is the last step in the process of scenario development espoused in Peter Schwartz's "The Art of the Long View".

    So for example, OCLC recently released a report called A Slice of Research Life: Information Support for Research in the United States, which reports on a series of interviews with research and research related staff on "how they use information in the course of their research, what tools and services are most critical and beneficial to them, where they continue to experience unmet needs, and how they prioritize use of their limited time." And towards the end of last year, the RIN published a report on Patterns of information use and exchange: case studies of researchers in the life sciences (a report on information use by researchers in the humanities is due out later this year(?), and one for the physical sciences next year(?)...) A report on researchers' use of "web 2.0" tools is also due out any time now...

    So, are any of the trends/indicators that play a role in the 2025 scenarios (which are way too fine grained to be useful?) signaled by typical responses in the OCLC interviews or the Research Information Network report(s)?

    PS As if all that's not enough, it seems there's a book out too - Imagine Your Library's Future: Scenario Planning for Information Organizations by Steve O'Connor and Peter Sidorko. (If the publishers would like to send me a copy...?! Heh heh ;-)

    Friday, 25 June 2010

    I've Got Google, Why Do I Need You?

      An excellent presentation on how a modern student perceives the way a library works. It's a great reminder of the gap between the web-native student experience and the traditional library service. It also has my favourite quote of the week: "The librarian's logic is just as alien to me as the programmers' logic" ...

      By way of Angela Fitzpatrick's shiny new blog!

    Saturday, 19 June 2010

    a wealth of reference management

    There's a lot going on in the field of reference management tools - especially here in Cambridge.

    Reference management tools include all kinds of systems which help you organise references you have found, store the papers they refer to and perhaps annotate them, share the citations with others, cite papers in your own works, and so on.  There are some big players in this area - the first two which spring to mind for me are Zotero and Mendeley. Zotero is a Firefox plugin, so it sits within your browsing experience; Mendeley is a website and a downloadable tool, and makes a big effort to connect you to others and recommend other works - the " of scholarly work". Both are popular with researchers at the University of Cambridge.

    I realised this week that I now know of at least four reference management tools just originating here in Cambridge:

    • Papers. This is Mac software from Mekentosj, and has recently won an award
    • iCite. Like Zotero, this is another Firefox plugin
    • PaperPile, an open source system (GPL) from the EBI, which was first written for Linux
    • qiqqa, a somewhat unpronounceable name for a Windows application.
    It's great to see the local entrepreneurial spirit coming into play in the academic sphere!
      These are a tiny fraction of the world of reference management. It's interesting to note that the market can support so many tools. Each has special features which will appeal more to some users than others; some are particularly well suited to one discipline, with better support for their paper types and bibliographic databases. Of course, it's possible to combine two or more tools as part of your scholarly workflows, to get the best bits of each...

      Friday, 11 June 2010

      The hidden costs of peer review

      My OU colleague Martin Weller has done some calculations of the cost of the academic peer-review process.

      Peer-review is one of the great unseen tasks performed by academics. Most of us do some, for no particular reward, but out of a sense of duty towards the overall quality of research. It is probably also a community norm: as you become enculturated in the community of your discipline, there are a number of tasks you perform to achieve, and to demonstrate, this - a number of which are allied to publishing: writing conference papers, writing journal articles, reviewing.

      So it's something we all do; it isn't really recognised and is often performed on the edges of time. It's not entirely altruistic though - it is a good way of staying in touch with your subject (like a sort of reading club), it helps with networking (though we have better ways of doing this now don't we?) and we also hope people will review our own work when the time comes. But generally it is performed for the good of the community (the Peer Review Survey 2009 states that the reason 90% of reviewers gave for conducting peer review was "because they believe they are playing an active role in the community").

      It's a labour that is unaccounted for. The Peer Review Survey doesn't give a cost estimate (as far as I can see), but we can do some back-of-the-envelope calculations. It says there are 1.3 million peer-reviewed journal articles published every year, and the average (modal) time for review is 4 hours. Most articles are reviewed at least twice, so that gives us:

      Time spent on peer review = 1,300,000 x 2 x 4 = 10.4 million hours

      This doesn't take into account editors' time in compiling reviews or chasing them up; we'll just stick with the 'donated' time of academics for now. In terms of cost, we'd need an average salary, which is difficult to pin down globally. I'll take the average academic salary in the UK, which is probably a touch on the high side. The Times Higher gives this as £42,000 per annum, before tax, which equates to £20.19 per hour. So the cost with these figures is:

      20.19 x 10,400,000 = £209,976,000

      (Some people think the sum involved is much greater than this, btw.)
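The back-of-the-envelope figures above are easy to reproduce. One assumption of mine is the 2,080 working hours per year (52 weeks of 40 hours) used to turn the salary into an hourly rate; note that the rounded £20.19 rate gives the £209,976,000 quoted, while carrying the unrounded rate through gives a tidy £210 million:

```python
# Back-of-the-envelope cost of peer review, reproducing the post's figures.
# The 2,080 working hours/year (52 weeks x 40 hours) is an assumption used
# to convert the annual salary into an hourly rate.

ARTICLES_PER_YEAR = 1_300_000   # peer-reviewed articles published annually
REVIEWS_PER_ARTICLE = 2         # most articles are at least double-reviewed
HOURS_PER_REVIEW = 4            # average (modal) review time
SALARY_GBP = 42_000             # average UK academic salary, per annum
WORKING_HOURS_PER_YEAR = 2_080  # 52 weeks x 40 hours (assumption)

hours = ARTICLES_PER_YEAR * REVIEWS_PER_ARTICLE * HOURS_PER_REVIEW
hourly_rate = SALARY_GBP / WORKING_HOURS_PER_YEAR  # ~£20.19/hour
cost = hours * hourly_rate

print(f"{hours:,} hours of review, costing ~£{cost:,.0f} per year")
```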

      Martin points out one important implication of this -- that academics are donating over £200 million a year of their time to the peer review process.

      "This isn't a large sum when set against things like the budget deficit", he continues,

      but it's not inconsiderable. And it's fine if one views it as generating public good - this is what researchers need to do in order to conduct proper research. But an alternative view is that academics (and ultimately taxpayers) are subsidising the academic publishing to the tune of £200 million a year. That's a lot of unpaid labour.

      Now that efficiency and return on investment are the new drivers for research, the question should be asked whether this is the best way to 'spend' this money? I'd suggest that if we are continuing with peer review (and its efficacy is a separate argument), then the least we should expect is that the outputs of this tax-payer funded activity should be freely available to all.

      And so, my small step in this was to reply to the requests for reviews stating that I have a policy of only reviewing for open access journals. I'm sure a lot of people do this as a matter of course, but it's worth logging every blow in the revolution. If we all did it....


      Wednesday, 2 June 2010

      Death and the Web

      Daithí Mac Síthigh of the Law School at UEA came to Lilian Edwards's seminar last night and has posted a really useful account on his blog.

      Tuesday, 1 June 2010

      Libraries and Games

      At the start of the year, the NB column in the Times Literary Supplement ran a "Literary Anniversaries" series "for the benefit of aspiring scribes in search of a subject to tempt a literary editor". Noting the trend for biographies covering connections between apparently unconnected subjects, two notable anniversaries were picked out each week and explored for their combo-blockbuster-biog potential.

      There is a real value to the chance juxtaposition of subjects. Collegiate systems such as Cambridge's have long been lauded for encouraging cross-fertilisation. Browsing in a library can be similarly productive - books are pulled from shelves, connections are made, inspiration strikes and great ideas are born (new books sections are particularly fruitful for the browser - the only thing the items have in common being their "newness").

      Online library catalogues (or OPACs) are often accused of ironing serendipity out of the system. You approach an OPAC search with a specific need (I want a copy of this book which appears on my reading list/is cited in an article). That need is either fulfilled or not fulfilled. Few OPAC searches are made in expectation that the item will not be held, so the best outcome is that your expectation is met. And the worst is that you are disappointed. But you are seldom surprised or delighted.

      One of the topics of conversation at Mashed Libraries Liverpool (as well as the subject of a lightning talk I have since lost my notes to!) was libraries and games. Or, loosely, how to apply techniques from the computer gaming industry in library interfaces. Musing on the subject with Tony Hirst, I was reminded of an interesting and entertaining debate - "Is Google making us Stupid?" - held at Wolfson College last September. At one point conversation slipped from search engines to video games (are they making us evil? are they making our children evil? and stupid?).

      Ian Goodyer, Professor of Child and Adolescent Psychology and a member of the panel, pointed out that the really dangerous aspect of computer games is not violent or sexual content but a phenomenon called partial reinforcement, which means doing the same thing again and again and sometimes being rewarded for it - a bit like reverse Russian Roulette. If the action/reward pattern is (or seems) random - as in a fruit machine - the strength of the reinforcement is increased.

      What can libraries learn from all this? We certainly don't want our users to become addicted to OPACs (do we?). But a little slice of serendipity and surprise might change the way people use both library catalogues and library collections.

      Amazon already provides a kind of semi-serendipity with its "Other people who bought X bought Y" section. So you get what you searched for, and then a little extra which is relevant but might be unexpected. This section intrigues us because sometimes it contains something of real interest, and sometimes it doesn't. Rather than the main search result which is only, rather boringly, what we were looking for. A couple of months ago Dave Pattern of Huddersfield gave an Arcadia Seminar on doing this kind of thing with library catalogues, and it's something we're looking to pursue when we get access to CAMSiS course data.

      Another angle is to incorporate some kind of complete "randomness" into library searches - along the lines of "hit the search button and sometimes you'll get something really interesting". I'm currently working on the JISC funded CULwidgets project in collaboration with CARET. As well as working on the provision of core services, we said we wanted to do some fun things which illustrated important points about libraries. A while ago I wrote a little web service which provides random results from the UL database. You can try it here:

      (it's in completely raw XML form at the moment so you'll just see the data - but it could form the basis of something more polished with finding instructions, book covers etc.)
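The idea behind the service - pick one record at random from the catalogue and return it as raw XML - can be sketched in a few lines. The records and function below are invented for illustration; the real CULwidgets service sits over the UL database and will differ:

```python
# Minimal sketch of a "random catalogue record" service using only the
# standard library. The two sample records are invented placeholders.
import random
from xml.etree import ElementTree as ET

CATALOGUE = [
    {"id": "1", "title": "On the Origin of Species"},
    {"id": "2", "title": "A Treatise on Bessel Functions"},
]

def random_record_xml():
    """Return one randomly chosen catalogue record as an XML string."""
    rec = random.choice(CATALOGUE)
    root = ET.Element("record", id=rec["id"])
    ET.SubElement(root, "title").text = rec["title"]
    return ET.tostring(root, encoding="unicode")

print(random_record_xml())
```

Each call is an independent random draw, which is what makes hitting refresh feel like pulling the arm of a (rather bookish) fruit machine.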

      Hit refresh to get a new, random result. Then hit it again. And again.

      A little serendipity in searching works particularly well with a collection as huge and diverse as the UL's. Try refreshing the service until you come across something unexpected, intriguing or downright odd - it might not take as long as you think. And it's surprisingly addictive.

      I'm not suggesting that a completely random search is likely to be used in anger by students and academics (though if you were writing a column in the TLS you could run an interesting line in "take three random results from the UL and write an essay on them").

      But it does illustrate that the way library catalogues are searched could influence the way collections are used, and how a touch of the unexpected could help to spark ideas and open up the wealth and depth of Cambridge's library collections.