The Twitter archive will join the ambitious “Web capture” project at the library, begun a decade ago. That effort has assembled Web pages, online news articles and documents, typically concerning significant events like presidential elections and the terrorist attacks of 9/11, Mr. Raymond said.
The Web capture project already has stored 167 terabytes of digital material, far more than the equivalent of the text of the 21 million books in the library’s collection.
Some online commentators raised the question of whether the library’s Twitter archive could threaten the privacy of users. Mr. Raymond said that the archive would be available only for scholarly and research purposes. Besides, he added, the vast majority of Twitter messages that would be archived are publicly published on the Web.
“It’s not as if we’re after anything that’s not out there already,” Mr. Raymond said. “People who sign up for Twitter agree to the terms of service.”
Knowing that the Library of Congress will be preserving Twitter messages for posterity could subtly alter the habits of some users, said Paul Saffo, a visiting scholar at Stanford who specializes in technology’s effect on society.
“After all,” Mr. Saffo said, “your indiscretions will be able to be seen by generations and generations of graduate students.”
People thinking before they post on Twitter: now that would be historic indeed.