Richmond Review

Internet Archive Preserves Data from World Wide Web


The Internet Archive, located at 300 Funston Ave., is a San Francisco-based nonprofit digital library with the stated mission of storing information on the World Wide Web. In the photo above, Salem Evans, manager of administration at the Internet Archive, stands by a Victor talking machine, which digitizes old 78-rpm records.

By Judith Kahn

Have you ever wanted to read a past article that appeared in the “New Yorker,” see a classic silent film, hear an old concert or particular verses from a song played at a Grateful Dead concert, or possibly read an old series of a particular comic strip that is out of publication?

Well, all of this information and more is now accessible to the public through the work of the Internet Archive, housed in a former Christian Science Church at 300 Funston Ave. in the Inner Richmond.

Universal access to all knowledge is the stated mission of San Francisco’s Internet Archive, a 501(c)(3) nonprofit organization founded to build an Internet library that provides free public access to collections of digitized materials, including websites, software applications, games, music, movies and nearly three million public-domain books. It gets much of its funding from foundations, government grants and donations from individuals.

The archive is committed to preserving America’s cultural heritage and making it accessible. Because of the archive’s work and its commitment, the past is never dead.

In addition to its archiving function, the archive is an activist organization, advocating for a free and open Internet. The Internet Archive was founded by Brewster Kahle in May 1996. The archive’s content was not available

to the public until 2001, when the archive developed the


The Great Room, located on the second floor of the building, is where all of the digitized material is stored. In the past, the room was the sanctuary of the Christian Science Church. Standing in front are three-foot-tall ceramic statues of past and present staff, looking toward the stage.

It is recognized by many educators and the public that libraries exist to preserve

a society’s cultural artifacts and to provide access to them. If libraries are to

continue to foster education and scholarship in this era of digital technology, it is

essential for them to expand, says founder Brewster Kahle. Without cultural

artifacts, civilization has no memory and no mechanism to learn from its successes

and failures.

With the rapid growth of the Internet, the function of the Internet library is essential

to education and the maintenance of an open society. By collaborating with

institutions, including the Library of Congress and the Smithsonian, the

archive is working to preserve records for generations to come.

Stewart Brand, president of The Long Now Foundation, said, “Digitized information, especially on the Internet, has such rapid turnover these days that total loss is the norm. Civilization is developing severe amnesia as a result. The Internet is the beginning of a cure. It is the beginning of complete, detailed, accessible, and searchable memory for society.”

Protecting stored resources from damage or destruction is an ongoing task, which is why the archive tries to maintain copies of its collections at multiple sites. Part of its collection is already handled this way, and the archive is proceeding as quickly as possible to do the same with the rest. The main issues for the archive are guarding against the consequences of accidents and data destruction and maintaining the accessibility of data as formats become obsolete. In recognition of the important role Internet libraries serve, there is now an effort to promote the formation of Internet libraries in the United States.

The Internet Archive has data centers in San Francisco, Redwood City and

Richmond. In November of last year, Kahle announced the Internet Archive was building

an Internet Archive in Canada. He feels there is an urgent need to build a

copy of the archive in foreign countries. “On Nov. 9 in America, we woke up to

a new administration promising radical change. It was a firm reminder that institutions

like ours, built for a long time, need to design for change. For us, it

means keeping our cultural materials safe, private and perpetually accessible. It

means preparing for a web that may face greater restrictions in a world in which

government surveillance is not going away. At the Internet Archive, we are

fighting to protect our readers’ privacy in the digital world,” Kahle said.

The archive collects the bulk of its data automatically through its web crawlers, which work to preserve as much of the public web as possible. A web crawler is software that looks at and captures web pages, and the Wayback Machine allows the archives of the World Wide Web to be searched and accessed. This service is important because it allows one to see what previous versions of websites looked like and to retrieve original source code from websites that may no longer be directly available.

The Internet Archive has capitalized on the popular Wayback Machine, whose name was derived from an old Rocky and Bullwinkle cartoon.

The value and immensity of the archive’s collection are overwhelming. The collection includes digitized books and special collections from various libraries and cultural heritage institutions around the world.

The Internet Archive operates 33 scanning centers in five countries, digitizing about 1,000 books a day for a total of more than 2 million books. The audio archive includes music, audio books, news broadcasts, old-time radio shows and a wide variety of other audio files. There are more than 200,000 free digital recordings in the collection, and the sub-collections include audio books and poetry, podcasts and non-English audio. The live music archive includes more than 100,000 concert recordings from established and independent artists. The images archive contains more than 880,000 items. The archive has also uploaded court opinions, legal briefs and exhibits from United States federal courts.

The archive has received two new additions. The first is the Trump Archives, which includes 700 televised speeches, interviews, debates and other news broadcasts. It was launched on Jan. 5. The second is a private collection of videotaped television news spanning 35 years, recorded by Marion Stokes, a librarian, social justice advocate and television interview program host who believed it was vital to preserve television news. Stokes started recording news at home in 1977 and never stopped. By the time of her death in December 2012, she had 140,000 video cassettes. The digitization of such a huge collection will take a number of years and require additional funding.

On Nov. 6, 2013, a fire at the Internet Archive’s scanning center destroyed 1,300 square feet of space that held scanning equipment worth hundreds of thousands of dollars. Some physical materials were in the scanning center because they were being digitized, but most were in a separate locked room in the archive. The fire proved to be a reminder that digitizing and making copies are good strategies for both access and preservation. Since the fire, the archive has made and continues to make copies of the data at the Internet Archive’s multiple locations.

Kahle’s enthusiasm for preserving the Internet Archive, shared by his committed staff, will continue so that older generations can revisit the past and future generations can learn about the past they did not experience.


The Internet Archive contains three-foot-tall ceramic statues of past and present staff.

Free tours of the Internet Archive are given every Friday, at 1 p.m., with an informative

guide. To arrange a tour, call (415) 561-6767. For more information,

go to the website at
