Abstract
This paper examines the Greenstone digital library software: its features, installation requirements, potential, and examples of live digital libraries built with it around the world. It then describes how to build collections, with examples of different collection types, and concludes with exporting a collection to Compact Disc Read-Only Memory (CD-ROM).
INTRODUCTION TO GREENSTONE DIGITAL LIBRARY SOFTWARE
What are digital libraries? Some definitions
“Digital libraries are organized collections of digital information. They combine the structuring and gathering of information, which libraries and archives have always done, with digital representation that computers have made possible.” Lesk (1997)
“Digital libraries are organizations that provide the resources, including the specialized staff, to select, structure, offer intellectual access to, interpret, distribute, preserve the integrity of, and ensure the persistence over time of collections of digital works so that they are readily and economically available for use by a defined community or set of communities.” Digital Library Federation (1998)
A working definition
A digital library is an organized collection of digital materials, with associated services, accessible over networks.
Note: By this definition, a stream of data coming from a satellite to your computer is not a digital library, and neither is data such as Google's.
What is Greenstone Digital Library Software (GSDL)?
GSDL is a comprehensive open-source software suite for building, maintaining and distributing digital library collections. It is available at www.greenstone.org.
It was developed by the New Zealand Digital Library (NZDL) Project at the University of Waikato, and is distributed and promoted by NZDL in cooperation with UNESCO and the Human Info NGO, Belgium.
Aims of GSDL
The aim of the software is to empower users, particularly in universities, libraries, and other public service institutions, to build their own digital libraries. Digital libraries are radically reforming how information is disseminated and acquired in UNESCO’s partner communities and institutions in the fields of education, science and culture around the world, and particularly in developing countries. From: www.greenstone.org
Features of GSDL
Ø Collections can be accessed and distributed through the Web and on CD-ROM.
Ø Multi-platform: available for UNIX and Windows.
Ø Collection building through a graphical user interface (GUI), the Greenstone Librarian Interface (GLI), or from the command line.
Ø Powerful indexing: full-text, document-level and section-level.
Ø Metadata-based, including field-based and automatic indexing.
Ø Support for Dublin Core and other metadata.
Ø Powerful search and browsing, including full-text and field search, Boolean and ranked retrieval.
Ø Support for several document formats, such as text, HTML, Word, PDF and email, enabled by plug-ins.
Ø Homepage customization.
Ø Configurable: multilingual, metadata and classifiers.
Ø Advanced compression for text and indexes.
Ø Includes an organizer for collections.
Ø Respect for copyright.
Ø Support:
User list: greenstone-users@list.scms.waikato.ac.nz. This is the most popular mailing list, and it is archived as a Greenstone collection at www.nzdl.org.
Developers list: Greenstone-devel@list.scms.waikato.ac.nz.
Minimum Requirements for GSDL Installation
Hardware
- Disk space: 50 MB for binary installation.
- 155 MB for compiling Greenstone from source code.
- 200 MB for the optional Greenstone demo collections.
- 4 MB for online documentation.
- 24 MB for Greenstone’s CD exporting functions.
Software
- Java Runtime Environment (JRE).
- Web server (Apache recommended).
- Perl (Practical Extraction and Report Language) for collection building.
- C++ compiler for building from source code.
- Web browser (Netscape and Internet Explorer recommended).
Examples of Greenstone Digital Libraries in Action
With an Internet connection, you can visit the following examples of Greenstone library web sites by Control + clicking on their links.
MOST Digital Library (UNESCO)
The MOST Digital Library contains results from research carried out during the first ten years of the MOST Programme. The themes covered include Drugs, Globalization and Governance, International Migration, Multicultural Societies, Poverty Eradication, Social Transformations, Sustainability, Urban Development and HIV/AIDS. The MOST Digital Library offers a brief abstract of each document (in English, French and Spanish) that identifies the key issues and arguments in a few lines. The database contains the documents in their original language (English, French or Spanish) and in translation if such a version exists. The user interface is currently in English only, but will shortly also be available in French and Spanish.
New Zealand Digital Library Project
A demonstration site set up by the developers of Greenstone, the New Zealand Digital Library Project. This site contains many collections, ranging from humanitarian information to computer science technical reports to demonstration collections of Chinese and Arabic documents.
The United Nations Digital Library - Islamabad
The United Nations Digital Library Islamabad is an open-access, online searchable repository containing the full text of documents, reports, publications and other public information items produced by the country offices of United Nations Organizations, Programmes and Funds in Pakistan. The Digital Library of the United Nations in Pakistan strives to stop the loss of the UN's digital information in Pakistan and facilitates its retention and long-term preservation in a usable form. The collection comprises general documents, reports, publications, newsletters, press releases and other public information items. This repository is a centralized resource for United Nations information on or about Pakistan.
University of Namibia
This library includes a collection of past examination papers of the University of Namibia, a register of Namibian theses and dissertations, and a collection containing publications of University of Namibia staff members between 1992 and 2002.
Potentials of Greenstone Software in Nigeria
Ø Digitization of books, journals, newsletters, pamphlets, monographs and newspapers.
Ø Digitization of reports: annual reports, management and committee reports.
Ø Digitization of important documents such as the Laws of the Federation, decrees, the Nigerian Constitution, conference proceedings, etc.
Ø Development of local music and video libraries.
Ø Theses and dissertations of universities.
Ø Images of our tourist attractions, products of indigenous knowledge, and a lot more.
COLLECTION BUILDING AND METADATA
Collection Building
Collection building in GSDL is the process of gathering the information and materials that will be used to create your digital library database.
Metadata
Wikipedia defines metadata as data providing information about one or more aspects of the data. The National Information Standards Organization (NISO) gives a more comprehensive definition: metadata is structured information that describes, explains, locates, or otherwise makes it easier to retrieve, use, or manage an information resource. Metadata is often called data about data or information about information.
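To make the idea of "data about data" concrete, here is a minimal Python sketch of a metadata record for a single document. The field names follow the Dublin Core "dc." naming used in the exercises in this paper; the values and the describe helper are invented for illustration and are not part of Greenstone.

```python
# Hypothetical metadata record for one document in a collection.
# The "dc." field names follow Dublin Core; the values are invented.
record = {
    "dc.Title": "Annual Report 2010",
    "dc.Creator": ["A. Author", "B. Author"],  # a field may hold several values
    "dc.Description": "Summary of the year's activities",
}


def describe(record):
    """Return a one-line, human-readable summary of a metadata record."""
    creators = ", ".join(record.get("dc.Creator", []))
    return f'{record["dc.Title"]} by {creators}'


print(describe(record))
```

The document itself (the report) is the data; the record above is the metadata that helps a reader find and identify it.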
BUILDING A SMALL COLLECTION OF HTML FILES
You will need some HTML files.
Running the Librarian Interface (GLI)
·
Start the Greenstone Librarian Interface:
Start → All Programs → Greenstone 2.85 → Librarian Interface (GLI)
(The version number shown in the menu depends on the GSDL version you are running.)
After a short pause a startup screen appears, and then after a slightly longer pause the main Greenstone Librarian Interface appears. (A command prompt is also opened in the background.)
Starting a New Collection
·
Start
a new collection within the Librarian Interface:
File → New...

·
You
will create a collection based on a few HTML web pages.
A window pops up. Fill it out with appropriate values, for example:
Collection title: Small HTML Collection
Description of content: A small collection of HTML pages
Leave the setting for Base this collection on: at its default, -- New Collection --, and click <OK>.

·
Next,
you will need to gather together the files that will constitute the collection.
A suitable set has been prepared ahead of time in workshop_folders →
simple_html. Using the left-hand side of the Librarian Interface's Gather
panel, interactively navigate to the workshop_folders folder.
Adding documents to the
Collection
·
Now
drag the simple_html folder from the left-hand side and drop it on the
right. The progress bar at the bottom shows some activity. Gradually,
duplicates of all the files will appear in the collection panel.
You
can inspect the files that have been copied by double-clicking on the folder in
the right-hand side.
·
Since
this is our first collection, we won't complicate matters by manually assigning
metadata or altering the collection's design. Instead we rely on default
behaviour. So pass directly to the Create panel by clicking its tab.
Building the Collection
·
To start building the collection, click the Create tab, then click the <Build Collection> button.
·
Once the collection has been built successfully, a “Collection Creation Result” window pops up. Click <OK> to confirm.
·
Click
the <Preview Collection> button to look at the end result. This
loads the relevant page into your web browser (starting it up if necessary).
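Besides the Librarian Interface, Greenstone also supports command-based collection building through Perl scripts. The sketch below wraps the import.pl and buildcol.pl scripts in Python. It rests on several assumptions: that Greenstone's environment setup script has already been run so the scripts can be found, that the collection's short name is smallhtm (a hypothetical name), and that any step needed to make the freshly built index live is handled separately. Treat it as an outline of the idea rather than a complete build procedure.

```python
import subprocess


def build_commands(collection):
    """Return the commands that import and then build a collection."""
    return [
        ["perl", "-S", "import.pl", collection],
        ["perl", "-S", "buildcol.pl", collection],
    ]


def build(collection, run=subprocess.run):
    """Run each command in order, stopping at the first failure."""
    for command in build_commands(collection):
        run(command, check=True)


# Example (needs a configured Greenstone environment, so it is commented out;
# "smallhtm" is a hypothetical collection name):
# build("smallhtm")
```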
Viewing extracted Metadata
·
Back
in the Librarian Interface, click the Enrich tab to view the metadata
associated with the documents in the collection.
·
Presently
there is no manually assigned metadata, but the act of building the collection
has extracted metadata from the documents. Double click the simple_html folder
to expand its content. Then single-click aragon.html to display all its
metadata in the right-hand side of the panel. The initial fields, starting
"dc.", are empty. These are Dublin Core metadata fields for manually
entered data.
·
Use
the scroll bar on the extreme right to view the bottom part of the list. There
you will see fields starting "ex." that express the extracted
metadata: for example ex.Title, based on the text within the HTML Title
tags, and ex.Language, the document's language (represented using the
ISO standard 2-letter mnemonic) which Greenstone determines by analyzing the
document's text.
·
Close
the collection by clicking File → Close. This automatically saves the
collection to disk.
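The extraction of ex.Title from the HTML Title tags can be illustrated with a few lines of Python. This is only a sketch of the idea using the standard library's html.parser module; it is not Greenstone's own code, and the function name extract_metadata is an assumption made for this example.

```python
from html.parser import HTMLParser


class TitleExtractor(HTMLParser):
    """Collect the text found between <title> and </title>."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data


def extract_metadata(html_text):
    """Return extracted metadata in the style of Greenstone's "ex." fields."""
    parser = TitleExtractor()
    parser.feed(html_text)
    return {"ex.Title": parser.title.strip()}


page = "<html><head><title> Aragon </title></head><body>Text</body></html>"
print(extract_metadata(page))
```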
Setting-up Shortcut in the
Librarian Interface
·
To
set up a shortcut to the source files, in the Gather panel navigate to
the folder in your local file space that contains the files you want to use—in
our case, the workshop_folders folder. Select this folder and then
right-click it, and choose Create Shortcut from the menu. In the Name
field, enter the name you want the shortcut to have, or accept the default workshop_folders.
Click <OK>. Close all the folders in the file tree in the
left-hand pane, and you will see the shortcut to your source files.
A SIMPLE IMAGE COLLECTION
·
In the Librarian Interface (GLI), start a new collection (File → New...) called backdrop. Fill out the fields with appropriate information. For Base this collection on:, select the item Image-e (imagee) from the pull-down menu.
When
you base a collection on an existing one, it inherits all the settings of the
old one. You won't be asked to choose a metadata set because the new collection
inherits the ones (if any) used by the seed collection.
·
Copy
the images provided in workshop_folders → images into your newly-formed
collection.
·
Change
to the Create panel and build the collection.
·
Preview the result.
·
Click on Browse in the navigation bar to view a list of the photos ordered by filename and presented as thumbnails, each accompanied by some basic data about the image. The structure of this collection is the same as that of Image-e (imagee), but the content is different.
·
Back
in the Librarian Interface, change to the Enrich panel and view the
extracted metadata for Bear.jpg.
Adding Title and
Description Metadata
·
We
work with just the first three files (Bear.jpg, Cat.jpg and Cheetah.jpg)
to get a flavour of what is possible. First, set each file's dc.Title field
to be the same as its filename but without the filename extension:
Click on Bear.jpg so its metadata fields are available, then click on its dc.Title field on the right-hand side. Type in Bear.
Repeat the
process for Cat.jpg and Cheetah.jpg.
·
Add
a description for each image as dc.Description metadata.
What description should you enter? To
remind yourself of a file's content, the Librarian Interface lets you open
files by double-clicking them. It launches the appropriate application based on
the filename extension, Word for .doc files, Acrobat for .pdf files and so on.
Double-click Bear.jpg: on Windows,
the image will normally be displayed by Microsoft's Photo Editor (although this
depends on how your computer has been set up).
Back in the Enrich panel, make sure that Bear.jpg is selected in the collection tree on the left-hand side. Enter the text Bear in the Rocky Mountains as the value for the dc.Description field.
Repeat this
process for Cat.jpg and Cheetah.jpg, adding a suitable
description for each.
·
Go
to the Create panel and click <Build Collection>. Once it
has finished building, preview the collection. You will not notice
anything new. That's because we haven't changed the design of the collection to
take advantage of the new metadata.
Change Format Features to display new metadata
·
Now
we customize the collection's appearance. Go to the Format panel and
select Format Features from the left-hand list. Leave the feature
selection controls at their default values, so that All Features is
selected for Choose Feature, and VList is selected as the Affected
Component. In the HTML Format String, edit the text as follows:
Change _ImageName_: to Title:
Change [Image] to [dc.Title]
After [dc.Title]<br> add
Description: [dc.Description]<br>
Metadata names are case-sensitive in
Greenstone: it is important that you capitalize "Title" and
"Description" (and don't capitalize "dc").
·
The
new format statement is displayed in the list of assigned format statements.
The first substitution alters the fragment of text that appears to the right of
the thumbnail image, the second alters the item of metadata that follows it.
The addition displays the description after the Title.
·
Preview
the collection by clicking the <Preview Collection> button. When
you click on Browse in the navigation bar the presentation has changed
to "Title: Bear" and so on. Each image's description should appear
beside the thumbnail, following the title.
After the first
three items, the Title and Description become blank because we have only
assigned Dublin Core metadata to these first three. To get a full listing,
enter all the metadata.
Changes in the Format panel take effect immediately, and you can see the result straightaway by clicking the <Preview Collection> button. If you modify anything in the Gather, Enrich or Design panels, you will need to rebuild the collection.
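The way a format string turns into displayed text can be sketched in Python. The render function below is a simplified stand-in written for this paper, not Greenstone's formatter (which supports more than plain placeholders); it only shows how each [name] placeholder is replaced by the matching metadata value, why untagged images show blank fields, and why the capitalization of metadata names matters.

```python
import re


def render(format_string, metadata):
    """Replace each [name] placeholder with the document's metadata value."""

    def lookup(match):
        # Names are matched exactly, so "dc.title" would not find "dc.Title".
        return metadata.get(match.group(1), "")

    return re.sub(r"\[([^\]]+)\]", lookup, format_string)


format_string = "Title: [dc.Title]<br>Description: [dc.Description]<br>"
bear = {"dc.Title": "Bear", "dc.Description": "Bear in the Rocky Mountains"}

print(render(format_string, bear))
print(render(format_string, {}))  # an image with no assigned metadata
```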
Changing the Size of Image
Thumbnails
·
Let's change the size of the thumbnail images and make them smaller. Thumbnail images
are created by the ImagePlugin plug-in, so we need to access its
configuration settings. To do this, switch to the Design panel and
select Document Plugins from the list on the left. Double-click ImagePlugin
to pop up a window that shows its settings. (Alternatively, select ImagePlugin
with a single click and then click <Configure Plugin...> further
down the screen). Currently all options are off, so standard defaults are used.
Select thumbnailsize, set it to 50, and click <OK>.
·
Build and preview the collection.
·
Once
you have seen the result of the change, return to the Design panel,
select the configuration options for ImagePlugin, and switch the thumbnailsize
option off so that the thumbnail reverts to its normal size when the collection
is re-built.
Adding Browsing Classifier
based on Description Metadata
·
Now
we'll add a new browsing option based on the descriptions. In the Design
panel, select Browsing Classifiers from the left-hand list. Set the menu item for
Select classifier to add: to AZList; then click <Add
Classifier...>.
·
A
window pops up to control the classifier's options. Set the metadata
option to dc.Description and click <OK>.
·
Build the collection, and preview it.
Choose the new descriptions link that appears in the navigation bar.
Only
three items are shown, because only items with the relevant metadata
(dc.Description in this case) appear in the list. The original browse list includes all photos in the collection because it is based on ex.Image, extracted metadata that reflects an image's filename, which is set for all images in the collection.
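The behaviour described above, where only items carrying the chosen metadata appear in the browse list, can be sketched as follows. The az_list function is a simplified illustration and not the AZList classifier itself; Dog.jpg and the cat description are hypothetical values added for the example.

```python
from collections import defaultdict


def az_list(documents, field):
    """Group document names by the first letter of the given metadata field."""
    groups = defaultdict(list)
    for name, metadata in documents.items():
        value = metadata.get(field)
        if value:  # documents lacking the field are left out of the list
            groups[value[0].upper()].append(name)
    return dict(sorted(groups.items()))


photos = {
    "Bear.jpg": {"ex.Image": "Bear.jpg", "dc.Description": "Bear in the Rocky Mountains"},
    "Cat.jpg": {"ex.Image": "Cat.jpg", "dc.Description": "Cat on a sofa"},
    "Dog.jpg": {"ex.Image": "Dog.jpg"},  # no description assigned yet
}

print(az_list(photos, "dc.Description"))  # Dog.jpg is missing from this list
print(az_list(photos, "ex.Image"))        # every photo appears in this one
```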
Creating Searchable Index
based on Description Metadata
·
Now
we'll add an index so that the collection can be searched by descriptions.
Switch to the Design panel and select Search Indexes from the
left-hand list. Click the <New Index> button. Select dc.Description from the list of
metadata to include in the index, leave Indexing level: at its default,
"document", and click <Add Index>.
·
Switch
to the Create panel, build the collection, then preview it.
There is now a Search button in the navigation bar. As an example,
search for the term "bear" in the document:dc.Description index
(which is the only index at this point).
·
To
change the text that is displayed for the index (document:dc.Description), go
to the Format panel back in the Librarian Interface. Select Search from
the left-hand list. This panel allows you to change the text that is displayed
on the search form. Change the Display text for the document:dc.Description
index to "descriptions" (or other suitable text). Go back to the
browser and reload the search page. Your new text will appear in the search
form.
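Conceptually, a search index over dc.Description maps each word to the documents whose description contains it. The sketch below builds such an inverted index in Python. It is a teaching illustration only: Greenstone's own indexers are far more sophisticated, and the function names and the cat description are assumptions made for this example.

```python
import re
from collections import defaultdict


def build_index(documents, field):
    """Map each lower-cased word in the field to the documents containing it."""
    index = defaultdict(set)
    for name, metadata in documents.items():
        for word in re.findall(r"\w+", metadata.get(field, "").lower()):
            index[word].add(name)
    return index


def search(index, term):
    """Return the sorted names of documents whose field contains the term."""
    return sorted(index.get(term.lower(), set()))


photos = {
    "Bear.jpg": {"dc.Description": "Bear in the Rocky Mountains"},
    "Cat.jpg": {"dc.Description": "Cat on a sofa"},
    "Cheetah.jpg": {},  # no description, so it is never found by this index
}

index = build_index(photos, "dc.Description")
print(search(index, "bear"))
```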
A COLLECTION OF WORD AND PDF FILES
You will need some source
files like those in the workshop_folders → Word_and_PDF folder.
·
Start
a new collection called reports (File → New...), base it
on -- New Collection --, and choose Dublin Core as the metadata set.
·
Copy
all the files from workshop_folders → Word_and_PDF → Documents into the
collection. You can select multiple files by clicking on the first one and
shift-clicking on the last one, and drag them all across together. (This is the
normal technique of multiple selections.)
·
Switch
to the Create panel, and build and preview the collection.
Viewing the Extracted
Metadata
·
Again,
this collection contains no manually assigned metadata. All the information
that appears—title and filename—is extracted automatically from the documents
themselves. Because of this the quality of some of the title metadata is
suspect.
·
Back
in the Librarian Interface, click the Enrich tab to view the
automatically extracted metadata. You will need to scroll down to see the
extracted metadata, which begins with "ex.".
·
Check
whether the ex.Title metadata is correct for some of the documents by
opening them. You can open a document from the Librarian Interface by double
clicking on it.
·
The
extracted Title metadata for some documents is incorrect. For example, the
Titles for pdf01.pdf and word03.doc (the same document in
different formats) have missed out the second line. The Title for pdf03.pdf
has the wrong text altogether.
Manually Adding Metadata
·
In
the Enrich panel, manually add Dublin Core dc.Title metadata to
those documents which have incorrect ex.Title metadata. Select word03.doc
and double-click to open it. Copy the title of this document
("Greenstone: A comprehensive open-source digital library software
system") and return to the Librarian Interface. Scroll up or down in the
metadata table until you can see dc.Title. Click in the value box and
paste in the metadata.
·
Now
add dc.Creator information for the same document. You can add more than
one value for the same field: when you press Enter in a metadata value
field, a new empty field of the same type will be generated. Add each author
separately as dc.Creator metadata.
·
Close
the document (in Microsoft Word) when you have finished copying metadata from
it. External programs opened when
viewing documents must be closed before building the collection, otherwise
errors can occur.
·
Next
add dc.Title and dc.Creator metadata for a few of the other
documents.
·
You
will notice as you add more values, they appear in the Existing values for
... box below the metadata table. If you are adding the same metadata value
to more than one document, you can select it from this list. For example, pdf01.pdf
and word03.doc share the same Title; and many documents have common
authors.
If you build and
preview your collection at this point, you will see that the Titles list
now shows your new Titles. However, the dc.Creator metadata is not
displayed. You need to alter the collection design to use this metadata.
Document Plugins
·
In
the Librarian Interface, look at the Document Plugins section of the Design
panel, by clicking on this in the list to the left. Here you can add,
configure or remove plugins to be used in the collection. You do not have to remove any plugins, but doing so speeds up processing a little. In this case we have only Word, PDF, RTF, and PostScript documents, so we can remove the ZIPPlugin, TextPlugin, HTMLPlugin, EmailPlugin, ImagePlugin, PowerPointPlugin, ExcelPlugin, ISISPlugin and NulPlugin plugins. To delete a plugin, select it and click <Remove Plugin>. GreenstoneXMLPlugin is required for any type of source collection and should not be removed.
Search Indexes
·
The
next step in the Design panel is Search Indexes. These specify
what parts of the collection are searchable (e.g. searching by title and
author). Delete the ex.Source index, which is not particularly useful,
by selecting it and clicking <Remove Index>.
·
Modify
the ex.Title index to include dc.Title by selecting the index in
the Assigned Indexes box and clicking <Edit Index>. Select dc.Title
from the list of metadata, and click <Replace Index>.
Searching this index will search both dc.Title and ex.Title metadata.
If you want to restrict searching to just the manually added dc.Title metadata,
edit the index again and deselect ex.Title from the list of metadata.
·
You
can add indexes based on any metadata. Add a new index based on dc.Creator by
clicking <New Index>. Select dc.Creator in the list of
metadata, and click <Add Index>.
Browsing Classifiers
·
The
Browsing Classifiers section adds "classifiers," which provide
the collection with browsing functions. Go to this section and observe that
Greenstone has provided two classifiers, AZLists based on ex.Title and
ex.Source metadata. These correspond to the Titles and Filenames
buttons on the collection's access bar.
Remove the ex.Source classifier
by selecting it and clicking <Remove Classifier>.
·
Modify
the ex.Title classifier to use dc.Title instead. Select the
classifier and click <Configure Classifier...>. In the metadata
box, select dc.Title instead of ex.Title. Click <OK>.
·
Now
add an AZCompactList classifier for dc.Creator. Select AZCompactList
from the Select classifier to add drop-down list and click <Add
Classifier...>. A popup window Configuring Arguments appears.
Select dc.Creator from the metadata drop-down list and click <OK>.
AZCompactList is like AZList,
except that values that appear multiple times in the hierarchy are
automatically grouped together and a new node, shown as a bookshelf icon, is
formed.
·
Switch
to the Create panel, and build and preview the collection.
·
Check
that all the facilities work properly. There should be three full-text indexes,
called text, dc.Title (or dc.Title,Title if you didn't
deselect ex.Title in the search indexes step above), and dc.Creator.
The Titles list should display all the documents to which you have
assigned dc.Title metadata (and only those documents). The Creators list
should show one bookshelf for each author you have assigned as dc.Creator,
and clicking on that bookshelf should take you to all the documents they
authored.
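The grouping of documents into one bookshelf per creator can be pictured with a short Python sketch. It is not the AZCompactList implementation, only an illustration of grouping by a multi-valued field; the author names and word01.doc are placeholders invented for the example.

```python
from collections import defaultdict


def bookshelves(documents, field):
    """Group documents under each value of a (possibly multi-valued) field."""
    shelves = defaultdict(list)
    for name, metadata in documents.items():
        for value in metadata.get(field, []):
            shelves[value].append(name)
    return {value: sorted(names) for value, names in sorted(shelves.items())}


reports = {
    "pdf01.pdf": {"dc.Creator": ["Author A", "Author B"]},
    "word03.doc": {"dc.Creator": ["Author A", "Author B"]},
    "pdf03.pdf": {"dc.Creator": ["Author C"]},
    "word01.doc": {},  # no dc.Creator assigned, so it sits on no shelf
}

for creator, documents in bookshelves(reports, "dc.Creator").items():
    print(creator, documents)
```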
Renaming the Search Indexes
·
The
default display text for the indexes in the drop-down list on the search page
contains the content of the index. Now we will change this display text to make
it nicer. Go to the Format panel by clicking its tab. This panel is
split into several sections, each controlling some aspect of collection
presentation.
·
Select
Search in the left hand list. This section allows you to modify what
text is displayed for the drop-down lists in the search form (indexes,
subcollections, levels etc). Set the Display text for the dc.Title (or
dc.Title,Title if you didn't deselect ex.Title in the search
index) index to be "titles", and that for the dc.Creator index
to be "creators". Preview the collection by clicking the <Preview Collection> button. The search form should display the new text.
Classifying on Multiple Metadata
·
The
new Titles list shows only those documents which have been assigned dc.Title
metadata. For many documents,
extracted Titles may be fine, and it is impractical to add the same metadata
again as dc.Title. Fortunately there is a way we can use both metadata
types in one classifier: specify a list of metadata names in the classifier.
·
In
the Browsing Classifiers section of the Design panel, select the AZList
for dc.Title in the Assigned Classifiers box and click <Configure
Classifier...>. Note you can achieve the same result by double clicking
on the classifier.
·
In
the metadata field, type ",ex.Title" after the
"dc.Title"—i.e. make it read
dc.Title,ex.Title
·
If
you have already done the Enhanced Word document handling exercise, some
of the documents will have extracted ex.Creator metadata, and some will have
dc.Creator. To use both of these in the Creators classifier, make a similar
change to the AZCompactList: make the metadata field read
dc.Creator,ex.Creator.
Build the collection again and preview
it. Now all of the documents should appear in the Titles list (and
extracted Creators should appear in the Creators list).
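One way to picture a classifier that takes a list such as dc.Title,ex.Title is as a fallback lookup. The sketch below assumes that the first field in the list that has a value is the one used; that ordering rule is an assumption made for illustration, and the function name and the sample titles are hypothetical.

```python
def first_available(metadata, fields):
    """Return the value of the first field in the list that has a value."""
    for field in fields:
        value = metadata.get(field)
        if value:
            return value
    return None


fields = "dc.Title,ex.Title".split(",")

manually_titled = {"dc.Title": "Corrected title", "ex.Title": "Wrong extracted title"}
extracted_only = {"ex.Title": "Extracted title"}

print(first_available(manually_titled, fields))
print(first_available(extracted_only, fields))
```

Under this assumption, documents with a manually assigned title are listed under it, while the remaining documents still appear under their extracted title.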
EXPORTING A COLLECTION TO CD-ROM/DVD
To publish a collection on CD-ROM or DVD, Greenstone's Export to CD-ROM module must be installed. It is included with CD-ROM distributions and with all distributions from 2.70w onwards. For non-CD-ROM distributions of version 2.70 and earlier, it must be installed separately (see Installing Greenstone).
1. Launch the Greenstone Librarian Interface if it is not already running.
2. Choose File → Write CD/DVD image.... In the resulting popup window, select the collection or collections that you wish to export by ticking their check boxes. You can optionally enter a name for the CD-ROM: this is the name that will appear in the menu when the CD-ROM is run. If a name is not entered, the default Greenstone Collections will be used. You can also specify whether or not the resulting CD-ROM will install files onto the host machine when used. Click <Write CD/DVD image> to start the export process. The necessary files for export are written to:
Greenstone → tmp → exported_xxx
where xxx will be similar to the name you have entered. If you didn't specify a name for the CD-ROM, the folder name will be exported_collections.
You need to use your own computer's software to write these files to CD-ROM. On Windows XP this ability is built into the operating system: assuming you have a CD-ROM or DVD writer, insert a blank disc into the drive and drag the contents of exported_xxx into the folder that represents the disc.
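Before writing the exported folder to disc, it can be useful to check that it will fit. The Python sketch below adds up the sizes of all files in a folder and compares the total with an assumed capacity of 700 MB, a common figure for a CD-R; adjust the capacity for your own media. The function names and the example path are hypothetical.

```python
import os

CD_CAPACITY_BYTES = 700 * 1024 * 1024  # assumed capacity of a typical CD-R


def folder_size(path):
    """Total size in bytes of all files below the given folder."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for filename in files:
            total += os.path.getsize(os.path.join(root, filename))
    return total


def fits_on_cd(path, capacity=CD_CAPACITY_BYTES):
    """Report whether the folder's contents fit within the given capacity."""
    return folder_size(path) <= capacity


# Example with a hypothetical path; adjust it to your own installation:
# print(fits_on_cd(r"C:\Greenstone\tmp\exported_collections"))
```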
REFERENCES
1. Zhang, Allison. (2003). Customizing the Greenstone user interface: An illustrated guide to customizing the Greenstone user interface. Washington, D.C.: Research Library Consortium.
2. Building digital collections: Technical information and background paper. (2000). National Digital Library Program (NDLP) at the Library of Congress.
3. Cornell University Library/Research Department. (2000). Moving theory into practice: Digital imaging for libraries and archives. Research Libraries Group. Available at http://www.library.cornell.edu/preservation/tutorial
4. Digital Library Federation. (1998). A working definition of digital library. Available at http://www.diglib.org/about/dldefinition.htm
6. Greenstone training workshop material. (2002). Greenstone Digital Library Project and NCSI, IISC. (2003). http://www.greenstone.org
7. Witten, Ian H. (2003). Examples of practical digital libraries: Collections built internationally using Greenstone. D-Lib Magazine. http://dlib.org/dlib/march03/witten/03witten.html
8. Witten, Ian H. & Bainbridge, David. (2003). How to build a digital library. London: Morgan Kaufmann Publishers.
9. Lesk, M. (1997). Practical digital libraries: Books, bytes and bucks. San Francisco: Morgan Kaufmann.