Thursday, 23 March 2017



Abstract
The paper examines the Greenstone digital library software: its features, installation requirements, potential, and examples of live digital libraries built with it around the world. It then describes how to build collections, with examples of different collection types, and concludes with exporting a collection to Compact Disc Read-Only Memory (CD-ROM).


INTRODUCTION TO GREENSTONE DIGITAL LIBRARY SOFTWARE

What are digital libraries? Some definitions
“Digital Libraries are organized collections of digital information. They combine the structuring and gathering of information, which libraries and archives have always done, with digital representation that computers have made possible”. Lesk (1997)

“Digital Libraries are organizations that provide the resources, including the specialized staff, to select, structure, offer intellectual access to, interpret, distribute, preserve the integrity of, and ensure the persistence over time of collections of digital works so that they are readily and economically available for use by a defined community or set of communities”. Digital Library Federation (1998)

A working definition
A Digital Library is an organized collection of digital materials with associated services, accessible over networks.

Note: By this definition, an unorganized stream of data, such as data coming from a satellite to your computer or Google data, is not a digital library.

What is Greenstone Digital Library Software (GSDL)?
A comprehensive open source software suite for building, maintaining and distributing digital library collections. Available at: www.greenstone.org.

It was developed by the New Zealand Digital Library (NZDL) Project at the University of Waikato. It is distributed and promoted by NZDL in cooperation with UNESCO and the Human Info NGO, Belgium.

Aims of GSDL
The aim of the software is to empower users, particularly in Universities, Libraries, and other public service institutions, to build their own digital libraries. Digital Libraries are radically reforming how information is disseminated and acquired in UNESCO’s partner communities and institutions in the field of education, science and culture around the world, and particularly in developing countries.  From: www.greenstone.org

Features of GSDL
Ø  Collections can be accessed and distributed through the Web and on CD-ROM.
Ø  Multi-platform availability on UNIX and Windows.
Ø  Collection building through a Graphical User Interface (GUI), the Greenstone Librarian Interface (GLI), or from the command line.
Ø  Powerful indexing, i.e. full-text, document-level and section-level indexing.
Ø  Metadata-based, including field-based and automatic indexing.
Ø  Support for Dublin Core and other metadata.
Ø  Powerful searching and browsing, including full-text and field search, Boolean and ranked retrieval.
Ø  Support for several document formats such as text, HTML, Word, PDF, email etc., enabled by plug-ins.
Ø  Homepage customization.
Ø  Configurable and multilingual, with support for metadata and classifiers.
Ø  Advanced compression for text and indexes.
Ø  Includes an organizer for collections.
Ø  Respect for copyright.
Ø  Support:

User List: greenstone-users@list.scms.waikato.ac.nz.
This is the most popular mailing list, and it is archived as a Greenstone collection at www.nzdl.org.
Developers List: Greenstone-devel@list.scms.waikato.ac.nz.

Minimum Requirements for GSDL Installation

Hardware
- Disk space 50 MB for binary installation.
- 155 MB for compiling Greenstone from source code
- 200 MB for optional Greenstone demo collections.
- 4 MB for online documentation.
- 24 MB for Greenstone’s CD exporting functions.
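The disk figures above can be totalled and compared with the free space on the target drive. The following is a minimal Python sketch and not part of Greenstone; the dictionary keys and function names are illustrative assumptions, and it assumes all chosen components go on the same drive.

```python
import shutil

# Disk space figures (in MB) taken from the hardware list above.
REQUIREMENTS_MB = {
    "binary installation": 50,
    "compiling from source": 155,
    "demo collections": 200,
    "online documentation": 4,
    "CD exporting functions": 24,
}

def total_required_mb(components=None):
    """Sum the space needed for the chosen components (all of them by default)."""
    names = REQUIREMENTS_MB if components is None else components
    return sum(REQUIREMENTS_MB[name] for name in names)

def has_enough_space(path=".", components=None):
    """Compare the requirement with the free space on the drive that holds path."""
    free_mb = shutil.disk_usage(path).free / (1024 * 1024)
    return free_mb >= total_required_mb(components)

print(total_required_mb())  # 433 when every component is installed
print(total_required_mb(["binary installation", "online documentation"]))  # 54
print(has_enough_space("."))
```

A binary-only installation needs far less than the full total, so pass only the components you actually plan to install.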

Software
- Java Runtime Environment (JRE)
- Web Server (Apache recommended).
- Practical Extraction and Report Language (PERL) for collection building.
- C++ compiler for source codes.
- Web browser (Netscape and Internet Explorer recommended).

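Before installing, it can help to confirm that the command-line prerequisites are reachable. The sketch below is one way to do that in Python, under stated assumptions: the executable names (java, perl, g++) are typical but may differ on your platform, and the web server and browser are left out because their executable names vary too much to guess.

```python
import shutil

# Executable names are assumptions; adjust them for your platform.
PREREQUISITES = {
    "Java Runtime Environment": "java",
    "Perl": "perl",
    "C++ compiler (only for source builds)": "g++",
}

def find_prerequisites(which=shutil.which):
    """Map each prerequisite to the path of its executable, or None if absent."""
    return {label: which(exe) for label, exe in PREREQUISITES.items()}

for label, path in find_prerequisites().items():
    print(f"{label}: {path or 'not found on PATH'}")
```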
Examples of Greenstone Digital Libraries in Action
With an Internet connection, you can access the following examples of Greenstone library web sites by holding Control and clicking on the links below.


MOST Digital Library (UNESCO)
The MOST Digital Library contains results from research carried out during the first ten years of the MOST Programme. The themes covered include Drugs, Globalization and Governance, International Migration, Multicultural Societies, Poverty Eradication, Social transformations, Sustainability, Urban Development and HIV/AIDS. The MOST digital library offers a brief abstract of each document (in English, French and Spanish) that identifies the key issues and arguments in a few lines. The database contains the documents in their original language (English, French or Spanish) and in translation if such a version exists. The user interface is currently in English only, but will shortly also be available in French and Spanish.

New Zealand Digital Library Project
A demonstration site set up by the developers of Greenstone, the New Zealand Digital Library Project. This site contains many collections, ranging from humanitarian information to computer science technical reports to demonstration collections of Chinese and Arabic documents.

The United Nations Digital Library - Islamabad
The United Nations Digital Library Islamabad is an open-access, online searchable repository containing the full text of documents, reports, publications and other public information items produced by the country offices of United Nations Organizations, Programmes and Funds in Pakistan. The Digital Library of the United Nations in Pakistan strives to stop the loss of the UN's digital information in Pakistan and facilitates its retention and long-term preservation in a usable form. The collection comprises general documents, reports, publications, newsletters, press releases and other public information items. This repository is a centralized resource for United Nations information on or about Pakistan.

University of Namibia
This library includes a collection of past examination papers of the University of Namibia, a register of Namibian theses and dissertations, and a collection containing publications of University of Namibia staff members between 1992 and 2002.

Potentials of Greenstone Software in Nigeria
Ø  Digitization of books, journals, newsletters, pamphlets, monographs and newspapers.
Ø  Digitization of reports: annual reports, management and committee reports.
Ø  Digitization of important documents such as the Laws of the Federation, Decrees, the Nigerian Constitution, conference proceedings, etc.
Ø  Development of local music and video libraries.
Ø  Digitization of university theses and dissertations.
Ø  Images of our tourist attractions, products of Indigenous Knowledge and a lot more.

COLLECTION BUILDING AND METADATA

                                                
Collection Building
Collection Building in GSDL is a way of gathering information/materials that will be used for creating your digital library database.

Metadata
Wikipedia defines metadata as data that provides information about one or more aspects of other data. The National Information Standards Organization (NISO) gives a more comprehensive definition: metadata is structured information that describes, explains, locates, or otherwise makes it easier to retrieve, use, or manage an information resource. Metadata is often called data about data or information about information.

BUILDING A SMALL COLLECTION OF HTML FILES

You will need some HTML files.
Running the Librarian Interface (GLI)
·        Start the Greenstone Librarian Interface:
Start → All Programs → Greenstone 2.85 (note that this depends on the GSDL version you are running) → Librarian Interface (GLI)
After a short pause a startup screen appears, and then after a slightly longer pause the main Greenstone Librarian Interface appears. (A command prompt is also opened in the background.)

Starting a New Collection
·        Start a new collection within the Librarian Interface:
File → New...
·        You will create a collection based on a few HTML web pages.
A window pops up. Fill it out with appropriate values—for example,
Collection title:          Small HTML Collection
Description of content: A small collection of HTML pages
Leave the setting for Base this collection on: at its default, -- New Collection --, and click <OK>.
·        Next, you will need to gather together the files that will constitute the collection. A suitable set has been prepared ahead of time in workshop_folders → simple_html. Using the left-hand side of the Librarian Interface's Gather panel, interactively navigate to the workshop_folders folder.

Adding documents to the Collection 
·        Now drag the simple_html folder from the left-hand side and drop it on the right. The progress bar at the bottom shows some activity. Gradually, duplicates of all the files will appear in the collection panel.
You can inspect the files that have been copied by double-clicking on the folder in the right-hand side.
·        Since this is our first collection, we won't complicate matters by manually assigning metadata or altering the collection's design. Instead we rely on default behaviour. So pass directly to the Create panel by clicking its tab.

Building the Collection 
·        To start building the collection, click on Create then click the <Build Collection> button.
·        Once the collection has been successfully built, a “Collection Creation Result” window pops up. Click <OK> to confirm the creation.
·        Click the <Preview Collection> button to look at the end result. This loads the relevant page into your web browser (starting it up if necessary).
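The features list mentions command-based collection building as an alternative to the Librarian Interface. As a rough sketch only, the Python function below assembles the two Perl commands usually associated with Greenstone 2 building, import.pl and buildcol.pl, without running them. Treat the exact invocation as an assumption to verify against the documentation for your version. The collection name is hypothetical. Depending on your setup, further steps may be needed, such as running the Greenstone setup script first and moving the freshly built files into the collection's index folder afterwards.

```python
def build_commands(collection):
    """Return the import and build commands for a collection, without running them."""
    return [
        ["perl", "-S", "import.pl", collection],    # convert source files to Greenstone's archive format
        ["perl", "-S", "buildcol.pl", collection],  # build indexes and browsing structures
    ]

# "smallhtm" is a hypothetical collection folder name.
for command in build_commands("smallhtm"):
    print(" ".join(command))
```

Printing the commands rather than executing them keeps the sketch safe to run on a machine without Greenstone installed.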

Viewing extracted Metadata 
·        Back in the Librarian Interface, click the Enrich tab to view the metadata associated with the documents in the collection.
·        Presently there is no manually assigned metadata, but the act of building the collection has extracted metadata from the documents. Double click the simple_html folder to expand its content. Then single-click aragon.html to display all its metadata in the right-hand side of the panel. The initial fields, starting "dc.", are empty. These are Dublin Core metadata fields for manually entered data.
·        Use the scroll bar on the extreme right to view the bottom part of the list. There you will see fields starting "ex." that express the extracted metadata: for example ex.Title, based on the text within the HTML Title tags, and ex.Language, the document's language (represented using the ISO standard 2-letter mnemonic) which Greenstone determines by analyzing the document's text.
·        Close the collection by clicking File → Close. This automatically saves the collection to disk.
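To make the idea of extracted metadata concrete, here is a minimal Python sketch of how a title could be pulled from the HTML Title tags and stored under an "ex." name. It is an illustration using Python's standard library, not Greenstone's own extraction code, and the sample page content is made up.

```python
from html.parser import HTMLParser

class TitleExtractor(HTMLParser):
    """Collect the text found between <title> and </title>."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.parts.append(data)

def extract_metadata(html_text):
    """Return extracted metadata using Greenstone-style "ex." field names."""
    parser = TitleExtractor()
    parser.feed(html_text)
    parser.close()
    return {"ex.Title": "".join(parser.parts).strip()}

# Made-up sample page for illustration.
sample = "<html><head><title>Aragon</title></head><body><p>Text</p></body></html>"
print(extract_metadata(sample))  # {'ex.Title': 'Aragon'}
```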

Setting-up Shortcut in the Librarian Interface 
·        To set up a shortcut to the source files, in the Gather panel navigate to the folder in your local file space that contains the files you want to use—in our case, the workshop_folders folder. Select this folder and then right-click it, and choose Create Shortcut from the menu. In the Name field, enter the name you want the shortcut to have, or accept the default workshop_folders. Click <OK>. Close all the folders in the file tree in the left-hand pane, and you will see the shortcut to your source files.

 

A SIMPLE IMAGE COLLECTION

·        In the Librarian Interface (GLI), start a new collection (File → New...) called backdrop. Fill out the fields with appropriate information. For Base this collection on:, select the item Image-e (imagee) from the pull-down menu.
When you base a collection on an existing one, it inherits all the settings of the old one. You won't be asked to choose a metadata set because the new collection inherits the ones (if any) used by the seed collection.
·        Copy the images provided in workshop_folders → images into your newly-formed collection.
·        Change to the Create panel and build the collection.
·        Preview the result.
·        Click on Browse in the navigation bar to view a list of the photos ordered by filename, each presented as a thumbnail accompanied by some basic data about the image. The structure of this collection is the same as Image-e (imagee), but the content is different.
·        Back in the Librarian Interface, change to the Enrich panel and view the extracted metadata for Bear.jpg.

Adding Title and Description Metadata 
·        We work with just the first three files (Bear.jpg, Cat.jpg and Cheetah.jpg) to get a flavour of what is possible. First, set each file's dc.Title field to be the same as its filename but without the filename extension:

Click on Bear.jpg so its metadata fields are available, then click on its dc.Title field on the righthand side. Type in Bear.

Repeat the process for Cat.jpg and Cheetah.jpg.

·        Add a description for each image as dc.Description metadata.
What description should you enter? To remind yourself of a file's content, the Librarian Interface lets you open files by double-clicking them. It launches the appropriate application based on the filename extension, Word for .doc files, Acrobat for .pdf files and so on.

Double-click Bear.jpg: on Windows, the image will normally be displayed by Microsoft's Photo Editor (although this depends on how your computer has been set up).

Back in the Enrich pane, make sure that Bear.jpg is selected in the collection tree on the left hand side. Enter the text Bear in the Rocky Mountains as the value for the dc.Description field.

Repeat this process for Cat.jpg and Cheetah.jpg, adding a suitable description for each.

·        Go to the Create panel and click <Build Collection>. Once it has finished building, preview the collection. You will not notice anything new. That's because we haven't changed the design of the collection to take advantage of the new metadata.

Change Format Features to display new metadata
·        Now we customize the collection's appearance. Go to the Format panel and select Format Features from the left-hand list. Leave the feature selection controls at their default values, so that All Features is selected for Choose Feature, and VList is selected as the Affected Component. In the HTML Format String, edit the text as follows:
Change _ImageName_: to Title:
Change [Image] to [dc.Title]
After [dc.Title]<br> add Description: [dc.Description]<br>

Metadata names are case-sensitive in Greenstone: it is important that you capitalize "Title" and "Description" (and don't capitalize "dc").

·        The new format statement is displayed in the list of assigned format statements. The first substitution alters the fragment of text that appears to the right of the thumbnail image, the second alters the item of metadata that follows it. The addition displays the description after the Title.

·        Preview the collection by clicking the <Preview Collection> button. When you click on Browse in the navigation bar the presentation has changed to "Title: Bear" and so on. Each image's description should appear beside the thumbnail, following the title.

After the first three items, the Title and Description become blank because we have only assigned Dublin Core metadata to these first three. To get a full listing, enter all the metadata.

Changes in the Format panel take effect immediately, and you can see the result straight away by clicking <Preview Collection>. If you modify anything in the Gather, Enrich or Design panels, you will need to rebuild the collection.
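The three format string edits in this section, and the effect of case-sensitive metadata names, can be sketched in Python. The starting fragment below is hypothetical, since the real format string in your collection contains more markup, and the render function is only a simplified stand-in for how Greenstone fills in bracketed metadata names.

```python
import re

# Hypothetical starting fragment; the real format string in your collection may differ.
original = "<td>_ImageName_: [Image]<br></td>"

def apply_edits(format_string):
    """Apply the three edits described in this section to a format string."""
    edited = format_string.replace("_ImageName_:", "Title:")
    edited = edited.replace("[Image]", "[dc.Title]")
    # Insert the description line right after the title line.
    edited = edited.replace(
        "[dc.Title]<br>", "[dc.Title]<br>Description: [dc.Description]<br>", 1
    )
    return edited

def render(format_string, metadata):
    """Fill each [name] placeholder; names match case-sensitively, missing ones become blank."""
    return re.sub(r"\[([^\]]+)\]", lambda m: metadata.get(m.group(1), ""), format_string)

edited = apply_edits(original)
print(edited)
# <td>Title: [dc.Title]<br>Description: [dc.Description]<br></td>
print(render(edited, {"dc.Title": "Bear", "dc.Description": "Bear in the Rocky Mountains"}))
# <td>Title: Bear<br>Description: Bear in the Rocky Mountains<br></td>
print(render(edited, {"dc.title": "Bear"}))  # wrong capitalization, so the title stays blank
# <td>Title: <br>Description: <br></td>
```

The last call shows why images without dc.Title and dc.Description, or with mis-capitalized names, appear with blank fields in the browse list.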

Changing the Size of Image Thumbnails 
·        Let's change the size of the thumbnail image and make it smaller. Thumbnail images are created by the ImagePlugin plug-in, so we need to access its configuration settings. To do this, switch to the Design panel and select Document Plugins from the list on the left. Double-click ImagePlugin to pop up a window that shows its settings. (Alternatively, select ImagePlugin with a single click and then click <Configure Plugin...> further down the screen). Currently all options are off, so standard defaults are used. Select thumbnailsize, set it to 50, and click <OK>.
·        Build and preview the collection.
·        Once you have seen the result of the change, return to the Design panel, select the configuration options for ImagePlugin, and switch the thumbnailsize option off so that the thumbnail reverts to its normal size when the collection is re-built.

Adding Browsing Classifier based on Description Metadata 
·        Now we'll add a new browsing option based on the descriptions. In the Design panel, select Browsing Classifiers from the left-hand list. Set the menu item for Select classifier to add: to AZList; then click <Add Classifier...>.
·        A window pops up to control the classifier's options. Set the metadata option to dc.Description and click <OK>.
·        Build the collection, and preview it. Choose the new descriptions link that appears in the navigation bar.

Only three items are shown, because only items with the relevant metadata (dc.Description in this case) appear in the list. The original browse list includes all photos in the collection because it is based on ex.Image, extracted metadata that reflects an image's filename, which is set for all images in the collection.
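The behaviour described here, where a classifier lists only documents that carry the chosen metadata, can be illustrated with a short Python sketch. It is a simplified model of an alphabetical list and not Greenstone's AZList implementation. Apart from the bear description, the records are invented for the example.

```python
from collections import defaultdict

# Hypothetical records: only the first three images have dc.Description.
documents = [
    {"ex.Image": "Bear.jpg", "dc.Description": "Bear in the Rocky Mountains"},
    {"ex.Image": "Cat.jpg", "dc.Description": "Cat on a sofa"},
    {"ex.Image": "Cheetah.jpg", "dc.Description": "Cheetah resting"},
    {"ex.Image": "Dog.jpg"},
]

def az_list(docs, field):
    """Group values of a metadata field by first letter, skipping documents without it."""
    groups = defaultdict(list)
    for doc in docs:
        value = doc.get(field)
        if value:
            groups[value[0].upper()].append(value)
    return {letter: sorted(values) for letter, values in sorted(groups.items())}

print(az_list(documents, "dc.Description"))  # three entries; Dog.jpg is absent
print(az_list(documents, "ex.Image"))        # four entries; every image has a filename
```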

Creating Searchable Index based on Description Metadata 
·        Now we'll add an index so that the collection can be searched by descriptions. Switch to the Design panel and select Search Indexes from the left-hand list. Click the <New Index> button.  Select dc.Description from the list of metadata to include in the index, leave Indexing level: at its default, "document", and click <Add Index>.
·        Switch to the Create panel, build the collection, then preview it. There is now a Search button in the navigation bar. As an example, search for the term "bear" in the document:dc.Description index (which is the only index at this point).
·        To change the text that is displayed for the index (document:dc.Description), go to the Format panel back in the Librarian Interface. Select Search from the left-hand list. This panel allows you to change the text that is displayed on the search form. Change the Display text for the document:dc.Description index to "descriptions" (or other suitable text). Go back to the browser and reload the search page. Your new text will appear in the search form.
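As a rough picture of what a search index over dc.Description does, the Python sketch below builds a small inverted index from words to documents and looks up a term. Greenstone relies on its own indexing engines, so this is only a conceptual model. The records other than the bear description are hypothetical.

```python
import re
from collections import defaultdict

def build_index(docs, field):
    """Map each lower-cased word in the field to the set of document names containing it."""
    index = defaultdict(set)
    for name, metadata in docs.items():
        for word in re.findall(r"\w+", metadata.get(field, "").lower()):
            index[word].add(name)
    return index

def search(index, term):
    """Return the sorted names of documents whose indexed field contains the term."""
    return sorted(index.get(term.lower(), set()))

docs = {
    "Bear.jpg": {"dc.Description": "Bear in the Rocky Mountains"},
    "Cat.jpg": {"dc.Description": "Cat on a sofa"},  # hypothetical description
    "Dog.jpg": {},                                   # hypothetical image without a description
}

index = build_index(docs, "dc.Description")
print(search(index, "bear"))  # ['Bear.jpg']
print(search(index, "dog"))   # [] because Dog.jpg has no dc.Description to index
```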

A COLLECTION OF WORD AND PDF FILES

You will need some source files like those in the workshop_folders → Word_and_PDF folder.
·        Start a new collection called reports (File → New...), base it on -- New Collection --, and choose Dublin Core as the metadata set.
·        Copy all the files from workshop_folders → Word_and_PDF → Documents into the collection. You can select multiple files by clicking on the first one and shift-clicking on the last one, and drag them all across together. (This is the normal technique of multiple selections.)
·        Switch to the Create panel, and build and preview the collection.

Viewing the Extracted Metadata 
·        Again, this collection contains no manually assigned metadata. All the information that appears—title and filename—is extracted automatically from the documents themselves. Because of this the quality of some of the title metadata is suspect.
·        Back in the Librarian Interface, click the Enrich tab to view the automatically extracted metadata. You will need to scroll down to see the extracted metadata, which begins with "ex.".
·        Check whether the ex.Title metadata is correct for some of the documents by opening them. You can open a document from the Librarian Interface by double clicking on it.
·        The extracted Title metadata for some documents is incorrect. For example, the Titles for pdf01.pdf and word03.doc (the same document in different formats) have missed out the second line. The Title for pdf03.pdf has the wrong text altogether.

Manually Adding Metadata
·        In the Enrich panel, manually add Dublin Core dc.Title metadata to those documents which have incorrect ex.Title metadata. Select word03.doc and double-click to open it. Copy the title of this document ("Greenstone: A comprehensive open-source digital library software system") and return to the Librarian Interface. Scroll up or down in the metadata table until you can see dc.Title. Click in the value box and paste in the metadata.
·        Now add dc.Creator information for the same document. You can add more than one value for the same field: when you press Enter in a metadata value field, a new empty field of the same type will be generated. Add each author separately as dc.Creator metadata.
·        Close the document (in Microsoft Word) when you have finished copying metadata from it.  External programs opened when viewing documents must be closed before building the collection, otherwise errors can occur.
·        Next add dc.Title and dc.Creator metadata for a few of the other documents.
·        You will notice as you add more values, they appear in the Existing values for ... box below the metadata table. If you are adding the same metadata value to more than one document, you can select it from this list. For example, pdf01.pdf and word03.doc share the same Title; and many documents have common authors.

If you build and preview your collection at this point, you will see that the Titles list now shows your new Titles. However, the dc.Creator metadata is not displayed. You need to alter the collection design to use this metadata.



Document Plugins
·        In the Librarian Interface, look at the Document Plugins section of the Design panel by clicking on it in the list to the left. Here you can add, configure or remove plugins to be used in the collection. There is no need to remove any plugins, but doing so will speed up processing a little. In this case we have only Word, PDF, RTF, and PostScript documents, and can remove the ZIPPlugin, TextPlugin, HTMLPlugin, EmailPlugin, ImagePlugin, PowerPointPlugin, ExcelPlugin, ISISPlugin and NULPlugin plugins. To delete a plugin, select it and click <Remove Plugin>. GreenstoneXMLPlugin is required for any type of source collection and should not be removed.

Search Indexes
·        The next step in the Design panel is Search Indexes. These specify what parts of the collection are searchable (e.g. searching by title and author). Delete the ex.Source index, which is not particularly useful, by selecting it and clicking <Remove Index>.
·        Modify the ex.Title index to include dc.Title by selecting the index in the Assigned Indexes box and clicking <Edit Index>. Select dc.Title from the list of metadata, and click <Replace Index>. Searching this index will search both dc.Title and ex.Title metadata. If you want to restrict searching to just the manually added dc.Title metadata, edit the index again and deselect ex.Title from the list of metadata.
·        You can add indexes based on any metadata. Add a new index based on dc.Creator by clicking <New Index>. Select dc.Creator in the list of metadata, and click <Add Index>.

Browsing Classifiers
·        The Browsing Classifiers section adds "classifiers," which provide the collection with browsing functions. Go to this section and observe that Greenstone has provided two classifiers, AZLists based on ex.Title and ex.Source metadata. These correspond to the Titles and Filenames buttons on the collection's access bar.

Remove the ex.Source classifier by selecting it and clicking <Remove Classifier>.

·        Modify the ex.Title classifier to use dc.Title instead. Select the classifier and click <Configure Classifier...>. In the metadata box, select dc.Title instead of ex.Title. Click <OK>.

·        Now add an AZCompactList classifier for dc.Creator. Select AZCompactList from the Select classifier to add drop-down list and click <Add Classifier...>. A popup window Configuring Arguments appears. Select dc.Creator from the metadata drop-down list and click <OK>.

AZCompactList is like AZList, except that values that appear multiple times in the hierarchy are automatically grouped together and a new node, shown as a bookshelf icon, is formed.

·        Switch to the Create panel, and build and preview the collection.
·        Check that all the facilities work properly. There should be three full-text indexes, called text, dc.Title (or dc.Title,Title if you didn't deselect ex.Title in the search indexes step above), and dc.Creator. The Titles list should display all the documents to which you have assigned dc.Title metadata (and only those documents). The Creators list should show one bookshelf for each author you have assigned as dc.Creator, and clicking on that bookshelf should take you to all the documents they authored.
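The grouping that the Creators list performs can be pictured with a short Python sketch: documents are gathered under each dc.Creator value, and each group corresponds to one bookshelf. This is a simplified model, not Greenstone's AZCompactList code, and the author assignments below are hypothetical.

```python
from collections import defaultdict

def group_by_value(docs, field):
    """Gather document names under each value of a (possibly multi-valued) metadata field."""
    groups = defaultdict(list)
    for name, metadata in docs.items():
        for value in metadata.get(field, []):
            groups[value].append(name)
    return {value: sorted(names) for value, names in sorted(groups.items())}

# Hypothetical dc.Creator assignments for three of the workshop files.
docs = {
    "word03.doc": {"dc.Creator": ["Ian H. Witten", "David Bainbridge"]},
    "pdf01.pdf": {"dc.Creator": ["Ian H. Witten", "David Bainbridge"]},
    "pdf03.pdf": {"dc.Creator": ["Ian H. Witten"]},
}

for creator, names in group_by_value(docs, "dc.Creator").items():
    print(creator, "->", names)
```

Because a document can carry several dc.Creator values, the same file may appear on more than one bookshelf.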

Renaming the Search Indexes
·        The default display text for the indexes in the drop-down list on the search page contains the content of the index. Now we will change this display text to make it nicer. Go to the Format panel by clicking its tab. This panel is split into several sections, each controlling some aspect of collection presentation.
·        Select Search in the left-hand list. This section allows you to modify the text that is displayed for the drop-down lists in the search form (indexes, subcollections, levels etc). Set the Display text for the dc.Title (or dc.Title,Title if you didn't deselect ex.Title in the search index) index to be "titles", and that for the dc.Creator index to be "creators". Preview the collection by clicking <Preview Collection>. The search form should display the new text.

Classifying on Multiple Metadata
·        The new Titles list shows only those documents which have been assigned dc.Title metadata.  For many documents, extracted Titles may be fine, and it is impractical to add the same metadata again as dc.Title. Fortunately there is a way we can use both metadata types in one classifier: specify a list of metadata names in the classifier.
·        In the Browsing Classifiers section of the Design panel, select the AZList for dc.Title in the Assigned Classifiers box and click <Configure Classifier...>. Note you can achieve the same result by double clicking on the classifier.
·        In the metadata field, type ",ex.Title" after the "dc.Title"—i.e. make it read
dc.Title,ex.Title
·        If you have already done the Enhanced Word document handling exercise, some of the documents will have extracted ex.Creator metadata, and some will have dc.Creator. To use both of these in the Creators classifier, make a similar change to the AZCompactList: make the metadata field read dc.Creator,ex.Creator.

Build the collection again and preview it. Now all of the documents should appear in the Titles list (and extracted Creators should appear in the Creators list).
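One way to picture a classifier given a list such as dc.Title,ex.Title is as a fallback: use the manually assigned value when there is one, otherwise the extracted value. The Python sketch below implements that reading. It is an assumption about the behaviour rather than a description of Greenstone's internals, and the second and third records are hypothetical.

```python
def first_available(metadata, names):
    """Return the value of the first listed metadata name that is present and non-empty."""
    for name in names.split(","):
        value = metadata.get(name.strip())
        if value:
            return value
    return None

manual = {
    "dc.Title": "Greenstone: A comprehensive open-source digital library software system",
    "ex.Title": "Greenstone: A comprehensive",
}
extracted_only = {"ex.Title": "An extracted title"}  # hypothetical record
untitled = {}                                        # hypothetical record

print(first_available(manual, "dc.Title,ex.Title"))          # the manually assigned title
print(first_available(extracted_only, "dc.Title,ex.Title"))  # An extracted title
print(first_available(untitled, "dc.Title,ex.Title"))        # None
```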

EXPORTING A COLLECTION TO CD-ROM/DVD
To publish a collection on CD-ROM or DVD, Greenstone's Export to CD-ROM export module must be installed. This is included with CD-ROM distributions, and all distributions 2.70w and later. It must be installed separately for non-CD-ROM versions of Greenstone, version 2.70 and earlier (see Installing Greenstone).
1. Launch the Greenstone Librarian Interface if it is not already running.
2. Choose File → Write CD/DVD image.... In the resulting popup window, select the collection or collections that you wish to export by ticking their check boxes. You can optionally enter a name for the CD-ROM: this is the name that will appear in the menu when the CD-ROM is run. If a name is not entered, the default Greenstone Collections will be used. You can also specify whether the resulting CD-ROM will install files onto the host machine when used or not. Click <Write CD/DVD image> to start the export process.  The necessary files for export are written to:
Greenstone → tmp → exported_xxx
where xxx will be similar to the name you have entered. If you didn't specify a name for the CD-ROM, then the folder name will be exported_collections.

You need to use your own computer's software to write these files on to CD-ROM. On Windows XP this ability is built into the operating system: assuming you have a CD-ROM or DVD writer, insert a blank disk into the drive and drag the contents of exported_xxx into the folder that represents the disk.
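Before writing the exported folder to disk, it is worth checking that it fits on the media. The following Python sketch adds up the sizes of all files under a folder and compares the total with a capacity. The 700 MB default is a typical CD-R figure and is an assumption, so check your own media. The function names are illustrative.

```python
import os

CD_CAPACITY_MB = 700  # typical CD-R capacity; an assumption, check your own media

def folder_size_mb(folder):
    """Return the combined size of all files under folder, in megabytes."""
    total_bytes = 0
    for root, _dirs, files in os.walk(folder):
        for name in files:
            total_bytes += os.path.getsize(os.path.join(root, name))
    return total_bytes / (1024 * 1024)

def fits_on_disc(folder, capacity_mb=CD_CAPACITY_MB):
    """Report whether the folder's contents fit within the given capacity."""
    return folder_size_mb(folder) <= capacity_mb
```

Call fits_on_disc with the path of your exported folder, and pass a larger capacity_mb value when writing to DVD. A folder that does not exist is reported as empty, so check the path first.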
REFERENCES
1. Allison, Zhange. (2003). Customizing the Greenstone user interface: An illustrated guide to customizing the Greenstone user interface. Washington, D.C.: Research Library Consortium.
2. Building digital collections: Technical information and background paper. (2000). National Digital Library Program (NDLP) at the Library of Congress.
3. Cornell University Library/Research Department. (2000). Moving theory into practice: Digital imaging for libraries and archives. Research Libraries Group. Available at http://www.library.cornell.edu/preservation/tutorial
4. Digital Library Federation. (1998). A working definition of digital library. Available at http://www.diglib.org/about/dldefinition.htm
6. Greenstone training workshop material. (2002). Greenstone Digital Library Project and NCSI, IISC. (2003). http://www.greenstone.org
7. Witten, Ian H. (2003). Examples of practical digital libraries: Collections built internationally using Greenstone. D-Lib Magazine. http://dlib.org/dlib/march03;witten/03witten.html
8. Witten, Ian H. & Bainbridge, David. (2003). How to build a digital library. London: Morgan Kaufmann Publishers.
9. Lesk, M. (1997). Practical digital libraries: Books, bytes and bucks. San Francisco: Morgan Kaufmann.

