Digital Libraries

1. INTRODUCTION

The idea of easy, fingertip access to information, which we conceptualize today as digital libraries, began with Vannevar Bush's Memex machine (Bush, 1945) and has continued to evolve with each advance in information technology. With the arrival of computers, the concept centered on large bibliographic databases, the now familiar online retrieval and public access systems that are part of any contemporary library. When computers were connected into large networks forming the Internet, the concept evolved again, and research turned to creating libraries of digital information that could be accessed by anyone from anywhere in the world. Phrases like "virtual library," "electronic library," "library without walls" and, most recently, "digital library" have all been used to describe this concept.

2. Definition

A digital library is a library in which collections are stored in digital formats and are accessible by computer. The digital content may be stored locally or accessed remotely via computer networks. A digital library is a type of information retrieval system.

Digital libraries are organizations that provide the resources, including the specialized staff, to select, structure, offer intellectual access to, interpret, distribute, preserve the integrity of, and ensure the persistence over time of collections of digital works so that they are readily and economically available for use by a defined community or set of communities.

A digital library:

Is an organized collection of multimedia and other types of resources.
Makes its resources available in computer-processable form.
Carries out acquisition, storage, preservation, and retrieval through digital technology.
Makes the entire collection globally available, directly or indirectly, across a network.
Supports users in dealing with information objects.
Helps organize and present those objects by electronic or digital means.

3. Abstract views of Digital Library

Digital libraries are defined differently by different communities, depending on their research focus, which adds to the confusion over the term. For example:

From an information retrieval point of view, it is a large database.
For people who work on hypertext technology, it is one particular application of hypertext methods
For those working in wide-area information delivery, it is an application of the Web

4. Misconceptions

For computer scientists and software developers, collections of computer algorithms or software programs are digital libraries.
For database vendors or commercial document suppliers, their databases and electronic document delivery services are digital libraries.
For large corporations, a digital library is the document management systems that control their business documents in electronic form.
For a publisher, it may be an online version of a catalogue.
And for at least one very large software company, a digital library is the collection of whatever it can buy the rights to, and then charge people for using.

5. Characteristics of Digital Library

Digital libraries are the digital face of traditional libraries that include both digital collections and traditional, fixed media collections. So they encompass both electronic and paper materials.
Digital libraries will also include digital materials that exist outside the physical and administrative bounds of any one digital library.
Digital libraries will include all the processes and services that are the backbone and nervous system of libraries. However, such traditional processes, though forming the basis of digital library work, will have to be revised and enhanced to accommodate the differences between new digital media and traditional fixed media.
Digital libraries ideally provide a coherent view of all of the information contained within a library, no matter its form or format.
Digital libraries will serve particular communities or constituencies, as traditional libraries do now, though those communities may be widely dispersed throughout the network.
Digital libraries will require the skills of librarians as well as those of computer scientists to be viable.

6. Need for Digital Library

Easy to understand: Visual or graphical information systems in digital libraries are more popular than text-based information systems.
Shifting environment: New-generation users increasingly prefer to read from a computer screen.
Multiple functions for the same information: Using hypertext, a digital library can structure and organize the same digital information in a variety of ways that serve multiple functions.
Information explosion: A digital library is expected to help handle the information explosion. It can manage a large amount of digital content simply by providing links, without actually procuring the documents.
Information retrieval: A digital library lets users retrieve specific items, e.g. a particular image, photo, or definition.
Distance learning: Users can learn from home, the office, or any other convenient place.
Procuring online publications: As more and more information is published over the Internet, a digital library is needed to procure online publications and to provide links to important sources of information.

7. Requirements for Digital Library

The Internet and World Wide Web provide the impetus and technological environment for the development and operation of a digital library. The Internet provides TCP/IP and its associated protocols for accessing information, and the Web provides tools and techniques for publishing information over the Internet.

Some of the requirements for a digital library are:

Audio visual: Color TV, VCR, DVD, sound box, telephone, etc.
Computer: Server, PC with multimedia, UPS, etc.
Network: LAN, MAN, WAN, Internet, etc.
Printer: Laser printer, dot matrix, barcode printer, digital graphic printer, etc.
Scanner: HP ScanJet, flatbed, sheet feeder, drum scanner, slide scanner, microfilming scanner, digital camera, barcode scanner, etc.
Storage devices: Optical storage device, CD-ROM, jukebox, etc.
Software: Any suitable software that supports LAN and WAN connectivity.

Resources of a digital library

The resources of a digital library are those that the computer can store, organize, transmit, and display without any intervening conversion process. They include both print and electronic or digital material. The digital material may be multimedia or of any single type, e.g. digital audio, video, full-text information, photographs, drawings, digitized sound, e-books, electronic text, maps, images, 3D representations, etc.

Online resources:

Local database of traditional books in machine-readable form.
Electronic text, e-books, maps, images, sound, video, multimedia, etc.
E-journals
LAN, MAN, WAN for web browsing, e-mail, etc.
Well-trained manpower for online help

Offline resources:

CD-ROM, etc.
Audio-visual aids, etc.

8. Principles of Digital Libraries

Building a digital library is expensive and resource-intensive. Before embarking on such a venture, it is important to consider some basic principles underlying the design, implementation, and maintenance of any digital library. These principles apply not only to conversion projects in which analog objects are converted to digital form, but also to digital libraries in which the objects have always been in digital form (“born digital”) and to “mixed” digital libraries in which the objects may be of both types. The principles are, in some sense, self-evident, yet it is easy to lose sight of them when under pressure to build a system with limited resources and time.

Expect change
Know your content
Involve the right people
Design usable systems
Ensure open access
Be(a)ware of data rights
Automate whenever possible
Adopt and adhere to standards
Ensure quality
Be concerned about persistence

9. Terminologies used in Digital Libraries

It is helpful to consider three broad classes of library elements.

Data
Metadata
Processes

Data:

Information stored in a digital library can be divided into data and metadata. Data is a general term for information that is encoded in digital form; in other words, it is the library material itself.

Metadata:

Metadata is information about the library and its materials; that is, data about other data.

The distinction between data and metadata often depends on the context. Catalog records or abstracts are usually considered to be metadata because they describe other data, but in an online catalog or a database of abstracts they are data.

Metadata is structured information that describes, explains, locates, or otherwise makes it easier to retrieve, use, or manage an information resource. Metadata is often called information about information.

What Does Metadata Do?

Metadata is used to speed up and enrich searching for resources. In general, search queries that use metadata can save users from performing more complex filter operations manually. With metadata, a book can be searched for and uniquely identified. Descriptive metadata can also be used to automate workflows. For example, if a tool knows the content and structure of data, it can convert the data automatically and pass it to another tool as input. This saves users many of the copy-and-paste actions that are otherwise necessary when analyzing data with different tools.

There are three main types of metadata:

· Descriptive metadata describes a resource for purposes such as discovery and identification. It can include elements such as title, abstract, author, and keywords.

· Structural metadata indicates how compound objects are put together, for example, how pages are ordered to form chapters.

· Administrative metadata provides information to help manage a resource, such as when and how it was created, file type and other technical information, and who can access it.
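The three metadata types can be illustrated with a small record structure. The following is a minimal sketch, not a standard schema: the class names, field names, and sample values are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Descriptive:
    # Supports discovery and identification.
    title: str
    author: str
    keywords: list = field(default_factory=list)


@dataclass
class Structural:
    # Maps each chapter title to its ordered page numbers.
    chapters: dict = field(default_factory=dict)


@dataclass
class Administrative:
    # Helps manage the resource.
    created: str
    file_type: str
    access: str  # hypothetical values: "public" or "restricted"


@dataclass
class Record:
    identifier: str
    descriptive: Descriptive
    structural: Structural
    administrative: Administrative


def find_by_keyword(records, keyword):
    """Return identifiers of records whose descriptive keywords match."""
    wanted = keyword.lower()
    return [
        r.identifier
        for r in records
        if wanted in (k.lower() for k in r.descriptive.keywords)
    ]


records = [
    Record(
        "book-1",
        Descriptive("Sample Atlas", "A. Author", ["maps", "geography"]),
        Structural({"Chapter 1": [1, 2, 3]}),
        Administrative("2004-01-01", "application/pdf", "public"),
    ),
    Record(
        "book-2",
        Descriptive("Sample Primer", "B. Author", ["metadata"]),
        Structural({"Chapter 1": [1, 2]}),
        Administrative("2005-06-15", "text/html", "restricted"),
    ),
]

matches = find_by_keyword(records, "Maps")
```

Searching on the descriptive fields alone is enough to locate a record here; the structural and administrative parts would be consulted later, when the item is displayed or access is checked.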

10. Components of the computer system

The digital library framework permits many different computer systems to coexist. The key components are shown in the figure below. They run on a variety of computer systems connected by a computer network, such as the Internet.

Major system components

To demonstrate this framework, we implemented a pilot system. A more comprehensive prototype will be completed early in 1997.

User interfaces

Both the pilot and the prototype have two user interfaces: one for the users of the library, the other for the librarians and system administrators who manage the collections. Each user interface is in two parts. A standard Internet browser is used for the actual interactions with the user. This can be Netscape Navigator, Microsoft's Internet Explorer, or the Grail browser developed by our colleagues at CNRI. The browser connects to client services, which provide intermediary functions between the browser and the other parts of the system. The client services allow the user to decide where to search and what to retrieve; they interpret information structured as digital objects; they negotiate terms and conditions, manage relationships between digital objects, remember the state of the interaction, and convert among the protocols used by the various parts of the system.

Repository

Repositories store and manage digital objects and other information. A large digital library may have many repositories of various types, including modern repositories, legacy databases, and Web servers. Section 4 of this report describes the pilot repository that we have implemented and enhancements planned for the prototype. The interface to this repository is called the repository access protocol (RAP). Features of RAP are explicit recognition of rights and permissions that need to be satisfied before a client can access a digital object, support for a very general range of disseminations of digital objects, and an open architecture with well defined interfaces.

Handle system

Handles are general purpose identifiers that can be used to identify Internet resources, such as digital objects, over long periods of time and to manage materials stored in any repository or database. CNRI's handle system is a computer system that provides a distributed directory service for identifiers (handles) for Internet resources. When used with the repository, the handle system receives as input a handle for a digital object and returns the identifier of the repository where the object is stored.
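The lookup described above, handle in and repository identifier out, can be sketched as a simple directory. This is a toy, in-memory illustration only: the class, the method names, and the sample handle are hypothetical and do not reflect the interface of CNRI's actual handle system, which is a distributed service.

```python
class HandleDirectory:
    """Toy directory that maps handles to repository identifiers."""

    def __init__(self):
        self._directory = {}

    def register(self, handle, repository_id):
        # Record which repository stores the object named by this handle.
        self._directory[handle] = repository_id

    def resolve(self, handle):
        # Return the repository identifier, or fail loudly if unknown.
        try:
            return self._directory[handle]
        except KeyError:
            raise LookupError(f"unknown handle: {handle}") from None


handles = HandleDirectory()
handles.register("example.prefix/report-001", "repository-a")
repository_id = handles.resolve("example.prefix/report-001")
```

Because clients hold only the handle, an object can move to another repository by updating one directory entry, which is what makes the identifier usable over long periods of time.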

Search system

The design of the digital library system assumes that there will be many indexes and catalogs that can be searched to discover information before retrieving it from a repository. These indexes may be independently managed and support a wide range of protocols. The pilot system is independent of any search system; the prototype is being linked to CIIR's InQuery system, which is already in use at the Library of Congress.

11. Technical Challenges

Storage.

A digital library's storage system must be capable of storing a large amount of data in a variety of formats and accessing this data as quickly as possible. Text-only documents, stored in formats such as ASCII, LaTeX, HTML, SGML, and PostScript, are by far the easiest to store. Digital audio and video are more difficult to store because they require significantly more storage space and their delivery is time-dependent.

A typical digital library uses a variety of database management systems.

User interface.

The user interface, perhaps the most important digital library component, must incorporate a wide variety of techniques to afford rich interaction between users and the information they seek. For computer workstations, graphical user interfaces such as X-Windows, Microsoft Windows, and Macintosh System 7 interfaces are the status quo.

Classification and indexing.

Classification and indexing schemes are used to collect related content into groups that are intuitive to a user. Classifying and indexing objects is filled with pitfalls, however, because individual perceptions vary. Another complicating factor in indexing and classifying is the tremendous amount of potential content that remains to be indexed. It is clear that manual methods for classification are insufficient for all but the most trivial digital library.

Information retrieval.

The concepts underlying information retrieval were conceived long before computers and information systems were employed to store library materials. In the digital library domain, there are a variety of information-retrieval techniques, including metadata searching, full-text document searching, and content searching for other data types.
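Full-text document searching is commonly built on an inverted index, which maps each term to the documents that contain it. The sketch below is one simple way to do this, assuming lowercase word tokens and an AND query; the function names and sample documents are illustrative, not taken from any particular system.

```python
import re
from collections import defaultdict


def tokenize(text):
    # Lowercase the text and split it into alphanumeric terms.
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents containing every term in the query."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


documents = {
    1: "Digital libraries store metadata",
    2: "Image retrieval uses color",
    3: "Metadata speeds retrieval",
}
index = build_index(documents)
hits = search(index, "metadata retrieval")
```

The same index structure serves metadata searching if the indexed text is a catalog record rather than the document body.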

Content delivery.

Once an item of interest has been located in a digital library, it may be delivered in several ways. If the content is small, such as 100 pages of text or a 50-Kbyte image file, it may be delivered through the same channel used for information retrieval and querying. Content such as movies and software, however, demands much higher bandwidth. In these cases, delivery is over dedicated leased lines (for example, cable TV or videoconferencing systems) or satellite-based systems such as the Hughes Network Systems project.

Presentation.

Users of a traditional library usually want to read a book or watch a videotape; other uses are rare. With digital technology, it now becomes possible to listen to a book being read, watch a video of a musical performance alongside the original score, or hold a mechanical hand as it forms American Sign Language. Other possible uses are highly personal; individuals may dream up many distinct variations. A digital library's presentation systems must be flexible and highly customizable. They must also be aware of the output hardware's capabilities and limitations, automatically adjusting to deliver the best possible presentation quality at all times.

Administrative.

Traditional libraries store a final copy of a book or other document. Digital libraries store several versions of a document in a way that makes multiple revisions by multiple authors possible. In addition, the content of a digital library may have multiple owners, both of the source content and of annotations made to it. An administrative system ensures that materials intended for public viewing can indeed be viewed by anyone, while private collections and personal annotations can be viewed only by a select group or a single individual.
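The public versus private viewing rule can be sketched as a small access check. This is a minimal illustration under assumed names: the `Item` class, its fields, and the sample users are hypothetical, and a real administrative system would also handle versions, ownership, and groups.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    title: str
    public: bool
    # Users allowed to view a private item (ignored for public items).
    allowed_users: set = field(default_factory=set)


def can_view(item, user):
    """Public items are open to anyone; private items need permission."""
    if item.public:
        return True
    return user in item.allowed_users


report = Item("Annual report", public=True)
notes = Item("Personal annotations", public=False, allowed_users={"alice"})

visible_to_bob = [i.title for i in (report, notes) if can_view(i, "bob")]
```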

12. CONTENT-BASED IMAGE RETRIEVAL

Content-based image retrieval is a technique that uses visual contents to search for images in large-scale image databases according to users' interests. Early techniques were generally based not on visual features but on the textual annotation of images. In other words, images were first annotated with text and then searched using a text-based approach from traditional database management systems.

Comprehensive surveys of early text-based image retrieval methods are available in the literature. Text-based image retrieval uses traditional database techniques to manage images. Through text descriptions, images can be organized by topical or semantic hierarchies to facilitate easy navigation and browsing based on standard Boolean queries. However, since automatically generating descriptive texts for a wide spectrum of images is not feasible, most text-based image retrieval systems require manual annotation of images.

Content-based image retrieval uses the visual contents of an image, such as color, shape, texture, and spatial layout, to represent and index the image. In typical content-based image retrieval systems (Figure 1-1), the visual contents of the images in the database are extracted and described by multi-dimensional feature vectors. The feature vectors of the images in the database form a feature database. To retrieve images, users provide the retrieval system with example images or sketched figures. The system then converts these examples into its internal representation of feature vectors. The similarities or distances between the feature vector of the query example or sketch and those of the images in the database are then calculated, and retrieval is performed with the aid of an indexing scheme. The indexing scheme provides an efficient way to search the image database. Recent retrieval systems have incorporated users' relevance feedback to modify the retrieval process in order to generate perceptually and semantically more meaningful retrieval results. In this chapter, we introduce these fundamental techniques for content-based image retrieval.

Figure 1-1. Diagram of a content-based image retrieval system
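The retrieval step, comparing a query's feature vector with those in the feature database, can be sketched as a ranked nearest-neighbor search. The example below assumes Euclidean distance and a linear scan in place of an indexing scheme; the function names and the three-dimensional sample vectors are made up for illustration.

```python
import math


def euclidean(u, v):
    # Straight-line distance between two feature vectors of equal length.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def retrieve(query_vector, feature_db, k=3):
    """Return the names of the k images closest to the query vector."""
    ranked = sorted(
        feature_db.items(),
        key=lambda item: euclidean(query_vector, item[1]),
    )
    return [name for name, _ in ranked[:k]]


# Hypothetical feature database: image name -> feature vector.
feature_db = {
    "sunset": [0.9, 0.1, 0.0],
    "forest": [0.1, 0.8, 0.1],
    "ocean": [0.0, 0.2, 0.8],
}

top_two = retrieve([0.8, 0.2, 0.0], feature_db, k=2)
```

A linear scan compares the query with every stored vector, which is why larger collections rely on an indexing scheme to avoid examining the whole database.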

Widely used techniques exist for extracting color, texture, shape, and spatial relationships from images; several are described below.

Color

The three-dimensional values of color make its discrimination potential superior to the single-dimensional gray values of images. Before selecting an appropriate color description, the color space must be determined.

Color Space

Each pixel of the image can be represented as a point in a 3D color space. Commonly used color spaces for image retrieval include RGB and opponent color space.

However, one of the desirable characteristics of an appropriate color space for image retrieval is its uniformity. Uniformity means that two color pairs that are equal in similarity distance in a color space are perceived as equal by viewers. In other words, the measured proximity among the colors must be directly related to the psychological similarity among them.

RGB space is a widely used color space for image display. It is composed of three color components: red, green, and blue. These components are called "additive primaries" since a color in RGB space is produced by adding them together.

Color Moments

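Color moments have been used successfully in many retrieval systems, especially when the image contains just the object. The first-order (mean), second-order (variance), and third-order (skewness) color moments have proved efficient and effective in representing the color distributions of images. Mathematically, the first three moments are commonly defined as follows, with the second and third taken as roots so that all three share the units of the pixel values:

$$\mu_i = \frac{1}{N}\sum_{j=1}^{N} f_{ij}$$

$$\sigma_i = \left(\frac{1}{N}\sum_{j=1}^{N}\left(f_{ij}-\mu_i\right)^{2}\right)^{1/2}$$

$$s_i = \left(\frac{1}{N}\sum_{j=1}^{N}\left(f_{ij}-\mu_i\right)^{3}\right)^{1/3}$$

where $f_{ij}$ is the value of the i-th color component of the image pixel j, and N is the number of pixels in the image.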
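The three color moments can be computed directly from the pixel values. The sketch below assumes a common formulation in which the second moment is reported as a standard deviation (square root of the variance) and the third as a signed cube root; the function name and the list-of-tuples pixel format are assumptions for illustration.

```python
import math


def color_moments(pixels):
    """Return (mean, spread, skew) for each of the three color components.

    pixels is a list of (c1, c2, c3) tuples, one per image pixel.
    """
    n = len(pixels)
    moments = []
    for i in range(3):
        values = [p[i] for p in pixels]
        mean = sum(values) / n
        second = sum((v - mean) ** 2 for v in values) / n
        third = sum((v - mean) ** 3 for v in values) / n
        spread = math.sqrt(second)
        # Signed cube root, so negative skew does not yield a complex number.
        skew = math.copysign(abs(third) ** (1.0 / 3.0), third)
        moments.append((mean, spread, skew))
    return moments


# Tiny made-up image with three pixels.
features = color_moments([(0, 10, 5), (0, 10, 5), (3, 10, 5)])
```

The result is a nine-number feature vector (three moments for each of three components), which is compact compared with a full histogram.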

Color Histogram

The color histogram serves as an effective representation of the color content of an image if the color pattern is unique compared with the rest of the data set. The color histogram is easy to compute and effective in characterizing both the global and local distribution of colors in an image.
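A color histogram can be built by quantizing each RGB component into a few bins and counting pixels per bin. The following is a minimal sketch, assuming 8-bit RGB values, four bins per channel, and histogram intersection as the similarity measure; these choices and the function names are illustrative rather than prescribed by the text.

```python
def color_histogram(pixels, bins_per_channel=4):
    """Return a normalized histogram over quantized (r, g, b) values."""
    b = bins_per_channel
    hist = [0] * (b ** 3)
    for r, g, bl in pixels:
        # Map each 0..255 component to a bin index in 0..b-1.
        ri, gi, bi = r * b // 256, g * b // 256, bl * b // 256
        hist[ri * b * b + gi * b + bi] += 1
    n = len(pixels)
    return [count / n for count in hist]


def histogram_intersection(h1, h2):
    """Similarity in [0, 1]; 1 means identical normalized histograms."""
    return sum(min(a, c) for a, c in zip(h1, h2))


red_image = [(255, 0, 0)] * 4
blue_image = [(0, 0, 255)] * 4
mixed_image = [(255, 0, 0), (255, 0, 0), (0, 0, 255), (0, 0, 255)]

h_red = color_histogram(red_image)
h_blue = color_histogram(blue_image)
h_mixed = color_histogram(mixed_image)
```

Here `histogram_intersection(h_red, h_mixed)` is higher than `histogram_intersection(h_red, h_blue)`, reflecting that the mixed image shares half of its color content with the red one.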

Texture:

Texture measures look for visual patterns in images and how they are spatially defined. Textures are represented by texels, which are then placed into a number of sets, depending on how many textures are detected in the image. These sets define not only the texture, but also where in the image the texture is located.

Shape:

Shape does not refer to the shape of an image but to the shape of a particular region that is being sought out. Shapes are often determined by first applying segmentation or edge detection to an image. In some cases accurate shape detection requires human intervention, because methods like segmentation are very difficult to automate completely.
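Edge detection can be illustrated with a very simple gradient threshold. The sketch below is one basic approach, assuming a grayscale image stored as a list of rows, forward differences for the gradient, and an arbitrary threshold; practical systems use more robust operators, and the function name is hypothetical.

```python
def edge_map(gray, threshold=50):
    """Mark pixels whose gradient magnitude meets the threshold."""
    height, width = len(gray), len(gray[0])
    edges = [[False] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            # Forward differences, clamped at the right and bottom borders.
            gx = gray[y][min(x + 1, width - 1)] - gray[y][x]
            gy = gray[min(y + 1, height - 1)][x] - gray[y][x]
            magnitude = (gx * gx + gy * gy) ** 0.5
            edges[y][x] = magnitude >= threshold
    return edges


# Made-up image: dark left half, bright right half.
image = [[0, 0, 255, 255] for _ in range(3)]
edges = edge_map(image)
```

The detected edge pixels trace the boundary between the two regions; a shape descriptor would then be computed from that boundary.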

13. Advantages and disadvantages of the Digital Library

No physical boundary: Users of a digital library need not go to the library physically; people from all over the world can access the same information as long as an Internet connection is available.
Round-the-clock availability: Digital libraries can be accessed at any time, 24 hours a day and 365 days a year.
Multiple access: The same resources can be used at the same time by a number of users.
Structured approach: A digital library provides access to much richer content in a more structured manner, i.e. we can easily move from the catalog to a particular book, then to a particular chapter, and so on.
Information retrieval: The user can search the entire collection using any word or phrase it contains. A digital library provides user-friendly interfaces, giving clickable access to its resources.
Preservation and conservation: An exact copy of the original can be made any number of times without any degradation in quality.
Space: Whereas traditional libraries are limited by storage space, digital libraries have the potential to store much more information, simply because digital information requires very little physical space. When a library has no room to expand, digitization is the only solution.
Networking: A digital library can easily link to the resources of other digital libraries, so seamlessly integrated resource sharing can be achieved.
Cost: The cost of maintaining a digital library is much lower than that of a traditional library. A traditional library must spend large sums of money on staff, book maintenance, rent, and additional books. Digital libraries reduce or avoid many of these costs.

Disadvantages of the Digital Library

Copyright: Digitization can violate copyright law, because one author's work can be freely transferred by others without acknowledgement. One difficulty for digital libraries is therefore how to distribute information at will while protecting the author's copyright.
Speed of access: As more and more computers are connected to the Internet, its speed of access decreases. If new technology does not evolve to solve the problem, the Internet may in the near future be full of error messages.
Initial cost is high: The infrastructure cost of a digital library, i.e. the cost of hardware, software, and leased communication circuits, is generally very high.
Bandwidth: A digital library needs high bandwidth to transfer multimedia resources, but the available bandwidth shrinks as the network becomes over-utilized.
Efficiency: With the much larger volume of digital information, finding the right material for a specific task becomes increasingly difficult.
Environment: Digital libraries cannot reproduce the environment of a traditional library. Many people also find reading printed material easier than reading material on a computer screen.
Preservation: Due to technological developments, a digital library can rapidly become out of date and its data may become inaccessible.

14. CONCLUSION

Digital libraries are not going to replace physical documents completely, but to meet present demand and to serve non-local users, digitization must be introduced so that libraries at least become hybrid in nature. The initial cost of digitization is high, but experience shows that once digitization is introduced, the cost of managing the collection is lower than that of a traditional library.

