Digital Libraries
1. INTRODUCTION
The idea of easy, fingertip access to information (what we conceptualize as digital libraries today) began with Vannevar Bush's Memex machine (Bush, 1945) and has continued to evolve with each advance in information technology. With the arrival of computers, the concept centered on large bibliographic databases, the now familiar online retrieval and public access systems that are part of any contemporary library. When computers were connected into large networks forming the Internet, the concept evolved again, and research turned to creating libraries of digital information that could be accessed by anyone from anywhere in the world. Phrases like "virtual library," "electronic library," "library without walls" and, most recently, "digital library" have all been used to describe this concept.
2. Definition
A digital library is a library in which collections are stored in digital formats and are accessible by computers. The digital contents may be stored locally or accessed remotely via computer networks. A digital library is a type of information retrieval system.
Digital libraries are organizations
that provide the resources, including the specialized staff, to select,
structure, offer intellectual access to, interpret, distribute, preserve the
integrity of, and ensure the persistence over time of collections of digital
works so that they are readily and economically available for use by a defined
community or set of communities.
A digital library:
- Is an organized collection of multimedia and other types of resources.
- Holds its resources in computer-processable form.
- Carries out the functions of acquisition, storage, preservation, and retrieval through digital technology.
- Makes the entire collection globally available, directly or indirectly, across a network.
- Supports users in dealing with information objects.
- Helps organize and present these objects by electronic/digital means.
3. Abstract Views of Digital Libraries
Part of the confusion about what a digital library is comes from the fact that different research communities define it differently. For example:
- From an information retrieval point of view, it is a large database.
- For people who work on hypertext technology, it is one particular application of hypertext methods.
- For those working in wide-area information delivery, it is an application of the Web.
4. Misconceptions
- For computer scientists and software developers, collections of computer algorithms or software programs are digital libraries.
- For database vendors or commercial document suppliers, their databases and electronic document delivery services are digital libraries.
- For large corporations, a digital library is the document management system that controls their business documents in electronic form.
- For a publisher, it may be an online version of a catalogue.
- And for at least one very large software company, a digital library is the collection of whatever it can buy the rights to, and then charge people for using.
5. Characteristics of Digital Libraries
- Digital libraries are the digital face of traditional libraries that include both digital collections and traditional, fixed media collections. They therefore encompass both electronic and paper materials.
- Digital libraries will also include digital materials that exist outside the physical and administrative bounds of any one digital library.
- Digital libraries will include all the processes and services that are the backbone and nervous system of libraries. However, such traditional processes, though forming the basis of digital library work, will have to be revised and enhanced to accommodate the differences between new digital media and traditional fixed media.
- Digital libraries ideally provide a coherent view of all of the information contained within a library, no matter its form or format.
- Digital libraries will serve particular communities or constituencies, as traditional libraries do now, though those communities may be widely dispersed throughout the network.
- Digital libraries will require both the skills of librarians and those of computer scientists to be viable.
6. Need for Digital Libraries
- Easy to understand: Visual or graphical information systems, such as those in digital libraries, are more popular than text-based information systems.
- Shifting environment: The new generation of users is happiest when they can read from a computer screen.
- Multiple functions of the same information: Using hypertext, a digital library can structure and organize the same digital information in a variety of ways that serve multiple functions.
- Information explosion: A digital library is expected to help cope with the information explosion. It can handle and manage large amounts of digital content simply by providing links, without actually procuring the documents.
- Information retrieval: Using a digital library, one can retrieve specific information, e.g. a particular image, photo, or definition.
- Distance learning: Users can learn from home, the office, or any other place convenient to them.
- Procuring online publications: As more and more information is published over the Internet, a digital library is needed to procure online publications and to provide links to important sources of information.
7. Requirements for Digital Library
The Internet and the World Wide Web provide the impetus and technological environment for the development and operation of a digital library. The Internet provides TCP/IP and its associated protocols for accessing information, and the Web provides tools and techniques for publishing information over the Internet.
Some of the requirements for a digital library are:
- Audio-visual: Color TV, VCR, DVD player, speakers, telephone, etc.
- Computer: Server, multimedia PC, UPS, etc.
- Network: LAN, MAN, WAN, Internet, etc.
- Printer: Laser printer, dot-matrix printer, barcode printer, digital graphic printer, etc.
- Scanner: HP ScanJet, flatbed scanner, sheet-feeder, drum scanner, slide scanner, microfilm scanner, digital camera, barcode scanner, etc.
- Storage devices: Optical storage devices, CD-ROM, jukebox, etc.
- Software: Any suitable, interoperable software that supports LAN and WAN connections.
Resources of a digital library
The resources of a digital library are those that the computer can store, organize, transmit, and display without any intervening conversion process. They include both print and electronic or digital material. The digital material may be multimedia or of a single type, e.g. digital audio, video, full-text information, photographs, drawings, digitized sound, e-books, electronic text, maps, images, 3D representations, etc.
Online resources:
- Local database of traditional books in machine-readable form.
- Electronic text, e-books, maps, images, sound, video, multimedia, etc.
- E-journals.
- LAN, MAN, WAN for web browsing, e-mail, etc.
- Well-trained staff for online help.
Offline resources:
- CD-ROM, etc.
- Audio-visual aids, etc.
8. Principles of Digital Libraries
Building a digital library is expensive and resource-intensive. Before embarking on such a venture, it is important to consider some basic principles underlying the design, implementation, and maintenance of any digital library. These principles apply not only to conversion projects, in which analog objects are converted to digital form, but also to digital libraries in which the objects have always been in digital form ("born digital") and to "mixed" digital libraries in which the objects may be of both types. The principles are, in some sense, self-evident, yet it is easy to lose sight of them when under pressure to build a system with limited resources and time.
- Expect change
- Know your content
- Involve the right people
- Design usable systems
- Ensure open access
- Be(a)ware of data rights
- Automate whenever possible
- Adopt and adhere to standards
- Ensure quality
- Be concerned about persistence
9. Terminologies Used in Digital Libraries
It is helpful to consider three broad
classes of library elements.
- Data
- Metadata
- Processes
Data:
Information stored in a digital library can be divided into data and metadata. Data is a general term for information that is encoded in digital form; in other words, it is the library material itself.
Metadata:
Metadata is information about the library and its materials; in other words, it is data about other data.
The distinction between data and metadata often depends on the context. Catalog records or abstracts are usually considered to be metadata because they describe other data, but in an online catalog or a database of abstracts they are the data.
Metadata is structured information that describes, explains, or locates an information resource, or otherwise makes it easier to retrieve, use, or manage. Metadata is often called information about information.
What Does Metadata Do?
Metadata is used to speed up and enrich searching for resources. In general, search queries that use metadata save users from performing more complex filter operations manually. Using metadata, a user can search for a book or identify it uniquely. Descriptive metadata can also be used to automate workflows. For example, if a tool knows the content and structure of the data, it can convert the data automatically and pass it to another tool as input. This saves users many of the copy-and-paste actions that are otherwise necessary when analyzing data with different tools.
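To make the search point concrete, here is a minimal Python sketch, assuming a small in-memory catalog whose records hold descriptive metadata as plain dictionaries. The records, field names, and the `search` helper are hypothetical illustrations, not the interface of any real library system. A fielded query such as author plus year is answered directly from the metadata, so the user does not have to filter results by hand.

```python
from typing import Any

# Hypothetical catalog records: descriptive metadata only, no full text.
CATALOG: list[dict[str, Any]] = [
    {"id": "rec-001", "title": "Village Water Supply", "author": "Rao", "year": 2001},
    {"id": "rec-002", "title": "Handloom Weaving", "author": "Smith", "year": 2004},
    {"id": "rec-003", "title": "Monsoon Records", "author": "Rao", "year": 2004},
]


def search(catalog: list[dict[str, Any]], **criteria: Any) -> list[str]:
    """Return ids of records whose metadata fields match every given criterion."""
    return [
        record["id"]
        for record in catalog
        if all(record.get(name) == value for name, value in criteria.items())
    ]


print(search(CATALOG, author="Rao"))             # ['rec-001', 'rec-003']
print(search(CATALOG, author="Rao", year=2004))  # ['rec-003']
```

Adding another criterion narrows the result set without any manual post-filtering, which is the saving the paragraph describes.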
There are three main types of metadata:
- Descriptive metadata describes a resource for purposes such as discovery and identification. It can include elements such as title, abstract, author, and keywords.
- Structural metadata indicates how compound objects are put together, for example, how pages are ordered to form chapters.
- Administrative metadata provides information to help manage a resource, such as when and how it was created, its file type and other technical information, and who can access it.
10. Components of the computer system
The digital
library framework permits many different computer systems to coexist. The key
components are shown in the figure below. They run on a variety of computer
systems connected by a computer network, such as the Internet.
Major system components
To
demonstrate this framework, we implemented a pilot system. A more comprehensive
prototype will be completed early in 1997.
User interfaces
Both
the pilot and the prototype have two user interfaces: one for the users of the
library, the other for the librarians and system administrators who manage the
collections. Each user interface is in two parts. A standard Internet browser is used for the actual
interactions with the user. This can be Netscape Navigator, Microsoft's
Internet Explorer, or the Grail browser developed by our colleagues at CNRI.
The browser connects to client services,
which provide intermediary functions between the browser and the other parts of
the system. The client services allow the user to decide where to search and
what to retrieve; they interpret information structured as digital objects;
they negotiate terms and conditions, manage relationships between digital
objects, remember the state of the interaction, and convert among the protocols
used by the various parts of the system.
Repository
Repositories
store and manage digital objects and other information. A large digital library
may have many repositories of various types, including modern repositories,
legacy databases, and Web servers. Section 4 of this report describes the pilot
repository that we have implemented and enhancements planned for the prototype.
The interface to this repository is called the repository access protocol (RAP). Features of RAP include explicit recognition of rights and permissions that need to be satisfied before a client can access a digital object, support for a very general range of disseminations of digital objects, and an open architecture with well-defined interfaces.
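The idea of checking rights before any dissemination can be sketched as follows. This is a toy, in-memory model under stated assumptions, not the actual RAP interface: `Repository`, `DigitalObject`, `deposit`, `access`, and `AccessDenied` are hypothetical names, and "rights" are reduced to a set of permitted user names, where an empty set means open access.

```python
from dataclasses import dataclass


class AccessDenied(Exception):
    """Raised when a client does not satisfy an object's terms and conditions."""


@dataclass
class DigitalObject:
    handle: str
    content: bytes
    allowed_users: frozenset[str]  # empty set = open to everyone (an assumption)


class Repository:
    """Toy repository: rights are checked before any content is disseminated."""

    def __init__(self) -> None:
        self._objects: dict[str, DigitalObject] = {}

    def deposit(self, obj: DigitalObject) -> None:
        self._objects[obj.handle] = obj

    def access(self, handle: str, user: str) -> bytes:
        obj = self._objects[handle]  # KeyError if the object is not stored here
        if obj.allowed_users and user not in obj.allowed_users:
            raise AccessDenied(f"{user} may not access {handle}")
        return obj.content


repo = Repository()
repo.deposit(DigitalObject("demo/1", b"open text", frozenset()))
repo.deposit(DigitalObject("demo/2", b"licensed text", frozenset({"alice"})))
print(repo.access("demo/1", "bob"))    # b'open text'
print(repo.access("demo/2", "alice"))  # b'licensed text'
```

The design point is that the permission test sits inside `access`, so no client path can reach the content without passing it.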
Handle system
Handles
are general purpose identifiers that can be used to identify Internet
resources, such as digital objects, over long periods of time and to manage
materials stored in any repository or database. CNRI's handle system is a
computer system that provides a distributed directory service for identifiers
(handles) for Internet resources. When used with the repository, the handle
system receives as input a handle for a digital object and returns the
identifier of the repository where the object is stored.
Search system
The design of the digital library system assumes that there will be many indexes and catalogs that can be searched to discover information before retrieving it from a repository. These indexes may be independently managed and support a wide range of protocols. The pilot system is independent of any search system; the prototype is being linked to CIIR's INQUERY system, which is already in use at the Library of Congress.
11. Technical Challenges
Storage.
A digital library's storage system must be capable of storing a large amount of data in a variety of formats and accessing this data as quickly as possible. Text-only documents, stored in formats such as ASCII, LaTeX, HTML, SGML, and PostScript, are by far the easiest to store. Digital audio and video are more difficult to store because they require significantly more storage space and their delivery is time-dependent.

A typical digital library uses a variety of database-management systems.
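A back-of-the-envelope calculation shows why audio is harder to store than text. The figures are assumptions, not measurements: about 2,000 one-byte ASCII characters per page of text, and uncompressed CD-quality stereo audio (44,100 samples per second, 16 bits per sample, two channels).

```python
# Assumed figures for illustration only.
BYTES_PER_TEXT_PAGE = 2_000   # ~2,000 ASCII characters, one byte each
SAMPLE_RATE_HZ = 44_100       # CD-quality sampling rate
BYTES_PER_SAMPLE = 2          # 16-bit samples
CHANNELS = 2                  # stereo


def text_bytes(pages: int) -> int:
    """Storage needed for plain ASCII text."""
    return pages * BYTES_PER_TEXT_PAGE


def audio_bytes(seconds: int) -> int:
    """Storage needed for uncompressed PCM audio."""
    return seconds * SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * CHANNELS


book = text_bytes(100)   # a 100-page text
clip = audio_bytes(60)   # one minute of audio
print(book, clip, round(clip / book, 1))  # 200000 10584000 52.9
print(audio_bytes(1))    # 176400
```

Under these assumptions one minute of audio takes roughly fifty times the space of a 100-page text, and playback needs a steady 176,400 bytes every second, which is the time dependence mentioned above.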
User interface.
The
user interface, perhaps the most important digital library component, must
incorporate a wide variety of techniques to afford rich interaction between
users and the information they seek. For computer workstations, graphical user
interfaces such as X-Windows, Microsoft Windows, and Macintosh System 7
interfaces are the status quo.
Classification and indexing.
Classification
and indexing schemes are used to collect related content into groups that are
intuitive to a user. Classifying and indexing objects is filled with pitfalls,
however, because individual perceptions vary. Another complicating factor in
indexing and classifying is the tremendous amount of potential content that
remains to be indexed. It is clear that manual methods for classification are
insufficient for all but the most trivial digital library.
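One common way to automate indexing is an inverted index, which maps each term to the documents containing it; it is named here as an illustration, since the text does not prescribe a technique. The sketch below assumes plain-text documents, a tiny hypothetical stop-word list, and no stemming, so "image" and "images" remain different terms.

```python
import re
from collections import defaultdict

STOP_WORDS = {"a", "an", "and", "the", "of", "in", "to", "is"}  # hypothetical list


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into alphanumeric words, dropping stop words."""
    return [w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOP_WORDS]


def build_inverted_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map every term to the set of document ids that contain it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return dict(index)


docs = {
    "d1": "Preservation of digital objects",
    "d2": "Metadata for digital images",
    "d3": "Image retrieval by colour and texture",
}
index = build_inverted_index(docs)
print(sorted(index["digital"]))                # ['d1', 'd2']
print(index["digital"] & index["metadata"])    # {'d2'}
```

Intersecting the sets answers a Boolean AND query without rereading any document, which is why such indexes scale where manual classification does not.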
Information retrieval.
The
concepts underlying information retrieval were conceived long before computers
and information systems were employed to store library materials. In the
digital library domain, there are a variety of information-retrieval
techniques, including metadata searching, full-text document searching, and
content searching for other data types.
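For full-text document searching, one common (though not the only) way to rank matches is tf-idf weighting. The following Python sketch is a simplified illustration under assumptions: naive tokenization, term frequency normalized by document length, and idf = log(N / df). It is not the scoring method of any particular system, and the sample documents are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def tf_idf_search(docs: dict[str, str], query: str) -> list[tuple[str, float]]:
    """Rank documents by the summed tf-idf weight of the query terms they contain."""
    counts = {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}
    n_docs = len(docs)
    scores: dict[str, float] = {}
    for term in set(tokenize(query)):
        df = sum(1 for c in counts.values() if term in c)  # document frequency
        if df == 0:
            continue
        idf = math.log(n_docs / df)  # rarer terms weigh more
        for doc_id, c in counts.items():
            if term in c:
                tf = c[term] / sum(c.values())
                scores[doc_id] = scores.get(doc_id, 0.0) + tf * idf
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


docs = {
    "d1": "digital library digital archive",
    "d2": "library catalogue",
    "d3": "image archive",
}
print([doc_id for doc_id, _ in tf_idf_search(docs, "digital archive")])  # ['d1', 'd3']
```

Documents that share no query term never receive a score, so only candidate matches are ranked.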
Content delivery.
Once an item
of interest has been located in a digital library, it may be delivered in
several ways. If the content is small, such as 100 pages of text or a 50-Kbyte
image file, it may be delivered through the same channel used for information
retrieval and querying. Content such as movies and software, however, demands
much higher bandwidth. In these cases, delivery is over dedicated leased lines
(for example, cable TV or videoconferencing systems) or satellite-based systems
such as the Hughes Network Systems project.
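The bandwidth argument is easy to quantify with the idealized formula time = (size in bytes × 8) / bit rate, ignoring protocol overhead. Only the 50-Kbyte image comes from the text; the 1-Gbyte movie size and the two link speeds (a 28.8 kbit/s dial-up modem and a 1.544 Mbit/s T1 leased line) are assumptions chosen for illustration.

```python
def transfer_seconds(size_bytes: int, bits_per_second: int) -> float:
    """Idealized transfer time: payload bits divided by link speed (no overhead)."""
    return size_bytes * 8 / bits_per_second


IMAGE = 50 * 1024        # the 50-Kbyte image mentioned in the text
MOVIE = 1024 ** 3        # assumed 1-Gbyte digitized movie
MODEM = 28_800           # assumed 28.8 kbit/s dial-up modem
T1 = 1_544_000           # assumed 1.544 Mbit/s T1 leased line

for name, size in [("image", IMAGE), ("movie", MOVIE)]:
    print(name, round(transfer_seconds(size, MODEM), 1), round(transfer_seconds(size, T1), 1))
# image 14.2 0.3
# movie 298261.6 5563.4
```

The image arrives in seconds even over a modem, while the movie would take about 83 hours over the same modem but about an hour and a half over the leased line, which is why large content is routed to dedicated channels.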
Presentation.
Users of a
traditional library usually want to read a book or watch a videotape; other
uses are rare. With digital technology, it now becomes possible to listen to a
book being read, watch a video of a musical performance alongside the original
score, or hold a mechanical hand as it forms American Sign Language. Other
possible uses are highly personal—individuals may dream up many distinct
variations. A digital library's presentation systems must be flexible and
highly customizable. They must also be aware of the output hardware's capabilities
and limitations, automatically adjusting to deliver the best possible
presentation quality at all times.
Administrative.
Traditional
libraries store a final copy of a book or other documents. Digital libraries
store several versions of a document in a way that makes multiple revisions by
multiple authors possible. In addition, the content for a digital library may
have multiple owners in terms of the sources of the content and annotations
made to the content of the library. An administrative system ensures that
materials intended for public viewing can indeed be viewed by anyone while
private collections and personal annotations may only be viewed by a select
group or single individual.
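A minimal sketch of such an administrative layer is shown below, assuming three visibility levels and an append-only list of revisions per item. `Visibility`, `Item`, `can_view`, and `revise` are hypothetical names for illustration; a real system would also need authentication and separate ownership for each annotation.

```python
from dataclasses import dataclass, field
from enum import Enum


class Visibility(Enum):
    PUBLIC = "public"     # anyone may view
    GROUP = "group"       # only the owner and members of a named group
    PRIVATE = "private"   # only the owner


@dataclass
class Item:
    owner: str
    visibility: Visibility
    group: frozenset[str] = frozenset()
    versions: list[str] = field(default_factory=list)  # every revision is kept

    def can_view(self, user: str) -> bool:
        if self.visibility is Visibility.PUBLIC:
            return True
        if self.visibility is Visibility.GROUP:
            return user == self.owner or user in self.group
        return user == self.owner

    def revise(self, text: str) -> int:
        """Store a new revision and return its version number (starting at 1)."""
        self.versions.append(text)
        return len(self.versions)


paper = Item("ravi", Visibility.PUBLIC)
note = Item("ravi", Visibility.PRIVATE)
draft = Item("ravi", Visibility.GROUP, frozenset({"meena"}))
print(paper.can_view("guest"), note.can_view("meena"), draft.can_view("meena"))  # True False True
print(paper.revise("first text"), paper.revise("corrected text"))  # 1 2
```

Keeping every revision rather than overwriting is what lets multiple authors revise a document without losing earlier versions.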
12. CONTENT-BASED IMAGE RETRIEVAL
Content-based image retrieval is a technique that uses visual contents to search for images in large-scale image databases according to users' interests. Early image retrieval techniques were generally based not on visual features but on the textual annotation of images. In other words, images were first annotated with text and then searched using a text-based approach from traditional database management systems.
Comprehensive surveys of early text-based image retrieval methods
can be found in. Text-based image retrieval uses traditional database
techniques to manage images. Through text descriptions, images can be organized
by topical or semantic hierarchies to facilitate easy navigation and browsing
based on standard Boolean queries. However, since automatically generating
descriptive texts for a wide spectrum of images is not feasible, most
text-based image retrieval systems require manual annotation of images.
Content-based image retrieval uses the visual contents of an image, such as color, shape, texture, and spatial layout, to represent and index the image. In typical content-based image retrieval systems (Figure 1-1), the visual contents of the images in the database are extracted and described by multi-dimensional feature vectors. The feature vectors of the images in the database form a feature database. To retrieve images, users provide the retrieval system with example images or sketched figures. The system then converts these examples into its internal representation of feature vectors. The similarities or distances between the feature vectors of the query example or sketch and those of the images in the database are then calculated, and retrieval is performed with the aid of an indexing scheme. The indexing scheme provides an efficient way to search the image database. Recent retrieval systems have incorporated users' relevance feedback to modify the retrieval process in order to generate perceptually and semantically more meaningful retrieval results. In this section, we introduce some of these fundamental techniques for content-based image retrieval.
Figure 1-1. Diagram of a content-based image retrieval system
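The retrieval loop described above (feature vectors, a feature database, and ranking by distance to the query) can be sketched in a few lines of Python. The three-component feature vectors and image names are hypothetical, Euclidean distance is one choice among several, and the linear scan stands in for the indexing scheme a real system would use on a large database.

```python
import math


def euclidean(a: list[float], b: list[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def retrieve(feature_db: dict[str, list[float]], query: list[float], k: int = 2) -> list[str]:
    """Return the ids of the k images whose feature vectors lie closest to the query."""
    ranked = sorted(feature_db, key=lambda image_id: euclidean(feature_db[image_id], query))
    return ranked[:k]


# Hypothetical 3-D feature vectors (e.g. mean R, G, B scaled to 0..1).
feature_db = {
    "sunset.jpg": [0.9, 0.4, 0.2],
    "forest.jpg": [0.2, 0.7, 0.3],
    "sea.jpg": [0.1, 0.3, 0.8],
}
print(retrieve(feature_db, query=[0.8, 0.5, 0.2]))  # ['sunset.jpg', 'forest.jpg']
```

In a full system, the query vector would come from the example image or sketch the user supplies, and relevance feedback would adjust the query or the distance weights between rounds.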
Widely used techniques exist for extracting color, texture, shape, and spatial relationships from images. Color, texture, and shape are outlined below.
Color
Color's three-dimensional values give it greater discriminating power than the one-dimensional gray values of images. Before selecting an appropriate color description, the color space must be determined.
Color Space
Each pixel of the image can be represented as a point in a 3D color space. Commonly used color spaces for image retrieval include RGB and the opponent color space.
One of the desirable characteristics of a color space for image retrieval is its uniformity. Uniformity means that two color pairs that are equal in similarity distance in the color space are perceived as equally similar by viewers. In other words, the measured proximity among colors must be directly related to the psychological similarity among them.
RGB space is a widely used color space for image display. It is composed of three color components: red, green, and blue. These components are called "additive primaries" since a color in RGB space is produced by adding them together.
Color Moments
Color moments have been successfully used in many retrieval systems, especially when the image contains just the object. The first-order (mean), second-order (variance), and third-order (skewness) color moments have proven to be efficient and effective in representing the color distributions of images. Mathematically, the first three moments are defined as:

$$\mu_i = \frac{1}{N}\sum_{j=1}^{N} f_{ij}$$

$$\sigma_i = \left(\frac{1}{N}\sum_{j=1}^{N}\left(f_{ij}-\mu_i\right)^2\right)^{1/2}$$

$$s_i = \left(\frac{1}{N}\sum_{j=1}^{N}\left(f_{ij}-\mu_i\right)^3\right)^{1/3}$$

where $f_{ij}$ is the value of the i-th color component of image pixel j, and N is the number of pixels in the image.
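The three color moments (the mean, the square root of the second central moment, and the cube root of the third central moment of each color component) can be computed directly from pixel values. This sketch assumes pixels are given as (R, G, B) tuples and divides by N. It also takes a sign-preserving real cube root for the third moment, because Python's `**` with a fractional exponent on a negative float returns a complex number. The helper name `color_moments` is hypothetical.

```python
import math


def color_moments(pixels: list[tuple[float, float, float]]) -> list[tuple[float, float, float]]:
    """Return (mean, standard deviation, skewness) for each of the three color components."""
    n = len(pixels)
    moments = []
    for i in range(3):  # i indexes the color component
        values = [p[i] for p in pixels]
        mu = sum(values) / n
        sigma = math.sqrt(sum((f - mu) ** 2 for f in values) / n)
        third = sum((f - mu) ** 3 for f in values) / n
        s = math.copysign(abs(third) ** (1 / 3), third)  # real cube root, keeps the sign
        moments.append((mu, sigma, s))
    return moments


pixels = [(0, 0, 3), (2, 0, 0), (4, 0, 0)]
print([tuple(round(v, 3) for v in m) for m in color_moments(pixels)])
# [(2.0, 1.633, 0.0), (0.0, 0.0, 0.0), (1.0, 1.414, 1.26)]
```

Nine numbers (three moments for three components) summarize the whole color distribution, which is why this feature is so compact compared with a full histogram.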
Color Histogram
The color histogram serves as an
effective representation of the color content of an image if the color pattern
is unique compared with the rest of the data set. The color histogram is easy
to compute and effective in characterizing both the global and local distribution
of colors in an image.
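A color histogram only needs a quantization step and a count. The sketch below assumes 8-bit RGB pixels quantized into `bins` levels per channel, and compares histograms with histogram intersection, a widely used similarity measure that the text itself does not name. The helper names and sample pixels are hypothetical.

```python
Cell = tuple[int, int, int]


def color_histogram(pixels: list[Cell], bins: int = 4) -> dict[Cell, float]:
    """Quantize 8-bit RGB pixels into bins**3 cells; return each cell's share of pixels."""
    counts: dict[Cell, int] = {}
    for r, g, b in pixels:
        cell = (r * bins // 256, g * bins // 256, b * bins // 256)
        counts[cell] = counts.get(cell, 0) + 1
    return {cell: c / len(pixels) for cell, c in counts.items()}


def intersection(h1: dict[Cell, float], h2: dict[Cell, float]) -> float:
    """1.0 for identical normalized histograms, 0.0 when no color cell is shared."""
    return sum(min(h1[cell], h2[cell]) for cell in h1.keys() & h2.keys())


red_heavy = [(250, 10, 10)] * 3 + [(10, 10, 250)]
mixed = [(250, 10, 10)] * 2 + [(10, 250, 10)] * 2
h_a, h_b = color_histogram(red_heavy), color_histogram(mixed)
print(intersection(h_a, h_b))  # 0.5
```

Because the histogram ignores where each color occurs, it captures the global color distribution; computing it over sub-regions of the image is one way to capture local distribution.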
Texture:
Texture measures look for visual patterns in images and how they are spatially defined. Textures are represented by texels, which are then placed into a number of sets, depending on how many textures are detected in the image. These sets define not only the texture but also where in the image the texture is located.
Shape:
Shape does not refer to the shape of an image but to the shape of a particular region that is being sought. Shapes are often determined by first applying segmentation or edge detection to an image. In some cases, accurate shape detection requires human intervention because methods like segmentation are very difficult to automate completely.
13. Advantages and disadvantages of the Digital Library
- No physical boundary: The user of a digital library need not go to the library physically; people from all over the world can gain access to the same information, as long as an Internet connection is available.
- Round-the-clock availability: Digital libraries can be accessed at any time, 24 hours a day, 365 days a year.
- Multiple access: The same resources can be used at the same time by a number of users.
- Structured approach: A digital library provides access to much richer content in a more structured manner, i.e. we can easily move from the catalog to a particular book, then to a particular chapter, and so on.
- Information retrieval: The user can search for any word or phrase across the entire collection. A digital library can provide very user-friendly interfaces, giving clickable access to its resources.
- Preservation and conservation: An exact copy of the original can be made any number of times without any degradation in quality.
- Space: Whereas traditional libraries are limited by storage space, digital libraries have the potential to store much more information, simply because digital information requires very little physical space. When a library has no space for expansion, digitization is the only solution.
- Networking: A digital library can easily link to the resources of other digital libraries, so seamlessly integrated resource sharing can be achieved.
- Cost: The cost of maintaining a digital library is much lower than that of a traditional library. A traditional library must spend large sums of money on staff, book maintenance, rent, and additional books. Digital libraries do away with these fees.
Disadvantages of the Digital Library
- Copyright: Digitization can violate copyright law, as one author's intellectual content can be freely transferred by others without acknowledgement. One difficulty digital libraries must therefore overcome is how to distribute information: how can a digital library distribute information at will while protecting the author's copyright?
- Speed of access: As more and more computers are connected to the Internet, its speed of access is decreasing. If new technology does not evolve to solve the problem, then in the near future the Internet will be full of error messages.
- Initial cost is high: The infrastructure cost of a digital library, i.e. the cost of hardware, software, and leasing communication circuits, is generally very high.
- Bandwidth: A digital library needs high bandwidth to transfer multimedia resources, but available bandwidth is decreasing day by day due to overutilization.
- Efficiency: With the much larger volume of digital information, finding the right material for a specific task becomes increasingly difficult.
- Environment: Digital libraries cannot reproduce the environment of a traditional library. Many people also find reading printed material easier than reading material on a computer screen.
- Preservation: Due to technological developments, a digital library can rapidly become out-of-date and its data may become inaccessible.
14. CONCLUSION
Digital libraries are not going to completely replace physical documents, but to meet present demand and to satisfy non-local users, digitization must be introduced so that libraries at least become hybrid in nature. The initial cost of digitization is high, but experiments show that once digitization is introduced, the cost of managing the collection is lower than that of any traditional library.