
FUTURE TRENDS
Whereas commercial RDBMS vendors remain “locked in a …fight for customers” (Boulton, 2003, p. 1), OSDBMS not only provide a less expensive and popular alternative for vendors and small to medium firms, but have also become a dependable Web content database for many portals (Babcock, 2004; Brockmeier, 2003). This impetus has given rise to XML or Web databases that are tailored to the generation and storage of Web content. Many shared servers and low-budget Web sites (e.g., Yahoo.com, Slashdot.org, Sourceforge.net) used OSDBMS as an inexpensive option (Wayner, 2001). However, some Web sites had to find an alternative as their user base and transaction volume increased. For example, Martin (2003) noted that the Sourceforge Web site had to migrate from PostgreSQL to DB2 as its transaction database platform. OSDBMS have also been described as one of the pillars of today’s most powerful technologies, supporting big and small businesses and high-volume business transactions (LaMonica, 2003). Nonetheless, Figure 3 shows that for many executives, OSDBMS vendors, users, and developers, there are some urgent issues to consider when it comes to database selection. Other factors include the readiness of many managing directors to accept the F/OSSD paradigm. One area where OSDBMS are making a breakthrough is grid computing. With increasing recognition of new research areas that implement OSDBMS in their work, OSDBMS mainstream adoption and use in mission-critical systems could be pervasive as early as 2006 (Babcock, 2004). According to Evans Data Corporation (2004), reliability has persisted for some time as the most important criterion for selecting a database.
Embracing OSDBMS has remained a difficult decision for database managers and administrators and for many users of commercial databases. Selecting the right database depends on the needs of the organization, and the decision will be based on budget, database size and
Figure 3. Most important criteria for selecting a database (Evans Data Corporation Database Development Survey, 2004). Criteria shown: Reliability, Total Cost of Ownership, Ability to Integrate, Performance, Scalability; y-axis: percentage of responses (0-25).
Open Source Database Management Systems
scalability requirements, high-availability requirements, database functionality, and the level of support required (Lowe, 2002). However, migrating to an OSDBMS that cannot grow with customers’ needs could be costly for an enterprise in terms of retraining, infrastructure upgrades, licensing, and consulting fees. Rooney (2004) argued that these costs can be far greater than the initial savings of acquiring an OSDBMS. Nonetheless, OSDBMS firms (e.g., Sleepycat Software) will continue to implement F/OSS strategies and do business by selling their commercial versions and offering support and other services (Boulton, 2003). At the same time, firms will embrace the F/OSS paradigm in order to tap into “the vast global community of developers and reduce their cost of production by not having to [re]invent the wheel” (Krishnamurthy, 2003, p. 8). Furthermore, storage and security will define OSDBMS capability in the future, as the data storage and computing needs of businesses continue to push database makers to develop faster and more scalable software (Orzech, 2003). Storage is an important factor because larger databases require larger storage facilities.
OSDBMS are no longer unknown quantities among the F/OSS community and software enterprises. Developers, vendors, and database users are already familiar with major players in the OSDBMS market—MySQL and PostgreSQL. Berkeley DB, Firebird, SAP DB, and many others also continue to attract attention in the database market, and the user base of these databases continues to grow.
What follows is a brief discussion on the two most popular OSDBMS. OSDBMS are ready for prime time because they:
• benefit from F/OSSD,
• are indeed scalable and popular in markets dominated by commercial databases,
• are suitable for industry’s mission-critical data (Gedda, 2003),
• reduce vendor lock-in, and
• as Boulton (2003) posits, are ripe for commoditization.
MySQL is considered the most widely used OSDBMS, constantly adding new features without compromising speed and reliability (Brockmeier, 2003). Available under the General Public License (GPL) and commercial licences, MySQL is said to give database users and customers a choice. The dual licensing allows the use of MySQL in a commercial product without compromising its F/OSS status. The F/OSS community provides 24/7 services and support in terms of debugging, a pool of patches, suggestions for future releases, and essential functionalities. The community gives users
Figure 4. Platforms on which PostgreSQL is used (Linux, xBSD, Solaris, AIX, Hp-UX, Win32/Cygwin, Irix, Other; x-axis: percentage response, 0-60)
Figure 5. Databases used prior to PostgreSQL (Oracle, SQL Server, DB2, Informix, Sybase, MySQL, MS Access, Other; x-axis: percentage response, 0-50)
and vendors the added confidence that resources are always available when problems need to be fixed and upgrades are needed, alleviating fear of vendor lock-in.
PostgreSQL entered the OSDBMS market much earlier than MySQL and has supported enterprise-level features (e.g., transaction support) much longer (Brockmeier, 2003). Released under the Berkeley Software Distribution (BSD) licence, PostgreSQL, like most OSDBMS, can be freely downloaded from the project’s Web site. Because its license allows companies to produce proprietary versions without requiring them to share their code with others or pay licensing fees, PostgreSQL is perceived as more business friendly than its OSDBMS counterparts. However, Boulton (2003) posits that the license also allows companies to produce what may be incompatible versions of PostgreSQL. Even though this has not happened yet, there is a lot of forking going on in the F/OSSD scene.
The PostgreSQL user survey shows interesting characteristics of OSDBMS. Although developers target the Linux operating system, a substantial number of OSDBMS installations run on Windows and other systems (Figure 4), reflecting the interests of the F/OSS developer and user communities as well as the targeting of market-dominant operating systems. The survey also shows that there is a significant move towards OSDBMS from commercial RDBMS (Figure 5). Interestingly, there is also movement of users within OSDBMS: over 40% of the 4,063 respondents used MySQL prior to using PostgreSQL.
CONCLUSION
This paper has discussed the thornier issues concerning the development and impact of OSDBMS on the operation of business organizations and educational and academic/research establishments, and the socioeconomic and technological challenges posed by OSDBMS. Databases have evolved over time in development and design complexity and in marketing strategies, and they continue to have a growing impact on the way organizations operate and make their business decisions. The windfall has brought an immense response from software companies, database researchers and professionals, database teachers and learners, and many more. While the use and implementation of OSDBMS continue to grow in popularity, many organizations continue to use their legacy databases. Thus, a form of quasi modus vivendi will continue to exist, at least in the immediate future, between OSDBMS and their commercial RDBMS counterparts, and the business and corporate culture needs to be aware of this symbiotic relationship. There are ample postulates about the success of F/OSS products and companies, but extending them to OSDBMS calls for a cautionary approach, as the database market is not only small in scope but is becoming increasingly saturated. The discussion presented here is our current understanding of the ecology of OSDBMS. Consolidated and ongoing empirical research on OSDBMS is needed, as has been done for other F/OSS projects, to better understand the trends, the dual-license business model, sustainable F/OSS community involvement, and the market factors surrounding the evolutionary trends of OSDBMS.
REFERENCES
Adams, L. (2002). Is vendor lock-in universally bad? Retrieved January 22, 2005, from http://builder.com.com/5100-6387-1058909.html
Babcock, C. (2004). Popularity growing for open-source databases. Retrieved January 22, 2005, from http://www.informationweek.com/shared/printableArticle.jhtml?articleID=18312009
Boulton, C. (2003). Are open source databases following in Linux footsteps? Retrieved January 22, 2005, from http://www.databasejournal.com/news/article.php/2222061
Brockmeier, J. (2003). Battle of the open source databases. Retrieved January 22, 2005, from http://www.newsfactor.com/perl/story/20495.html
Editingwhiz. (2003). Panel sees trends in open source for 2004. Retrieved January 22, 2005, from http://software.itmanagersjournal.com/software/03/12/24/0155215.shtml
Evans Data Corporation. (2004). Database development survey. Retrieved January 22, 2005, from http://www.evansdata.com/n2/surveys/db_toc_04_1.shtml
Feller, J., & Fitzgerald, B. (2000). A framework analysis of the open source software development paradigm. In Proceedings of the 21st International Conference on Information Systems (ICIS), Brisbane, Queensland, Australia (pp. 58-69).
Gedda, R. (2003). Analysis: Open source databases. Retrieved January 22, 2005, from http://www.linuxworld.com.au/index.php?id=787941582&fp=2&fpid=1
Jeusfeld, M. (2003). Publicly available database software. Retrieved January 22, 2005, from http://www.acm.org/sigmod/databaseSoftware/
Krishnamurthy, S. (2003). A managerial overview of open source software. Business Horizons, 46(5), 47-56.
LaMonica, M. (2003). Luxury models face cost-conscious buyers. Retrieved January 22, 2005, from http://news.com.com/2009-1001_3-1001340.html
Lowe, S. (2002). Database wars: Open source versus commercial. Retrieved January 22, 2005, from http://www.zdnet.com.au/insight/0,39023731,20264039,00.htm
Martin, V. (2003). Why DB2 vs open source database sales guide. Retrieved January 22, 2005, from ftp://ftp.software.ibm.com/software/data/pubs/papers/db2openspace.pdf
Orzech, D. (2003). Rapidly falling storage cost means bigger databases, new applications. Retrieved January 22, 2005, from http://www.cioupdate.com/trends/article.php/2217351
Perez, J. (2004). Open source DBs go big time. Retrieved January 22, 2005, from http://www.intelligententerprise.com/print_article.jhtml?articleID=17601165
Rooney, P. (2004). Database, security, storage are next layers for open source commoditization. Retrieved January 22, 2005, from http://www.crn.com/Components/printArticle.asp?ArticleID=47322
Sol, S. (2003). What is a database? (Part 1 of 4). Database articles and tutorials. Retrieved January 22, 2005, from http://www.theukwebdesigncompany.com/articles/database.php
Wayner, P. (2001). Open source databases bloom. Retrieved January 22, 2005, from http://www.computerworld.com/softwaretopics/software/story/0,10801,63629,00.html
KEY TERMS
Free/Open Source Software (F/OSS): Software whose source code, under certain license agreements, is freely available for modification, distribution, and innovation.
Grid Computing: A distributed computing setting that allows users to communicate and share resources without much concern about where those resources originate.
Open Source Databases: A class of relational databases developed and distributed by means of the F/OSS development model.
Outsourcing: The purchasing or contracting of tasks, goods, or services to an external source by a software firm.
Second-Generation Companies (SGC): F/OSS companies (e.g., MySQL, Sleepycat Software) employing dual-licensing-based business model that supports F/OSS philosophy and methodology in a profitable and sustainable software development environment.
Total Cost of Ownership (TCO): The total cost associated with acquiring software. This may include, but is not limited to, code downloading time, installation, maintenance, training, and so forth.
Vendor Lock-In: A situation where a software product is dependent on a single vendor’s implementation of a technology.
Web Services: Technologies that make application-to-application communication on the World Wide Web possible.
Open Source Software and Information Systems on the Web
Antonio Cartelli
University of Cassino, Italy
INTRODUCTION
Databases and systems for their management are today more and more important for individual and corporate applications. Furthermore, the spread of the Internet has made it possible to build centralized information systems, accessible from everywhere on the Net, both for querying data and for managing them.
A special role is played in this process by open source software and especially by some packages which can be linked all together to build online information systems.
It is well known that open source tools are today widely used by developers and programmers and that they are a valid and reliable alternative to proprietary software, yet fear of the unfamiliar often prevents ordinary users from adopting them.
In what follows, after a short survey of the historical events leading to the creation of the most famous open source packages and the Open Software Foundation, a description of the author’s experiences is proposed, showing how, in some special cases, an effort in learning new topics and developing adequate skills can produce relevant effects both in solving very different problems and in research.
BACKGROUND
In the late ’60s the spreading of computing systems led many scientists to hypothesize that the goal of shared computing, with access to computer resources by everyone, could be reached with the use of great centralized systems equipped with multi-program, time-sharing, and multiple-access operating systems. The MULTICS (Multiplexed Information and Computer Service) case is perhaps the best example in this respect; it was a project carried out by MIT, Bell Labs, and General Electric, which aimed at creating a huge machine providing computing power to everyone (Corbato, Saltzer, & Clingen, 1972). Although the MULTICS project was abandoned, it introduced many seminal ideas into the computing literature, and some computer scientists at Bell Labs who had worked on it made a one-user version of the system for a PDP-7 (DEC minicomputer); they called this new
system UNICS (Uniplexed Information and Computing Service) but soon renamed it UNIX (Bach, 1986). It is well known that Ritchie’s development of the C language, the rewriting of the whole UNIX system in this new language, and its distribution for free to universities and research centers made the fortune of this operating system, which was implemented and installed on a great variety of computing systems within a few years.
The development of LSI (large-scale integration) circuits in the late ’70s led to personal computing, i.e., to computers not very different in their architecture from minicomputers but cheaper and cheaper. Everyone could now have a computer for his/her own use and, as history demonstrates, the dream of great centralized systems for shared computing was abandoned.
The case of operating systems, in the author’s opinion, is emblematic of the impulse that personal computing gave to proprietary software. It is well known, in fact, that two main operating systems became popular on PCs: the first one, MS-DOS by Microsoft Inc., for the IBM PC and all machines equipped with the 8088 CPU, and the second one, UNIX by various distributors, for high-level personal computers equipped with the Motorola 68000 CPU family (Tanenbaum, 1987); neither of these systems was free or freely available. In other words, the introduction of PCs didn’t help computer science knowledge, until then mostly shared among scholars and researchers, to leave the laboratories, and only the efforts of a few people led to the creation of operating systems freely available with their source code.
Once more, UNIX is the reference example. Its transformation into a commercial product with a license prohibiting the public study of its source code led many universities (adopting it for their operating systems courses when it was free) to abandon the system. Some scholars, on the contrary, decided to entirely rewrite the kernel of the system, letting it open, and one of the most famous examples in this regard was MINIX, developed by A. S. Tanenbaum (Tanenbaum, Van Staveren, Keizer, & Stevenson, 1983).
Autonomously from Tanenbaum, Richard Stallman and Linus Torvalds developed prototypes of freely available operating systems and decided to join their efforts in creating GNU/Linux (today better known as Linux); it was
Copyright © 2006, Idea Group Inc., distributing in print or electronic forms without written permission of IGI is prohibited.
available on the Net in 1991 (Beck et al., 1996), but in a few years it evolved (and is still evolving) and became so steady and reliable as to be a serious and valid alternative to Microsoft Network Server’s software.
Operating systems were not the only software freely available or developed to be freely accessible with their source code, but the success of Linux (as an operating system) is very important for the spreading of many other initiatives (which used that OS for their implementation).
Faster and more efficient ways for accessing software became available with the Internet, and the use of open and/or free software (under special licenses like BSD, GPL, etc.) was made easier. New individuals and communities of developers worked on other software projects and adopted the same strategy of making freely available their source code; consortia like FSF (Free Software Foundation) and OSF (Open Software Foundation) were then created for helping people in defining standards, protecting their rights, and continuing the hard work of developers.
Among the various projects one can find on the Net, the following ones, having a relevant part in what follows, will be analyzed in greater detail: the Apache Web server, the PHP scripting language, and the PostgreSQL RDBMS.
The Apache Web server was developed by a group of scientists who left the NCSA System Development Group and was made freely available on the Net (the Apache Web site, http://www.apache.org/, is a good starting point for downloading the server software). Once compiled and started, the HTTP daemon listens for requests coming from the Net and creates system processes to answer them. One of the main features of this Web server is its modularity and the chance for a Web server administrator to integrate special modules enhancing the server functionalities within it.
PHP is a scripting language (the reference Web site is http://www.php.net/) that makes it easy for Webmasters to create interactive Web pages (i.e., FORMS letting data go back from client to Web server). Highly valued features of this software are its modularity, its embedding features, and the interaction it guarantees with the most widely used RDBMSs.
The PostgreSQL RDBMS is a software tool for the management of tables and queries in a relational database (it is available from the Web site http://www.postgresql.org/) by means of the well-known SQL (Structured Query Language), granting special users easy access to data. The project for this software was first developed at Berkeley, when the first examples of DBMSs (database management systems) were analyzed and discussed and the relational model was compared with the hierarchical and network models and adopted for it (Bracchi, Martella, & Pelagatti, 1987).
From the remarks reported until now, it can easily be deduced that the Internet and the above tools made it easy and cheap to create information systems accessible by general and special users for querying and managing the databases hosted on Web servers (Cartelli, 2004a). In other words, a Web server on the Internet (now mostly a PC) with all the above tools installed and running, and the right Web pages for the storing/retrieving functions accessing a database, becomes very similar to a mainframe with its virtual terminal services.
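The pattern just described — a Web server answering each request by querying a relational database and returning generated HTML — can be sketched with the Python standard library alone. SQLite stands in for PostgreSQL, and the table name and contents are invented for illustration; this is a minimal sketch of the architecture, not of any real system mentioned in the text.

```python
import sqlite3
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# A tiny relational database (SQLite as a stand-in for PostgreSQL).
conn = sqlite3.connect(":memory:", check_same_thread=False)
conn.execute("CREATE TABLE items (name TEXT)")
conn.execute("INSERT INTO items VALUES ('manuscript catalogue')")

class QueryHandler(BaseHTTPRequestHandler):
    """Answer every GET by querying the database and emitting HTML."""
    def do_GET(self):
        rows = conn.execute("SELECT name FROM items").fetchall()
        body = "<ul>" + "".join(f"<li>{n}</li>" for (n,) in rows) + "</ul>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.end_headers()
        self.wfile.write(body.encode())

    def log_message(self, *args):  # keep the demo quiet
        pass

# Serve on an ephemeral port and fetch one page, as a browser would.
server = HTTPServer(("127.0.0.1", 0), QueryHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f"http://127.0.0.1:{server.server_address[1]}/"
page = urllib.request.urlopen(url).read().decode()
server.shutdown()
print(page)  # <ul><li>manuscript catalogue</li></ul>
```

The essential point is the division of labour: the HTTP layer (Apache in the text) only dispatches requests, while all data storage and retrieval is delegated to the relational engine (PostgreSQL in the text) through SQL.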
OPEN SOURCE SOFTWARE AND EDUCATION: TWO CASES
Many sites on the Net today allow access to open source software, and some among them host whole projects, already developed or still evolving, with their communities of developers. Users, scholars, and programmers can easily download from these sites the software they need and simply use it or can take part in their developmental projects. An example of the above sites, SourceForge (http://www.sourceforge.net/) is reported here.
Nevertheless, one may be induced to create new applications when the existing ones seem inadequate for special situations or particular problems.
The experiences reported below are good examples, in the author’s opinion, of the need for planning and creating special information systems by means of the open source software: The first one concerns the instruments to be used for paleographic research and teaching, and the second one concerns the carrying out of a special e- learning platform. Both of them are based on the use of the Linux operating system, the Web server Apache, the PHP language, and the PostgreSQL RDBMS.
Open Source, DBMSs, and the Community of Paleographers
The two information systems described here are the result of the author’s cooperation with M. Palma, a professor of Latin paleography at the University of Cassino (Italy). The first system is devoted to the management of the data concerning women who wrote manuscripts in the Middle Ages (women copyists); the second one manages the bibliography of the manuscripts written in Beneventan, i.e., an ancient medieval script used in South Italy.
The main aim of the dynamic Web site (http://edu.let.unicas.it/womediev/) named Women and Written Culture in the Middle Ages (Cartelli, Miglio, & Palma, 2001) was to systematize the data emerging from the research on women copyists while providing an instrument to help scholars and students find new elements for further studies.
The data appearing relevant to the scientific community were—for women: the name, the qualification (i.e., if she was a nun or a lay), and the date or the period she belonged to; and for manuscripts: the shelfmark (i.e., town, library, and number of the manuscript), the place where it was written, the date or the period it belonged to, the authors and titles of the texts, and the bibliography (or the source of information about the manuscript). Furthermore it appeared important to show for each woman the manuscript/s she wrote and vice versa and, if possible and available, at least an image of the copyist’s hand.
One of the main features of the system is to have two separated sections: the first one being operated only by the editors (by means of special FORMS) so that they can insert, modify, and delete the stored data and ensure the scientific validity of the information reported; and the second one being at everyone’s disposal to obtain the list of all women and manuscripts in the database or to make queries concerning women and manuscripts with specific qualifications.
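The schema this description implies — women and manuscripts in a many-to-many relation, with the public section able to query by qualification — can be sketched as follows. Python's built-in SQLite module stands in for the PostgreSQL back end, and all table names, column names, and sample rows are assumptions made for illustration, not taken from the real system.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE women (
    id INTEGER PRIMARY KEY,
    name TEXT,
    qualification TEXT,   -- e.g. 'nun' or 'lay'
    period TEXT
);
CREATE TABLE manuscripts (
    id INTEGER PRIMARY KEY,
    shelfmark TEXT,       -- town, library, and number
    place_written TEXT,
    period TEXT
);
-- Link table: a woman may write several manuscripts and vice versa.
CREATE TABLE wrote (
    woman_id INTEGER REFERENCES women(id),
    manuscript_id INTEGER REFERENCES manuscripts(id),
    PRIMARY KEY (woman_id, manuscript_id)
);
""")

# Illustrative sample data (invented for this sketch).
cur.execute("INSERT INTO women (name, qualification, period) "
            "VALUES ('Guda', 'nun', 'XII century')")
cur.execute("INSERT INTO manuscripts (shelfmark, place_written, period) "
            "VALUES ('Frankfurt, StUB, Barth. 42', 'Germany', 'XII century')")
cur.execute("INSERT INTO wrote VALUES (1, 1)")

# The public query section: manuscripts written by women of a given qualification.
cur.execute("""
    SELECT m.shelfmark FROM manuscripts m
    JOIN wrote w ON w.manuscript_id = m.id
    JOIN women c ON c.id = w.woman_id
    WHERE c.qualification = 'nun'
""")
result = cur.fetchall()
print(result)  # [('Frankfurt, StUB, Barth. 42',)]
```

The link table is what lets the site show, for each woman, the manuscript/s she wrote and vice versa, as the text requires.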
The other information system the author built, called BMB online (Bibliografia dei Manoscritti Beneventani online; http://edu.let.unicas.it/bmb/), emerged from the analysis of the following elements: (1) the data to be stored in the database, (2) the users who had to access the database and the operations they were allowed, (3) the query system, and (4) the dataflow. In what follows, the above elements are analyzed in greater detail (Cartelli & Palma, 2004).
a. The database structure rests on six tables: the first one is used for the data of contributors and scientific administrator/s; the second one contains the data of the materials to be analyzed and identifies the contributor who has to write the corresponding bibliographic cards; the third one is used to store the data of Beneventan manuscripts; the fourth table hosts the first part of the bibliographic data (i.e., the location, the author/s, the title, and all data concerning a publication quoting one or more manuscripts); in the fifth table the second part of the bibliographic data is stored (i.e., the manuscripts’ ID codes and the abstracts with the reason for the quotation); and the sixth table is an electronic blackboard that eases communication among the people involved in the collection of the bibliographic materials.
b. The users accessing the database have different rights and powers: (a) those with the fewest rights can only query the system; (b) next are the contributors, who are allowed bibliographic operations; (c) then the scientific administrator/s, who can manage all the data in the database and write, modify, and certify the bibliographic cards (although this last operation can be done only once); and (d) at the top of the access pyramid is the system administrator, who can perform all the operations allowed to the scientific administrator/s and can access the verified cards to modify or delete them.
c. Once the bibliographic cards are compiled by the contributors and verified by the scientific administrator/s, they can be queried by generic users, who can access them in four different ways: (a) by author, (b) by manuscript, (c) by contributor, and (d) by one or more words, or parts of them, concerning the location, the author, the series, etc. of a given publication.
d. When the system starts for the first time, the database is empty and the system administrator has to input the data for at least one scientific administrator. The scientific administrator/s can then input the data for one or more contributors, letting them access the system; he/she can also input the bibliographic material to be chosen/assigned to the contributors and can input bibliographic cards him/herself. When the contributor/s access the materials to work on, they can compile the bibliographic cards. Finally, the cards are analyzed and revised by the administrator/s so that they can be read by general users.
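The four access levels described in point (b) amount to nested permission sets, which can be sketched as a simple role-to-actions mapping. Role and action names here are invented for illustration and are not taken from BMB online.

```python
# Hypothetical permission model for the four BMB access levels:
# each level adds capabilities on top of the one below it.
PERMISSIONS = {
    "generic_user":     {"query"},
    "contributor":      {"query", "compile_card"},
    "scientific_admin": {"query", "compile_card", "certify_card", "manage_data"},
    "system_admin":     {"query", "compile_card", "certify_card", "manage_data",
                         "edit_verified_card", "delete_verified_card"},
}

def allowed(role: str, action: str) -> bool:
    """Return True if a user with the given role may perform the action."""
    return action in PERMISSIONS.get(role, set())

print(allowed("contributor", "compile_card"))           # True
print(allowed("contributor", "certify_card"))           # False
print(allowed("system_admin", "delete_verified_card"))  # True
```

Checking each requested operation against such a table is one straightforward way to enforce the "access pyramid" the text describes, with only the system administrator able to touch verified cards.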
It has to be noted that the above information systems were used both for paleographic research and teaching, and there is common agreement among scholars and professors on the effects these instruments had on studying manuscripts and on teaching. The analysis of the students attending the paleography course agrees, in fact, with the results of the studies on ICT supporting communities of learners: students involved in the above experiences not only developed computing skills greater than the ones they could obtain in traditional computing literacy courses but also were immersed in a meta-cognitive environment, were exposed to cognitive apprenticeship strategies, and were involved in the discussion and evaluation of the procedures they took part in (in other words, they experienced all the elements of meaningful learning; Varisco, 2002).
Misconceptions, Mental Schemes, and E-Learning
The project reported here started from the analysis of the problems the students meet while approaching scientific topics and is still in progress; the conclusions reported below are then only partial, as regards the students’
learning problems and the hypothesized solutions, but they support the author’s idea of using information systems in research and education and, consequently, the adoption of open source software for their development.
It is well known that students often manifest wrong ideas which can be interpreted in at least two ways: (1) mental schemes, if only the coherence of the students’ ideas in the interpretation of phenomena is considered (with no reference to scientific paradigms), and (2) preconceptions or misconceptions (when the students’ ideas are compared and evaluated with respect to the right scientific paradigms; Driver & Erickson, 1983).
Studies carried out all over the world with differently aged people (from students to workers, professionals, and teachers) showed that (Cartelli, 2002):
1. A map of the disciplinary fields with the students’ wrong ideas can be drawn.
2. A lot of strategies and instruments have been proposed until now to help students overcome their problems, and a good percentage of success has been measured when they were adopted (nevertheless, there is no systematic study on them).
3. Wrong ideas can persist in students’ minds even after the adoption of the above instruments and strategies.
The author’s experience in computer science (CS) basic courses led him to hypothesize that a special e-learning platform continuously monitoring the didactic process could make the learning of the various topics easier for the students, while giving professors a powerful instrument for managing their teaching.
The information system the author planned and carried out (Cartelli, 2003) was very similar in its features to an e-learning platform. Together with a well-structured knowledge tree of the topics to be taught/learnt and special auto-evaluation tests integrated within the course pages, it offered the following functions: (1) various communication areas implementing virtual environments for teachers/professors, tutors, and students; (2) careful management of the students’ evaluation and assessment tests; and (3) two functions for the analysis of (a) the students’ access to course materials and (b) the use they made of communication services.
The management of all information in the site was guaranteed by five user levels or selected accesses: the system administrator, professors, tutors, students, and, lastly, didactic researchers and scholars (who could only retrieve the information on the students’ access to the course materials).
The two information retrieval functions used for students’ monitoring had the following features:
1. The first one reported the number of accesses to the site’s pages that a single student or group of students had made up to the query date (the numerical data were reported in the tree structure of the site).
2. The second one gave the sequence of the student’s accesses to the Web site, ordered by date and hour of access. It could also report the messages the student left in the electronic blackboard, chat, forum, and case study areas, and let the teachers compare all the data stored in the same time interval.
The system was tried out with two different sets of students and had positive effects on student performance: only 20% of the students dropped out before the final examinations, and more than 65% of them obtained positive if not excellent scores. A careful analysis of the data stored in the database, however, showed the limit of the system: the amount of data generated by the second set of students (350 subjects) made continuous monitoring of the didactic process impossible for the professor.
On the other hand, the analysis of the students’ answers in the assessment tests (at the end of the courses) still showed the presence of misconceptions and wrong ideas in a relevant number of persons.
FUTURE TRENDS
It is undoubted that the tools the author adopted for carrying out the information systems, and the information systems themselves, will be improved in the future, so that it is not easy to foresee the trends of the research or the further systems that will be needed.
Nonetheless, a survey of two research studies still in progress is given in what follows.
As regards paleography, the main field of interest is the influence of the information systems developed for study and teaching on communities of practice (CoPs), communities of learners (CoLs), and virtual communities. An instrument still evolving will be analyzed in depth: the open catalogue of manuscripts (i.e., an information system supporting ancient libraries in making their manuscripts and catalogues available on the Net; until now it has been adopted only by the Malatestiana Library in Cesena (FO), Italy).
As regards information systems helping students overcome their difficulties, the results of the author's experiences induced a plan to implement descriptive and inferential statistical functions in the system (i.e., data mining functions; Cartelli, 2004b). The reasons for this choice lie in the following elements to be continuously monitored:
Open Source Software and Information Systems on the Web
1. the change over time of the features of a single student (by means of special indices describing his/her behaviors and learning styles);
2. the change over time of the features of the students' groups, i.e., the features of the classes they belong to;
3. the change over space of the features of students' groups (how different environments can influence the evolution of students' learning models).
In other words, in the author's opinion, data mining strategies (and the database management systems made available by open source software) can be applied to the analysis of the teaching-learning process to improve teaching management and students' results.
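As a purely illustrative sketch of the kind of index meant here (the function, data, and threshold below are assumptions, not taken from the author's system), one such descriptive index could be a student's weekly access rate to the course pages, with a large shift between two periods flagged as a change in behavior worth inspecting:

```python
from statistics import mean

# Hypothetical behavioral index: weekly access counts to course pages.
# A marked drop between an early and a late period of the course could
# signal a student at risk; the 0.5 ratio is an illustrative threshold.
def behavior_changed(early_weeks, late_weeks, ratio=0.5):
    """Flag a change when the late-period mean access rate falls below
    `ratio` times the early-period mean."""
    return mean(late_weeks) < ratio * mean(early_weeks)

print(behavior_changed([10, 12, 11], [4, 3, 5]))    # activity dropped -> True
print(behavior_changed([10, 12, 11], [9, 11, 10]))  # activity stable  -> False
```

The same comparison could be run per class (point 2) or across different course environments (point 3) by aggregating the per-student indices.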
CONCLUSION
The above experiences were possible because of the availability of the open source tools Linux, PHP, Apache, and PostgreSQL, but they were also due to the help the author received from the developers' community on more than one occasion (i.e., in overcoming some difficulties he met while implementing the projects). As a consequence, the success of open source software has to be attributed not only to the reliability and stability of that software but also to the circulation and sharing of ideas and expertise the developers' community affords.
Furthermore, in the case of BMB online, it has to be noted that the online information system comes after an experience the faculty started in 1992 with BIBMAN, an MS-DOS program which reached its physical limits in 2001. The choice of open source software was mostly due to the problems the staff met in recovering the data stored until that date, because the proprietary structure of the program didn't allow autonomous processing of those data.
The experiences described here are only a part of the work the author completed during the last decade with open source software, and they are producing great changes in the ways of carrying out both research and teaching. In the author's opinion, the systems for data management developed until now will produce their best effects if they are made freely available once they reach a stable and definite development stage.
REFERENCES
Bach, M. J. (1986). The design of the UNIX operating system. Englewood Cliffs, NJ: Prentice Hall.
Beck, M., Böhm, H., Dziadzka, M., Kunitz, U., Magnus, R., & Verworner, D. (1996). LINUX kernel internals. Harlow, UK: Addison-Wesley.
Bracchi, G., Martella G., & Pelagatti, G. (1987). Sistemi per la gestione di base dei dati. Turin, Italy: Petrini.
Cartelli, A. (2002). Web technologies and sciences epistemologies. In E. Cohen & E. Boyds (Eds.), Proceedings of IS + IT Education 2002 International Conference, Santa Rosa, CA (pp. 225-238). Retrieved August 16, 2004, from http://ecommerce.lebow.drexel.edu/eli/2002Proceedings/papers/Carte203Webte.pdf
Cartelli, A. (2003). Misinforming, misunderstanding, misconceptions: What informing science can do. In E. Cohen & E. Boyds (Eds.), Proceedings of IS + IT Education 2003 International Conference (pp. 1259-1273). Santa Rosa, CA: Informing Science Institute.
Cartelli, A. (2004a). Open source software and information management: The case of BMB on line. In M. Khosrow-Pour (Ed.), Proceedings of IRMA 2004 International Conference: Innovations Through Information Technology (pp. 1023-1024). Hershey, PA: Idea Group.
Cartelli, A. (2004b). Action-guidance: An action research project for the application of informing science in educational and vocational guidance. Issues in Informing Science and Information Technology, 1(1), 763-772.
Cartelli, A., Miglio, L., & Palma, M. (2001). New technologies and new paradigms in historical research. Informing Science, 4(2), 61-66.
Cartelli, A., & Palma, M. (2004). BMB on line: An information system for paleographic and didactic research. In M. Khosrow-Pour (Ed.), Proceedings of IRMA 2004 International Conference: Innovations Through Information Technology (pp. 45-47). Hershey, PA: Idea Group.
Corbato, F. J., Saltzer, J. H., & Clingen, C. T. (1972). MULTICS—The first seven years. Proceedings of AFIPS Spring Joint Computer Conference, 36 (pp. 571-583).
Driver, R., & Erickson, G. (1983). Theories in action: Some theoretical and empirical issues in the study of students’ conceptual frameworks in science. Studies in Science Education, 10, 37.
Tanenbaum, A. S. (1987). Operating systems, design and implementation. Englewood Cliffs, NJ: Prentice Hall.
Tanenbaum, A. S., Van Staveren, H., Keizer, E. G., & Stevenson, J. W. (1983). A practical tool kit for making portable compilers. Communications of the ACM, 26(9), 654-660.
Varisco, B. M. (2002). Costruttivismo socio-culturale. Genesi filosofiche, sviluppi psico-pedagogici, applicazioni didattiche. Rome: Carocci.
KEY TERMS
Apache Software Foundation: Provides support for the Apache community of open-source software projects. The Apache projects are characterized by a collaborative, consensus-based development process, an open and pragmatic software license, and a desire to create high-quality software that leads the way in its field. The most famous projects among them are HTTP Server (the well-known Web server), XML (instruments for the development of Web pages based on XML—Extensible Markup Language), and Jakarta (Java server-side technologies).
Data Mining: Analysis of data in a database using tools which look for trends or anomalies without knowledge of the meaning of the data. Data mining was introduced by IBM, which holds some related patents. The application of these strategies can require a data warehouse (i.e., a system for storing, retrieving, and managing a large amount of data).
E-Learning Platform: Although there are today many types of e-learning platforms (free or not, open or not), very similar in their features to information systems, they can accomplish (all together or one by one) the following tasks: (1) to be a CMS (content management system), guaranteeing students access to didactic materials; (2) to be an LMS (learning management system), where the use of learning objects makes the learning of a given topic easier; (3) to be a CSCLS (computer-supported collaborative learning system), which facilitates collaborative and situated teaching/learning strategies; and (4) to build a virtual community of students, tutors, and professors using KM (knowledge management) strategies.
Information System: The set of all human and mechanical resources needed for the acquisition, storing, retrieving, and management of the vital data of a system. Human resources usually mean both the individuals involved in the use of the system and the procedures they have to carry out; mechanical resources mean both the hardware and the software instruments to be used for the management of data.
Open Source Software: Software one can have at his/her disposal together with its source code. Its main feature is to be subject to licenses obliging those who want to distribute that software, parts of it, or changes to its structure to do so together with the source code. A special consortium, called the Open Group (Open Software Foundation), has been founded with the following mission: to drive the creation of boundaryless information flow.
PHP Project: The project started as a collection of scripts to be executed as CGI scripts by a Web server and soon became a real language. PHP is now a widely used general-purpose scripting language (very close to C in its syntax) that is especially suited for Web development. The most valued features of this software are: (1) modularity of the interpreter, which can be built as a module for Web servers, especially Apache; (2) the embedding features of the language, which can coexist with HTML code in Web pages; (3) creation of HTML pages on the fly (they can be produced by the server depending on conditions emerging from clients' answers or choices); and (4) interaction with widely used RDBMSs, including MySQL and PostgreSQL.
PostgreSQL Project: The project began in 1986 at the University of California, Berkeley, as a research prototype and in the 16 years since then has moved to its now globally distributed development model, the PostgreSQL RDBMS (formerly known as Postgres, then as Postgres95), with central servers based in Canada. The PostgreSQL Global Development Group is a community of companies and people cooperating to drive the development of PostgreSQL.
Optimization of Continual Queries
Sharifullah Khan
National University of Sciences and Technology, Pakistan
INTRODUCTION
Recent advances in technology have made it possible to access information from Internet-scale distributed data sources (Florescu, Levy & Mendelzon, 1998). However, finding the right information at the right time is difficult. Update monitoring (Seligman et al., 2000) is a technology that gathers relevant information and forwards it to users in a timely way. Continual queries (CQs) (Chen et al., 2000; Khan & Mott, 2002a; Liu, Pu & Tang, 1999, 2000) provide a significant toolkit for update monitoring. They are persistent queries that are issued once and then run at regular intervals, or when data change, until a termination condition is satisfied; they are then removed from the system. They relieve users from having to revisit Web sites or other data sources and reissue their queries frequently to obtain new information that matches their queries. A CQ is a typical SQL query with additional triggering and termination conditions. CQs are of two types: change-based and time-based. An example of a CQ is "notify me in the next six months whenever the Microsoft stock price drops by more than 5% from today's level."
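The structure of a CQ — an ordinary query plus a change-based trigger and a time-based termination condition — can be sketched as follows (a toy Python model; the class, its fields, and the data are illustrative assumptions, not the interface of any actual CQ system):

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable

@dataclass
class ContinualQuery:
    """A CQ = an ordinary query + a trigger + a termination condition."""
    query: str                       # the SQL text to (re)evaluate
    trigger: Callable[[dict], bool]  # change-based triggering condition
    expires: datetime                # time-based termination condition

    def should_fire(self, update: dict, now: datetime) -> bool:
        # The CQ runs only while not yet terminated and when its trigger holds.
        return now < self.expires and self.trigger(update)

# The stock example: for the next six months, notify whenever the price
# drops by more than 5% from today's (baseline) level.
baseline = 100.0
cq = ContinualQuery(
    query="SELECT price FROM stocks WHERE symbol = 'MSFT'",
    trigger=lambda u: u["price"] < baseline * 0.95,
    expires=datetime(2006, 1, 1) + timedelta(days=182),
)

print(cq.should_fire({"price": 94.0}, datetime(2006, 3, 1)))  # drop > 5% -> True
print(cq.should_fire({"price": 96.0}, datetime(2006, 3, 1)))  # drop < 5% -> False
```

Once `now` passes `expires`, `should_fire` is always false and the query can be removed from the system, matching the CQ life cycle described above.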
CQs OPTIMIZATION
CQs are particularly useful for an environment like the Internet, which comprises a large amount of frequently changing information (Chen et al., 2000; Khan & Mott, 2002a; Liu, Pu & Tang, 1999). A CQs system needs to be able to support a large number of queries from a large number of users. This poses many difficulties because of the Internet's widespread distribution and massive use by a large population of users (Chen et al., 2000; Khan & Mott, 2002b). For example, the queries will impact the local operations of the data source and could overload it with duplicate computation of common tasks occurring in multiple CQs. Since each data source's communication and data-processing capacity must be divided among all its users, data servers may become swamped as the number of users grows. In addition, these queries could also overload networks with data traffic, much of which may be superfluous. One approach to controlling this problem is to group queries so that they share their computation, on the assumption that many queries have a similar structure. For example, consider relations r1(a,b,c) and r2(x,y,z). The following queries have similar structure:
Q2 = σ(a>10 ∧ b<5 ∧ c=x)(r1 ⋈ r2)
Q3 = σ(a>15 ∧ b<5 ∧ c=x)(r1 ⋈ r2)
Q4 = σ(a<10 ∧ b>5 ∧ c=x)(r1 ⋈ r2)
Grouping queries optimizes evaluation by executing the operations common to a group of queries just once (Chen et al., 2000; Khan & Mott, 2002b; Roy et al., 2000). Moreover, it also avoids unnecessary query invocations over autonomous data sources on the Internet and reduces data traffic over the networks.
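The saving can be illustrated with a small sketch (an illustration of the idea with made-up tuples, not any particular CQs system): the join r1 ⋈ r2 on c = x, common to Q2-Q4, is computed once, and only the cheap per-query selections differ.

```python
# Grouped evaluation: Q2-Q4 share the join of r1 and r2 on c = x,
# so the join is computed once and each query applies only its own selection.
r1 = [(12, 3, 7), (20, 4, 9), (5, 8, 7)]   # tuples (a, b, c)
r2 = [(7, "u", "v"), (9, "w", "z")]        # tuples (x, y, z)

# Common sub-operation, executed once for the whole group:
joined = [t1 + t2 for t1 in r1 for t2 in r2 if t1[2] == t2[0]]

# Per-query residual selections over the shared intermediate result:
q2 = [t for t in joined if t[0] > 10 and t[1] < 5]
q3 = [t for t in joined if t[0] > 15 and t[1] < 5]
q4 = [t for t in joined if t[0] < 10 and t[1] > 5]

print(len(joined), len(q2), len(q3), len(q4))
```

Without grouping, the expensive join would be recomputed once per query; with grouping it is computed once per group, which is the saving the text describes.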
Additionally, after the initial evaluation we want a CQ to return only changes in the data source it queries. The simple and straightforward way to do this is the complete evaluation approach, which uploads the complete result of the CQ rather than just the updated results and thereby increases data transmission over the networks. An important approach to the optimization of CQs is therefore to employ differential evaluation: after the initial evaluation, a CQ is evaluated only on the changes that have been made to the base data since its previous evaluation. This reduces the data transmission in subsequent evaluations of the query (Liu et al., 1996). Clearly, this technique is best suited to conditions where the number of changes is small relative to the much larger quantity of source data (Liu et al., 1996).
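For a simple selection CQ, differential evaluation can be sketched as follows (a toy illustration under the assumption that only insertions occur; real systems also maintain results incrementally under deletions and updates):

```python
# Differential evaluation of a selection CQ: after the initial run, only
# the delta (rows added since the last evaluation) is processed and
# shipped, not the complete result.
def evaluate(rows, predicate):
    return [r for r in rows if predicate(r)]

source = [{"id": 1, "price": 100}, {"id": 2, "price": 50}]
pred = lambda r: r["price"] < 80

initial = evaluate(source, pred)             # full evaluation, done once

# On the next triggering, only the changes since the previous
# evaluation are examined:
delta_inserts = [{"id": 3, "price": 60}]
delta_result = evaluate(delta_inserts, pred)  # evaluated on the delta only

result = initial + delta_result
print([r["id"] for r in result])
```

The subsequent evaluation touches one tuple instead of re-scanning the whole source, which is why the approach pays off when changes are small relative to the source data.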
Existing Systems
Query grouping on the basis of common operations is not new (Roy et al., 2000). However, the grouping of CQs raises new issues: (1) a CQs system has to handle a large collection of CQs because of the scale of the Internet; (2) CQs are not all made available to the system at the same time; (3) a user's requests are unpredictable and may change rapidly; and (4) CQs in a group can have different triggering times. CQ groups are therefore dynamic; they are continually changing as old queries are deleted and new ones are added. The frequent insertion and deletion of CQs in groups can make those groups inefficient over time and hence reduce overall system performance (Chen et al., 2000; Khan & Mott, 2002b). In this case, one or more
Copyright © 2006, Idea Group Inc., distributing in print or electronic forms without written permission of IGI is prohibited.