
data have been generated. Scientists often have to use exploratory methods instead of confirming already suspected hypotheses. Data mining is a research area that aims to provide analysts with novel and efficient computational tools to overcome the obstacles and constraints posed by traditional statistical methods. Feature selection, normalization, and standardization of the data, visualization of the results, and evaluation of the produced knowledge are equally important steps in the knowledge discovery process. The mission of bioinformatics as a new and critical research domain is to provide the tools and use them to extract accurate and reliable information in order to gain new biological insights.


KEY TERMS

Data Cleaning (Cleansing): The act of detecting and removing errors and inconsistencies in data to improve its quality.

Genotype: The exact genetic makeup of an organism.

Machine Learning: An area of artificial intelligence, the goal of which is to build computer systems that can adapt and learn from their experience.

Mapping: The process of locating genes on a chromosome.

Phenotype: The observable physical characteristics of an organism.

Sequence Alignment: The process of testing for similarities between the sequence of an unknown target protein and a single known protein or a family of known proteins.

Sequencing: The process of determining the order of nucleotides in a DNA or RNA molecule or the order of amino acids in a protein.

Visualization: Graphical display of data and models facilitating the understanding and interpretation of the information contained in them.


Biometric Databases

Mayank Vatsa

Indian Institute of Technology, India

Richa Singh

Indian Institute of Technology, India

P. Gupta

Indian Institute of Technology, India

A. K. Kaushik

Ministry of Communication and Information Technology, India

INTRODUCTION

Biometric databases are collections of biometric samples gathered from a large population and made available for the evaluation of biometric systems. To evaluate an algorithm for a particular biometric trait, the database should contain a large collection of samples of that trait. The creation and maintenance of biometric databases involve several steps, such as enrollment and training.

Enrollment is storing the identity of an individual in the biometric system together with his or her biometric templates. Every system has an enrollment policy covering, for example, the procedure for capturing the biometric samples and the required quality of the templates. These policies should be designed so that they are acceptable to the public. For example, the procedure for capturing iris images involves the use of infrared light, which might even cause damage to human eyes. Similarly, an individual's signature is considered private, and a person might object to giving a signature for enrollment in such a system because of the unpredictable use of the stored templates (where and how this information may be used).

There are two types of enrollment: positive and negative. Positive enrollment is for people with a verified, legitimate identity (e.g., civilians), while negative enrollment is for unlawful or prohibited people.

For enrollment, the biometric sample should be captured in such a way that the algorithms can perform well on it; the features required by an algorithm to verify or match the template must be present. Poor biometric sample quality increases the failure-to-enroll rate. Likewise, if the biometric sample is not captured properly at verification time, the false acceptance and false rejection rates increase. Higher failure-to-enroll, false acceptance, and false rejection rates mean more manual intervention for every task, which in turn increases the operational cost and the security risk.

Training is tuning the system (noise reduction, preprocessing, and extraction of the region of interest) so as to increase verification accuracy, that is, to decrease the false acceptance and false rejection rates.

MAIN THRUST

There are various biometric databases available in the public domain for researchers to evaluate their algorithms. Given below are the various databases for different biometric traits along with their specifications and Web sites.

Face Database

A face database is a necessary requirement for a face authentication system. The face database should cover differing scaling factors, rotation angles, lighting conditions, background environments, and facial expressions.

Most of the algorithms have been developed using photo databases and still images that were created under a constant, predefined environment. Since a controlled environment removes the need for normalization, reliable techniques have been developed to recognize faces with reasonable accuracy, with the constraint that the database contains perfectly aligned and normalized face images. Thus the challenge for face recognition systems lies not in the recognition of the faces, but in normalizing all input face images to a standardized format that is compliant with the strict requirements of the face recognition module (Vatsa, Singh, & Gupta, 2003; Zhao, Chellappa, Rosenfeld, & Phillips, 2000). Although there are models developed to describe faces, such as texture mapping with the Candide model, there are no clear definitions or mathematical models to specify what the important and necessary components are in describing a face.


Using still images taken under constrained conditions as input, it is reasonable to omit the face normalization stage. However, when locating and segmenting a face in complex scenes under unconstrained environments, such as in a video scene, it is necessary to define and design a standardized face database.

Several face databases are available that can be used for face detection and recognition (see Table 1). The FERET database contains monochrome images taken in different frontal views and in left and right profiles. Only the upper torso of an individual (mostly head and neck) appears in an image, on a uniform and uncluttered background. Turk and Pentland created a face database of 16 people. The face database from AT&T Cambridge Laboratories, formerly known as the Olivetti database and also known as the ORL-AT&T database, consists of 10 different images for each of 40 persons. The images were taken at different times, varying the lighting, facial expressions, and facial details. The Harvard database consists of cropped, masked frontal face images taken under a wide variety of light sources. There are images of 16 individuals in the Yale face database, which contains 10 frontal images per person, each with different facial expressions, with and without glasses, and under different lighting conditions. The XM2VTS multimodal database contains sequences of face images of 37 people. The five sequences for each person were taken over one week. Each image sequence contains images from right profile (-90 degrees) to left profile (90 degrees) while the subjects count from 0 to 9 in their native languages. The UMIST database consists of 564 images of 20 people with varying poses. The images of each subject cover a range of poses from right profile to frontal views. The Purdue AR database contains over 3,276 color images of 126 people (70 males and 56 females) in frontal view. This database is designed for face recognition experiments under several mixing factors such as facial expressions, illumination conditions, and occlusions. All the faces appear with different facial expressions (neutral, smile, anger, and scream), illumination (left light source, right light source, and sources from both sides), and occlusion (wearing sunglasses or a scarf). The images were taken during two sessions separated by two weeks. All the images were taken by the same camera setup under tightly controlled conditions of illumination and pose. This face database has been applied to image and video indexing as well as retrieval. Vatsa et al. (2003) and Zhao et al. (2000) can be referred to for the details of the preparation of face databases; these works describe how to prepare a face image database and the challenges in database preparation.

Table 1. Brief description of popular face databases

AR Face Database: 4,000 color images of 126 people, with different facial expressions, illumination, and occlusions. (http://rvl1.ecn.purdue.edu/~aleix/aleix_face_DB.html)

AT&T Face Database: Ten 92-by-112-pixel gray scale images of 40 subjects, with a variety of lighting and facial expressions. (http://www.uk.research.att.com/facedatabase.html)

CVL Face Database: 114 persons, 7 images for each person, under uniform illumination, no flash, and with a projection screen in the background. (http://lrv.fri.uni-lj.si/facedb.html)

CMU/VASC: 2,000 image sequences from over 200 subjects. (http://vasc.ri.cmu.edu/idb/html/face/facial_expression/)

FERET Database: Largest collection of faces, over 14,000 images, with different facial expressions, illumination, and positions. (http://www.itl.nist.gov/iad/humanid/feret/)

Harvard Database: Cropped, masked frontal face images under a wide range of illumination conditions. (http://cvc.yale.edu/people/faculty/belhumeur.html)

IITK Face Database: 1,000 color images of 50 individuals in a lab environment, having different poses, occlusions, and head orientations. (http://www.cse.iitk.ac.in/users/mayankv)

Extended M2VTS Database (XM2VTSDB): Four recordings of 295 subjects taken over a period of four months. Each recording contains a speaking head shot and a rotating head shot. (http://www.ee.surrey.ac.uk/Research/VSSP/xm2vtsdb)

M2VTS Multimodal Face Database: 37 different faces and 5 shots for each person, taken at one-week intervals or when drastic face changes occurred; from 0 to 9 speaking head rotation, and head rotation from 0 to 90 degrees and back to 0 degrees, repeated once without glasses if the subject wore them. (http://www.tele.ucl.ac.be/PROJECTS/M2VTS/)

Max Planck Institute for Biological Cybernetics: 7 views of 200 laser-scanned heads without hair. (http://faces.kyb.tuebingen.mpg.de/)

MIT Database: Faces of 16 people, 27 images of each person, under various conditions of illumination, scale, and head orientation. (ftp://whitechapel.media.mit.edu/pub/images/)

NIST 18 Mug Shot Identification Database: Consists of 3,248 mug shot images; 131 cases with both profile and frontal views, 1,418 with only a frontal view; scanning resolution is 500 dpi. (http://www.nist.gov/szrd/nistd18.htm)

UMIST Database: Consists of 564 images of 20 people, covering a range of different poses from profile to frontal views. (http://images.ee.umist.ac.uk/danny/database.html)

University of Oulu Physics-Based Face Database: 125 different faces, each in 16 different camera calibrations and illuminations, an additional 16 if the person has glasses; faces in frontal position captured under horizon, incandescent, fluorescent, and daylight illuminance. (http://www.ee.oulu.fi/research/imag/color/pbfd.html)

University of Bern Database: 300 frontal-view face images of 30 people and 150 profile face images. (ftp://iamftp.unibe.ch/pub/Images/FaceImages/)

Usenix: 5,592 faces from 2,547 sites. (ftp://ftp.uu.net/published/usenix/faces/)

Weizmann Institute of Science: 28 faces, including variations due to changes in viewpoint, illumination, and three different expressions. (ftp.eris.weizmann.ac.il, ftp.wisdom.weizmann.ac.il)

Yale Face Database: 165 images (15 individuals), with different lighting, expression, and occlusion configurations. (http://cvc.yale.edu)

Yale Face Database B: 5,760 single light source images of 10 subjects, each seen under 576 viewing conditions (9 poses x 64 illumination conditions). (http://cvc.yale.edu/)


Fingerprint Database

Fingerprints are the ridge and furrow patterns on the tip of the finger and have been used extensively for the personal identification of people. Fingerprint-based biometric systems offer positive identification with a very high degree of confidence, and fingerprint-based authentication is becoming popular in a number of civilian and commercial applications. Fingerprints do, however, have a number of disadvantages compared to other biometrics. Some people do not have clear fingerprints; for example, people working in mines and on construction sites get regular scratches on their fingers. Finger skin peels off due to weather, sometimes develops natural permanent creases, and even forms temporary creases when the hands are immersed in water for a long time. Such fingerprints cannot be properly imaged with existing fingerprint sensors. Further, since fingerprints cannot be captured without the user's knowledge, they are not suited for certain applications such as surveillance.

For the database, the user has the option of giving the fingerprint of any finger at the time of enrollment; of course, the image of the same finger must be given for correct matching at the time of verification. The advantage of this generalization is that if someone has a damaged thumb, another finger can be used for authentication; the performance and efficiency of any algorithm are independent of the choice of finger. For the database, right-hand thumb images have been taken, but in case of any problem some other finger can also be considered. The fingerprint databases available in the public domain are listed in Table 2. Some of the important issues in the preparation of fingerprint image databases are discussed in the literature (Maltoni, Maio, Jain, & Prabhakar, 2003; Vatsa et al., 2003). The important issues are as follows.

1. Resolution of image: The resolution should be good enough to detect the basic features, i.e., loops, whorls, and arches in the fingerprint.

2. The part of thumb in fingerprint image: The fingerprint image must contain the basic features like whorls, arches, etc.

3. The orientation of fingerprint: The orientation is taken with the help of ridge flow direction. Because of the nature of fingerprint features, orientation also matters and thus is considered as a feature for recognition.

Iris Database

Iris recognition has been an active research topic in recent years due to its high accuracy. The only iris database available in the public domain is the CASIA Iris Database (see Table 3).

 

 

 

 


Table 2.

FVC 2000: 4 databases; each database consists of fingerprints from 110 individuals with 8 images each. Images were taken at different resolutions by different scanners at 500 dpi. One database (DB4) contains synthetically generated fingerprints also. (http://bias.csr.unibo.it/fvc2000/databases.asp)

FVC 2002: 4 databases, each consisting of 110 individuals with 8 images per individual at 500 dpi. One database (DB4) contains synthetically generated fingerprints also. (http://bias.csr.unibo.it/fvc2002/databases.asp)

NIST 4: Consists of 2,000 fingerprint pairs (400 image pairs from 5 classes) at 500 dpi. (http://www.nist.gov/srd/nistsd4.htm)

NIST 9: Consists of 5 volumes, each with 270 mated card pairs of fingerprints (a total of 5,400 images). Each image is 832 by 768 pixels at 500 dpi. (http://www.nist.gov/srd/nistsd9.htm)

NIST 10: Consists of fingerprint patterns that have a low natural frequency of occurrence and transitional fingerprint classes. There are 5,520 images of size 832 by 768 pixels at 500 dpi. (http://www.nist.gov/srd/nistsd10.htm)

NIST 14: 27,000 fingerprint pairs with 832 by 768 pixels at 500 dpi. Images show a natural distribution of fingerprint classes. (http://www.nist.gov/srd/nistsd14.htm)

NIST 24: There are two components. The first component consists of MPEG-2 compressed video of fingerprint images acquired with intentionally distorted impressions; each frame is 720 by 480 pixels. The second component deals with rotation; it consists of sequences of all 10 fingers of 10 subjects. (http://www.nist.gov/srd/nistsd24.htm)

NIST 27: Consists of 258 latent fingerprints from crime scenes and their matching rolled and latent fingerprint mates, with the minutiae marked on both the latent and rolled fingerprints. (http://www.nist.gov/srd/nistsd27.htm)

NIST 29: Consists of 4,320 rolled fingers from 216 individuals (all 10 fingers) scanned at 500 dpi and compressed using WSQ. Two impressions of each finger are available in addition to the plain finger images. (http://www.nist.gov/srd/nistsd29.htm)

NIST 30: Consists of 36 ten-print paired cards with both the rolled and plain images scanned at 500 dpi and 1,500 dpi. (http://www.nist.gov/srd/nistsd30.htm)

Table 3.

CASIA Database: It includes 756 iris images from 108 classes. For each eye, 7 images are captured in two sessions: 3 samples are collected in the first session and 4 in the second session. (http://www.sinobiometrics.com/resources.htm)

To prepare a database of iris images, a setup consisting of a camera and a lighting system is required. The details of the setup can be found in Daugman (1998) and Wildes (1999).

Speaker Recognition Databases

Speaker recognition identifies a speaker from the peculiar characteristics that he or she possesses and reveals in speech. The issues involved in designing a database for speaker recognition are the type of speech (text dependent or text independent), the duration of the acquired speech, and the acoustic environment (Campbell & Reynolds, 1999). The speaker recognition databases available in the public domain are listed in Table 4.


Table 4.

NIST: Consists of 2,728 five-minute conversations with 640 speakers. For every call by the same person, different handsets were used, whereas the receiver always had the same handset. (http://www.nist.gov/speech/tests/spk/index.htm)

TIMIT: It consists of 630 speakers, where the subjects read 10 sentences each. (http://www.ldc.upenn.edu/Catalog/LDC93S1.html)

NTIMIT: The original TIMIT recordings were transmitted over long-distance phone lines and redigitized. (http://www.ldc.upenn.edu/Catalog/LDC93S2.html)

YOHO Database: It consists of 128 subjects speaking prompted digit phrases into high-quality handsets. Four enrollment sessions were held, followed by 10 verification sessions at intervals of several days. (http://www.ldc.upenn.edu/Catalog/LDC94S16.html)

OGI Database: It consists of telephone speech from about 1,084 participants. Twelve sessions were conducted over a 2-year period, with different noise conditions and using different handsets and telephone environments. (http://cslu.cse.ogi.edu/corpora/spkrec/)

ELRA—SIVA Database: It is an Italian speaker database consisting of 2,000 speakers. The recordings were made in a house or office environment over several sessions. It contains the data of imposters also. (http://www.elda.fr/catalogue/en/speech/S0028.html)

ELRA—PolyVar Database: The speakers comprise native and non-native speakers of French (mainly from Switzerland). It consists of 143 Swiss-German and French speakers with 160 hours of speech. (http://www.elda.fr/catalogue/en/speech/S0046.html)

ELRA—PolyCost Database: The database contains 1,285 calls (around 10 sessions per speaker) recorded by 133 speakers. Data was collected over ISDN phone lines with foreigners speaking English. (http://www.elda.fr/catalogue/en/speech/S0042.html)

Table 5.

Online

 

 

Handwriting

Specifications

WWW Address

Database

 

 

UPINEN

It consists of over 5 million

http://unipen.nici.kun.nl

Database

characters from 2,200 writers.

 

 

It consists of lines of text by

 

 

around 200 writers. Total

 

CEDAR

number of words in the

 

database is 105,573. The

http://www.cedar.buffalo.ed

Handwriting

database contains both cursive

u/handwriting

Database

and printed writing, as well as

 

 

 

 

some writing which is a

 

 

mixture of cursive and printed.

 

Table 6.

NLPR Gait Database: It consists of image sequences captured in an outdoor environment with natural light by a surveillance camera (Panasonic NV-DX100EN digital video camera). The database contains 20 different subjects and three views, namely, lateral view, frontal view, and oblique view with respect to the image plane. The captured 24-bit full color images of 240 sequences have a resolution of 352 by 240. (http://www.sinobiometrics.com/gait.htm)

USF Gait Baseline Database: It contains a data set collected over 4 days, with the persons walking in an elliptical path (minor axis ~= 5 m, major axis ~= 15 m). The variations considered are different viewpoints, shoe types, carrying conditions, surface types, and even some at 2 differing time instants. (http://figment.csee.usf.edu/GaitBaseline/)

CMU Mobo Database: It consists of 25 subjects walking with 4 different styles of walking (slow walk, fast walk, inclined walk, and slow walk holding a ball). The sequences are captured from 6 synchronized cameras at 30 frames per second for 11 seconds. (http://hid.ri.cmu.edu/HidEval/index.html)

Southampton Human ID at a Distance Database: Large Database, with the following variations: indoor track, normal camera, subject moving towards the left; indoor track, normal camera, subject moving towards the right; treadmill, normal camera; indoor track, oblique camera; and outdoor track, normal camera. Small Database, with the following variations: normal camera, subject wearing flip-flops and moving towards the left; normal camera, subject wearing flip-flops and moving towards the right; normal camera, subject wearing a trench coat; and normal camera, subject carrying a barrel bag on the shoulder. (http://www.gait.ecs.soton.ac.uk/database/)


Online Handwriting Recognition

Online handwriting recognition is recognizing handwriting obtained from a digitizing pad, which provides the trajectory of the pen even when the pen is not in contact with the pad. Only a few such databases are available in the public domain (see Table 5).

Gait Recognition

Gait recognition is identifying a person from the way he or she walks. The walking style is analyzed from captured walking sequences. Details of the preparation of gait databases can be found in Nixon, Carter, Cunado, Huang, and Stevenage (1999). The various gait databases available in the public domain are listed in Table 6.

CONCLUSION

Biometric databases should thus be prepared with proper samples for enrollment as well as verification, so that system accuracy can be increased while the cost of the system decreases. For the other biometric traits, public databases still need to be created under well-defined, user-accepted policies so that the algorithms developed by various researchers can be evaluated fairly. The traits that still need to be explored for database creation are ear, hand geometry, palm print, keystroke, DNA, thermal imagery, etc.

REFERENCES

Burge, M., & Burger, W. (2000). Ear biometrics for machine vision. In Proceedings of the International Conference on Pattern Recognition (pp. 826-830).

Campbell, J. P., & Reynolds, D. A. (1999). Corpora for the evaluation of speaker recognition systems. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (Vol. 2, pp. 829-832).

Daugman, J. (1998). Recognizing persons by their iris patterns. In A. Jain, R. Bolle, & S. Pankanti (Eds.), Biometrics: Personal identification in networked society (pp. 103-121). Amsterdam: Kluwer.

Federal Bureau of Investigation. (1984). The science of fingerprints: Classification and uses. Washington, D.C.: U.S. Government Printing Office.

Hong, L., Wan, Y., & Jain, A. K. (1998). Fingerprint image enhancement: Algorithms and performance evaluation. In Proceedings of the IEEE Computer Society Workshop on Empirical Evaluation Techniques in Computer Vision (pp. 117-134).

Maltoni, D., Maio, D., Jain, A. K., & Prabhakar, S. (2003). Handbook of fingerprint recognition. Springer.

Moghaddam, B., & Pentland, A. (1999). Bayesian image retrieval in biometric databases. In Proceedings of the IEEE International Conference on Multimedia Computing and Systems (Vol. 2, p. 610).

Monrose, F., & Rubin, A. D. (2000). Keystroke dynamics as a biometric for authentication. Future Generation Computer Systems.

Nixon, M. S., Carter, J. N., Cunado, D., Huang, P. S., & Stevenage, S. V. (1999). Automatic gait recognition. In Biometrics: Personal identification in networked society (pp. 231-249). Kluwer.

Ratha, N., Chen, S., Karu, K., & Jain, A. K. (1996). A real-time matching system for large fingerprint databases. IEEE Transactions on Pattern Analysis and Machine Intelligence, 18(8), 799-813.

Vatsa, M., Singh, R., & Gupta, P. (2003). Image database for biometrics personal authentication system. In Proceedings of CVPRIP.


Wildes, R. P. (1999). Iris recognition: An emerging biometric technology. Proceedings of the IEEE, 85(9), 1348-1363.

Zhao, W., Chellappa, R., Rosenfeld, A., & Phillips, P. J. (2000). Face recognition: A literature survey (UMD Tech. Rep.).

KEY TERMS

Biometric: A physiological or behavioral characteristic used to recognize the claimed identity of any user. The technique used for measuring the characteristic and comparing it is known as biometrics.

Biometric Sample: The unprocessed image or physical or behavioral characteristic captured to generate the biometric template.

Biometric Template: This is the mathematical representation of the biometric sample which is finally used for matching. The size of a template varies from 9 bytes for hand geometry to 256 bytes for iris recognition to thousands of bytes for face.

Enrollment: The initial process of collecting biometric data from a user and then storing it in a template for later comparison. There are two types: positive enrollment and negative enrollment.

Failure to Enroll Rate: The rate at which enrollment fails because the captured biometric sample is not of sufficient quality for a template to be generated from it.

False Acceptance Rate: The probability of incorrectly identifying any impostor against a valid user’s biometric template.

False Rejection Rate: The probability of incorrectly rejecting the valid users or failing to verify the legitimate claimed identity of any user.

Verification: Comparing two biometric templates (the enrollment template and the verification template) to determine whether both come from the same person, thereby validating the claimed identity. This is a one-to-one comparison.


Business Rules in Databases

 

 

 


 

 

 

 

 

Antonio Badia

University of Louisville, USA

INTRODUCTION

Though informal, the concept of a business rule is very important to the modeling and definition of information systems. Business rules are used to express many different aspects of the representation, manipulation, and processing of data (Paton, 1999). However, perhaps due to its informal nature, the concept has been the subject of only a limited body of research in academia. There is little agreement on the exact definition of a business rule, on how to capture business rules in requirements specifications (the most common conceptual models, entity-relationship and UML, have no provision for capturing business rules), and, if they are captured at all, on how to express the rules in database systems. Usually, business rules are implemented as triggers in relational databases. However, the concept of a business rule is more versatile and may require the use of other tools.

In this article, I give an intuitive definition of business rule and discuss several classifications proposed in the literature. I then argue that there is no adequate framework to capture and implement business rules, discuss some of the issues involved and the options available, and propose a methodology to help clarify the process. This article assumes basic knowledge of the relational data model (Date, 2000).

BACKGROUND

The term business rule is an informal, intuitive one. It is used with different (but overlapping) meanings in different contexts. GUIDE (2000) defined business rule as follows: "A business rule is a statement that defines or constrains some aspect of the business. It is intended to assert business structure or to control or influence the behavior of the business." Thus, the rule expresses a policy (explicitly stated or implicitly followed) about some aspect of how the organization carries out its tasks. This definition is very open and includes things outside the realm of information systems (e.g., a policy on smoking). I will not deal with this kind of rule here1; instead, I will focus only on business rules that deal with information. Intuitively, a business rule in this sense usually specifies issues concerning some particular data item (e.g., what fact it is supposed to express, the way it is handled, its relationship to other data items). In any case, the business rule represents information about the real world, and that information makes sense from a business point of view. That is, people are interested in "the set of rules that determine how a business operates" (GUIDE, 2000). Note that while data models give the structure of data, business rules are sometimes used to tell how the data can and should be used. That is, the data model tends to be static, and rules tend to be dynamic.

Business rules are very versatile; they allow expression of many different types of actions to be taken. Because some types of actions are very frequent and make sense in many environments, they are routinely expressed in rules. Among them are:

enforcement of policies;

checks of constraints (on data values, on the effects of actions);

data manipulation; and

time-dependent rules (i.e., rules that capture the lifespan, rate of change, freshness requirements, and expiration of data).

Note: When used to manipulate data, rules can serve several purposes: (a) standardizing (transforming the implementation of a domain to a standard, predefined layout); (b) specifying how to identify or delete duplicates (especially for complex domains); (c) specifying how complex objects should behave when their parts are created, deleted, or updated; (d) specifying the lifespan of data (rules can help discard old data automatically when certain conditions are met); and (e) consolidating data (putting together data from different sources to create a unified, global, and consistent view of the data).
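As a minimal illustration of the last two items (the table and column names below are hypothetical, not taken from the article), a time-dependent rule such as "discard documents once their retention period has expired" could be implemented as a periodically executed SQL statement:

-- Hypothetical sketch: purge rows whose retention period has expired.
-- Assumes a table customer_document with a DATE column expires_on.
DELETE FROM customer_document
WHERE expires_on < CURRENT_DATE;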

Clearly, business rules are a powerful concept that should be used when designing an information system.

Types of Business Rules

Von Halle and Fleming (1989) analyzed business rules in the framework of relational database design. They distinguished three types of rules:


1. Domain rules: Rules that constrain the values that an attribute may take. Note that domain rules, although seemingly trivial, are extremely important. In particular, domain rules allow:

verification that values for an attribute make “business sense”;

decision about whether two occurrences of the same value in different attributes denote the same real-world value;

decision about whether comparison of two values makes sense for joins and other operations. For instance, EMPLOYEE.AGE and PAYROLL.CLERK-NUMBER may both be integers, but they are unrelated. However, EMPLOYEE.HIRE-DATE and PAYROLL.DATE-PAID are both dates and can be compared to make sure employees do not receive checks prior to their hire date.

2. Integrity rules: Rules that bind together the existence or nonexistence of values in attributes. This comes down to the normal referential integrity of the relational model and specification of primary keys and foreign keys.

3. Triggering operations: Rules that govern the effects of insert, delete, and update operations on different attributes or different entities.

This division is tightly tied to the relational data model, which limits its utility in a general setting. Another, more general way to classify business rules is the following, from GUIDE (2000):

1. Structural assertions: Something relevant to the business exists or is in relation to something else. Usually expressed by terms in a language, they refer to facts (things that are known, or how known things relate to each other). A term is a phrase that references a specific business concept. Business terms are independent of representation; however, the same term may have different meanings in different contexts (e.g., different industries, organizations, lines of business). Facts, on the other hand, state relationships between concepts or attributes (properties) of a concept (attribute facts), specify that an object falls within the scope of another concept (generalization facts), or describe an interaction between concepts (participation facts). Facts should be built from the available terms. There are two main types of structural assertions:

a. Definition of business terms: The terms appearing in a rule are common terms and business terms. The most important difference between the two is that a business term can be defined using facts and other terms, whereas a common term can be considered understood. Hence, a business rule captures a business term's specific meaning in a given context.

b. Facts relating terms to each other: Facts are relationships among terms; some of the relationships expressed are (a) being an attribute of, (b) being a subterm of, (c) being an aggregation of, (d) playing the role of, or (e) having a semantic relationship. Terms may appear in more than one fact; they play a role in each fact.

2. Constraints (also called action assertions): Constraints are used to express dynamic aspects. Because they impose constraints, they usually involve directives such as "must (not)" or "should (not)." An action assertion has an anchor object (which may be any business rule), of which it is a property. Action assertions may express conditions, integrity constraints, or authorizations.

3. Derivations: Derivations express procedures to compute or infer new facts from known facts. They can express either a mathematical calculation or a logical derivation.

There are other classifications of business rules (Ross, 1994), but the ones shown give a good idea of the general flavor of such classifications.
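As an illustrative sketch only (the orders and order_line tables and their columns are hypothetical, not taken from the article), a derivation rule of the GUIDE kind, say "the total value of an order is the sum of its line amounts," can often be implemented in SQL as a view:

-- Hypothetical derivation rule expressed as a view: the derived fact
-- (the total value of each order) is computed from known facts (its lines).
CREATE VIEW order_totals AS
SELECT o.order_id,
       SUM(l.quantity * l.unit_price) AS total_value
FROM orders o
JOIN order_line l ON l.order_id = o.order_id
GROUP BY o.order_id;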

Business Rules and Data Models

There are two main problems in capturing and expressing business rules in database systems. First, most conceptual models do not have means to capture many of them. Second, even if they are somehow captured and given to a database designer, most data models are simply not able to handle business rules. Both problems are discussed in the following sections. As a background, I discuss how data models could represent the information in business rules. Focusing on the relational model, it is clear that domain constraints and referential integrity constraints can be expressed in the model. However, other rules (expressing actions, etc.) are highly problematic. The tools that the relational model offers for the task are checks, assertions, and triggers.
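For instance, a domain rule and a referential integrity rule can be stated directly in the schema definition. The following sketch uses hypothetical tables (they are not the ones used later in this article) purely for illustration:

-- Hypothetical sketch: a domain rule on Age and a referential integrity rule
-- tying every staff member to an existing department.
CREATE TABLE department (
  Dno  INT PRIMARY KEY,
  Name VARCHAR(50)
);

CREATE TABLE staff (
  Empno INT PRIMARY KEY,
  Name  VARCHAR(50),
  Age   INT CHECK (Age >= 18 AND Age <= 65),   -- domain rule
  Dno   INT REFERENCES department(Dno)         -- referential integrity rule
);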


Checks are constraints over a single attribute on a table. Checks are written by giving a condition, which is similar to a condition in a WHERE clause of an SQL query. As such, the mechanism is very powerful, as the condition can contain subqueries and therefore mention other attributes in the same or different tables. However, not all systems fully implement the CHECK mechanism; many limit the condition to refer only to the table in which the attribute resides. Checks are normally used in the context of domains (Melton and Simon, 2002); in this context, a check is used to constrain the values of an attribute. For instance,

CREATE DOMAIN age int CHECK (Value >= 18 and Value <= 65)

creates a domain named age, expressed as an integer; only values between 18 and 65 are allowed. Checks can be used inside a CREATE TABLE statement, too. Checks are tested by the system when a tuple is inserted into a table (if the check was given within a CREATE TABLE statement, or if the check was given within a CREATE DOMAIN and the table schema uses the domain), or when a tuple is updated in the table. If the condition in the check is not satisfied, the insertion or update is rejected. However, it is very important to realize that when check conditions involve attributes in other tables, changes to those attributes do not make the system test the check; therefore, such checks can be violated. For instance, the following declaration

CREATE TABLE dept (
  Dno int,
  Mgrname VARCHAR(50),
  CHECK (Mgrname NOT IN (SELECT name FROM employee WHERE salary < 50000))
);

would block insertions into table dept (or updates of existing tuples) if someone introduced into the tuple a manager that made less than 50,000. However, if someone updated the table employee to change a manager's salary to less than 50,000, the update would not be rejected.
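To make the point concrete, the following update (a hypothetical continuation of the example, with an assumed employee name) would be accepted even if that employee is currently listed as a manager in dept, because the check on dept is not re-evaluated when employee changes:

-- Accepted by the system even though it may leave dept violating the
-- intent of the check; checks on dept are not re-tested on employee updates.
UPDATE employee
SET salary = 40000
WHERE name = 'Smith';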

Assertions are constraints associated with a table as a whole or with two or more tables. Like checks, assertions use conditions similar to conditions in WHERE clauses; but the condition may now refer to several existing tables in the database. Note that, unlike checks, which are associated with domains or attributes in tables, assertions exist at the database level and are independent of any table. Assertions are checked by the system whenever there is an insertion or update in any of the tables mentioned in the condition. Any transaction must keep the assertion true; otherwise, it is rejected. This makes assertions quite powerful; unfortunately, assertions are not supported, or only partially supported, by many commercial systems.
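A sketch of how the same manager-salary rule could be written as an assertion, using the tables of the earlier example (the statement follows the SQL standard; as noted, many systems do not support it):

-- Database-level rule: no department manager may earn less than 50,000.
-- The system would check it on changes to either dept or employee.
CREATE ASSERTION manager_salary_check
CHECK (NOT EXISTS (
  SELECT *
  FROM dept d, employee e
  WHERE d.Mgrname = e.name
    AND e.salary < 50000
));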

Triggers are expressions of the form E-C-A, where E is called an event, C is called a condition, and A is called an action. An event is an occurrence of a situation. An event is instantaneous and atomic (i.e., it cannot be further subdivided, and it happens at once or does not happen at all). The typical events considered by active rules are primitives for database state changes (e.g., an insert, delete, or update). A condition is a database predicate; the result of such a condition is a Boolean (TRUE/FALSE). An action is a data manipulation program, including insertions, deletions, updates, transactional commands, or external procedures. In order to allow the action to process data according to the event, most systems use the variables OLD and NEW to refer to data in the database before and after the event takes place. If a row is updated, both the OLD and NEW variables have values. If a row is inserted, then only the NEW variable has a value; if a row is deleted, then only the OLD variable has a value.
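A minimal trigger sketch for the same rule, included only as an illustration of the E-C-A structure and the use of the NEW variable (exact trigger syntax varies considerably across database systems, so this should be adapted to the target DBMS):

-- Event: an update of salary on employee.
-- Condition: the new salary is below 50,000 and the employee manages a department.
-- Action: reject the update by raising an error.
CREATE TRIGGER check_manager_salary
BEFORE UPDATE OF salary ON employee
REFERENCING NEW ROW AS n
FOR EACH ROW
WHEN (n.salary < 50000 AND n.name IN (SELECT Mgrname FROM dept))
  SIGNAL SQLSTATE '45000' SET MESSAGE_TEXT = 'Managers must earn at least 50,000';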

As a final note, it is necessary to point out that many database developers avoid the use of triggers, if at all possible, for two main reasons: (a) most trigger systems add considerable overhead, and (b) triggers bring in issues of complexity usually associated with programming (i.e., it is hard to ensure that the right semantics are being implemented). However, triggers are very powerful tools and sometimes have no substitute to express business logic (Widom & Ceri, 1996).

CAPTURING BUSINESS RULES

Eliciting and Representing Business Rules in Conceptual Models

Because business rules sometimes represent an implicit type of knowledge (Krogh, 2000), the elicitation of business rules shares problems and challenges with knowledge elicitation in general. In particular, complex managerial and organizational issues must be addressed; at a minimum, managerial action must be taken that encourages the sharing of knowledge by human agents. I will not address elicitation issues here, as they are outside the scope of this article.

Clearly, business rules should be captured in the requirement specification phase of a software project. It is during this phase that real-world information is analyzed and modeled and the scope of the system decided (Davis, 1993). However, there is no single technique that will guarantee capturing all business rules that an organization may need. Most conceptual
