Добавил:
kiopkiopkiop18@yandex.ru t.me/Prokururor I Вовсе не секретарь, но почту проверяю Опубликованный материал нарушает ваши авторские права? Сообщите нам.
Вуз: Предмет: Файл:
Ординатура / Офтальмология / Английские материалы / Assistive Technology for Visually Impaired and Blinde People_Hersh,Jonson_2008.pdf
Скачиваний:
0
Добавлен:
28.03.2026
Размер:
12.16 Mб
Скачать

References 551

A signal distance classifier can now be constructed as follows:

1.Record a vowel /a:/ or /i:/ and select a time window of suitable length. It is recommended to select a window which has sufficient signal energy.

2.Calculate the zero-crossing rate of the recorded signal.

3.Calculate the derivative of the recorded signal using Equation 14.10.

4.Calculate the zero-crossing rate of the differentiated signal.

5.Form the feature vector describing the vowel /a:/ or /i:/ using the zerocrossing rates obtained in steps 2 and 4.

6.Repeat steps 1–5 until you have a sufficient data for both /a:/ and /i:/.

7.From this data, compute the means of the vectors for the two classes separately. These means will be used as the representatives ra and ri of the two classes, as discussed in Section 14.3.1.

8.Write an algorithm which calculates the distances d(ra, x) and d(ri, x) between a given two-dimensional feature vector x and the representatives

computed in step 7.

All these steps could be performed manually. However, there are advantages in writing a computer program to implement them. It should now be possible to classify an “unknown” sound as /a:/ or /i:/ using the decision rule given in Equation 14.5.

P.3 Project in concatenative speech synthesis.

Experiments in diphone concatenation can be used to improve understanding of the problems in the acoustic component of a TTS system. However, the preparation of a complete diphone set would require excessive effort. Therefore the tutorial “Speech Synthesis” (Hoffmann et al. 1999b) is recommended instead. It can be found at the following URL:

http://www.ias.et.tu-dresden.de/sprache/lehre/multimedia/tutorial/ index.html

One section of this tutorial offers a selection of carrier words. You should define the boundaries of the diphones contained in these words in an interactive way and combine these diphones in an arbitrary way to form new words.

References

Alvarez, Y., and Huckvale, M., 2002, The reliability of the ITU-T P.85 standard for the evaluation of text-to-speech systems. In: Proc. Int. Conf. on Spoken Language Processing (ICSLP), Denver, Vol. 1, pp. 329–332

Antonacopoulos, A., 1997, Automatic reading of Braille documents, In: Bunke, H., Wang, P.S.P. (Eds.).

Handbook of Optical Character Recognition and Document Image Analysis. World Scientific Publishing Co., Singapore, pp. 703–728

Arnold, S.C., Mark, L., and Goldthwaite, J., 2000, Programming by voice, VocalProgramming. In:

Proc. The Fourth International ACM SIGCAPH Conference on Assistive Technologies (ASSETS’00), Arlington, VA, pp. 149–155

Bruynooghe, J., 2001, EuroScope: Development of a wireless portable Braille organiser for visually impaired. In: Proc. Conf. and Workshop on Assistive Technologies for Vision and Hearing Impairment (CVHI 2001), Castelvecchio Pascoli, Italy, 7 p.

552 14 Speech, Text and Braille Conversion Technology

Bunke, H., and Wang, P.S.P. (Eds.), 1997. Handbook of Optical Character Recognition and Document Image Analysis. World Scientific Publishing Co., Singapore

Chu, W.C., 2003, Speech coding algorithms. Foundation and evolution of standardized coders. WileyInterscience, Hoboken, New Jersey

Deller, J.R., Proakis, J.G., and Hansen, J.H.L., 1993. Discrete-Time Processing of Speech Signals. Macmillan, New York. Reprint: John Wiley & Sons 1999

Dutoit, T., 1997. An Introduction to Text-to-Speech Synthesis. Kluwer, Dordrecht

Eichner, M., Wolff, M., and Hoffmann, R., 2000, A unified approach for speech synthesis and speech recognition using stochastic Markov graphs. In: Proc. 6th Int. Conf. Spoken Language Processing (ICSLP), Beijing, China, Vol. 1, pp. 701–704

Eichner, M., Wolff, M., and Hoffmann, R., 2004, Voice characteristics conversion for TTS using reverse VTLN, In: Proc. Int. Conf. in Acoustics, Speech, and Signal Processing (ICASSP), Montreal, Canada, Vol. 1, pp. 17–20

Fant, G., 1960, Acoustic Theory of Speech Production, Mouton & Co., ‘s-Gravenhage

Fellbaum, K., 2000, Elektronische Sprachsignalverarbeitung – Stand der Technik, Zukunftsperspektiven. In: Proc. Elfte Konferenz Elektronische Sprachsignalverarbeitung, Cottbus, Germany. Studientexte zur Sprachkommunikation, Vol. 20, pp. 9–16

Flach, G., Hoffmann, R., and Rudolph, T., 2000, Eine aktuelle Evaluation kommerzieller Diktiersysteme. In: Proc. 5. Konferenz zur Verarbeitung natürlicher Sprache (KONVENS), Ilmenau, Germany. ITGFachbericht 161, VDE-Verlag, Berlin, pp. 51–55

Fujisaki, H., 1997, Prosody, models, and spontaneous speech, In: Y. Sagisaka et al. (Eds.), Computing Prosody. Springer, New York, pp. 27–42.

Görz, G., Rollinger, C.-R., and Schneeberger, J., 2000, Handbuch der Künstlichen Intelligenz. Third edition, Oldenbourg Verlag, München und Wien

Gray, L., 1998, Braille Transcribing. http://www.uronramp.net/~lizgray/ (Data accessed June 6, 2004) Hersh, M., and Johnson, M. (Eds.), 2003. Assistive Technology for the Hearing-impaired, Deaf and

Deafblind. Springer, London

Hess, W., 2003, Articulatory, parametric, concatenative – Three paradigms in acoustic synthesis of speech and their implications. In: Wolf, D. (Ed.), Signaltheorie und Signalverarbeitung (Festschrift for A. Lacroix). Studientexte zur Sprachkommunikation, Vol. 29, pp. 116–125

Hoffmann, R., 1998, Signalanalyse und -erkennung. Springer-Verlag, Berlin

Hoffmann, R., Hirschfeld, D., Jokisch, O., Kordon, U., Mixdorff, H., and Mehnert, D., 1999a, Evaluation of a multilingual TTS system with respect to the prosodic quality, In: Proc. Intern. Congress of Phonetic Sciences (ICPhS), San Francisco, CA, Vol. 3, pp. 2307–2310

Hoffmann, R., Ketzmerick, B., Kordon, U., and Kürbis, S., 1999b, An interactive tutorial on text-to- speech synthesis from diphones in time domain. In: Proc. 6th European Conf. on Speech Communication and Technology (EUROSPEECH), Budapest, Hungary, Vol. 2, pp. 639–642

Hoffmann, R., Jokisch, O., Hirschfeld, D., Strecha, G., Kruschke, H., Kordon, U., and Koloska, U., 2003, A multilingual TTS system with less than 1 Mbyte footprint for embedded applications. In: Proc. Int. Conf. on Acoustics, Speech, and Signal Processing (ICASSP), Hong Kong, Vol. 1, pp. 532–535

Hoffmann, R., Jokisch, O., Strecha, G., and Hirschfeld, D., 2004, Advances in speech technology for embedded systems. In: Proc. Conf. and Workshop on Assistive Technologies for Vision and Hearing Impairment (CVHI 2004), Granada, Spain, 6 p.

Kahlisch, T., 1998, Software-ergonomische Aspekte der Studierumgebung blinder Menschen. Kovac, Hamburg

Karrenberg, U., 2002. An Interactive Multimedia Introduction to Signal Processing. Springer-Verlag, Berlin

Karrenberg, U., 2004. Signale – Prozesse – Systeme. Eine multimediale und interaktive Einführung in die Signalverarbeitung. Third edition, Springer-Verlag, Berlin

Kempelen, W. von, 1791, Mechanismus der menschlichen Sprache nebst der Beschreibung seiner sprechenden Maschine. J.W. Degen, Wien

Kordon, U., and Kürbis, S., 2004, Akustischer Schalter. Praktikumsanleitung, TU Dresden Levelt, W.J.M., 1989. Speaking: From Intention to Articulation. The MIT Press, Cambridge, Mass

Levinson, S.E., 1985, Structural Methods in Automatic Speech Recognition, In: Proceedings of the IEEE, Vol. 73, No. 11, pp. 1625–1650

References 553

Lippmann, R.P., 1997, Speech recognition by machines and humans. In: Speech Communication, Vol. 22, No. 1, pp. 1–16

Maase, J., Hirschfeld, D., Koloska, U., Westfeld, T., and Helbig, J., 2003, Towards an evaluation standard for speech control concepts in real-world scenarios. In: Proc. 8th European Conf. on Speech Communication and Technology (EUROSPEECH), Geneva, Switzerland, pp. 1553–1556

Olive, J.P., Greenwood, A., and Coleman, J., 1993. Acoustics of American English Speech. A Dynamic Approach. Springer-Verlag, New York

Rabiner, L., and Juang, B.-H., 1993. Fundamentals of Speech Recognition. Prentice Hall, Englewood Cliffs, NJ

Roe, P.R.W. (ed.), 2001, Bridging the gap? Access to telecommunications for all people. Commission of European Communities, COST 219bis, Lausanne

Schmidt, K.O., 1932, Verfahren zur besseren Ausnutzung des Ubertragungsweges¨ . D.R.P. Nr. 594976 Schnell, M., Jokisch, O., Küstner, M., and Hoffmann, R., 2002, Text-to-speech for low-resource systems.

In: Proc. on the 5th IEEE Workshop on Multimedia Signal Processing (MMSP), St. Thomas, Virgin Islands, 4 pp.

Schutkowski, G., 1938, Optisch-elektrische Einrichtung für Blinde zum Umwandeln von Schriftzeichen in Blindenschrift. D.R.P. Nr. 657536

Schutkowski, G., 1952. Druckschriftlesemaschine für Blinde. In: Die Technik, Berlin, Vol. 7, pp. 266–270 and pp. 335–337

Shannon, C., 1949. Communication in the presence of noise, Reprinted in: Collected Works, ed. by N.J.A. Sloane and A.D. Wyner, New York: IEEE Press 1993, pp. 160–172

Sohn, G., 2004, Spracherkennung und Sprachautomatisierung boomen. Pressetext Deutschland, June 26 Stiftung Warentest, 2004, Ein eheähnliches Verhältnis. In: test, Vol. 39, No. 1, pp. 38–41.

Wahlster, W. (Ed.), 2000. Verbmobil: Foundations of Speech-to-Speech Translation. Springer Verlag, Berlin

Waibel, A., Lee, K.-F. (Eds.), 1990. Readings in Speech Recognition. Morgan Kaufmann Publishers, San Mateo, CA

Weber, G., 2004, Empowering IT for social integration. In: Proc. Conf. and Workshop on Assistive Technologies for Vision and Hearing Impairment (CVHI 2004), Granada, Spain

Werner, S., Eichner, M., Wolff, M., and Hoffmann, R., 2004, Towards spontaneous speech synthesis— Utilizing language model information in TTS. In: IEEE Trans. on Speech and Audio Processing, Vol. 12, No. 4, pp. 436–445

Weyrich, C., 2003, Industrial view on Human Computer Interaction, In: Proc. Human Computer Interaction Status Conference, Berlin, Germany. German Federal Ministry of Education and Research, pp. 13–18.

Further Reading

Books

Furui, S., 2001: Digital Speech Processing, Synthesis, and Recognition. Second edition, Marcel Dekker, New York

Holmes, J.N., 1988. Speech synthesis and Recognition. London: Chapman and Hall. German translation by G. Ruske published in 1991, Oldenbourg Verlag, München und Wien

Jurafsky, D., and Martin, J.H., 2000. Speech and Language Processing. Prentice Hall, New Jersey

Selected Periodicals in Speech Technology

IEEE Transactions on Speech and Audio Processing. A publication of the IEEE Signal Processing Society. IEEE, New York

Speech Communication. A publication of the European Association for Signal Processing (EURASIP) and of the International Speech Communication Association (ISCA). Elsevier Science B. V., Amsterdam

Computer Speech and Language. Academic Press/Elsevier, Amsterdam etc

554 14 Speech, Text and Braille Conversion Technology

International Journal of Speech Technology. Published in association with the Applied Voice Input/Output Society (AVIOS). Kluwer Academic Publishers, Norwell, MA

Leading Conferences in Speech Technology

EUROSPEECH (European Conference on Speech Communication and Technology). An Interspeech event

ICSLP (International Conference on Spoken Language Processing). An Interspeech Event ICASSP (IEEE International Conference on Acoustics, Speech, and Signal Processing) ICPhS (International Congress on Phonetic Sciences)

Resources

The following web sites give information on general aspects of speech and language technology. For further details, see the URLs cited in the text and the tables. Although the author considers the external web sites referred to in this chapter useful, he does not vouch for their accuracy.

collate.dfki.de

COLLATE – Computational Linguistics and Language Technology

 

for Real Life Applications (includes the Virtual Information Center

 

Language Technology World)

www.elsnet.org/

ELSNET – European Network of Excellence in Human Language

 

Technologies

www.hltcentral.org/

JEWELS – Joint European Website for Education in Language and

htmlengine.shtml?id=165

Speech

www.isca-speech.org

ISCA – International Speech Communication Association

www.phon.ucl.ac.uk/

UCL London, Department of Phonetics and Linguistics

www.elra.info/

ELRA – European Language Resources Association

www.ecess.org

ECESS – European Center of Excellence on Speech Synthesis

ttssamples.syntheticspeech.de/

Overview on many TTS systems including demonstrations for the

 

German language

mambo.ucsc.edu/

Perceptual Science Laboratory at the University of California

www.cslu.ogi.edu/

Center for Spoken Language Understanding at the Oregon Graduate

 

Institute of Science and Technology

db1.rehadat.de/rehadat/

International Alliance of Assistive Technology Information Pro-

alliance.jsp

viders