April 2004, Volume 13, Number 1
This paper deals with the analysis and optimization of a speech command recognition system (SCRS) trained on the Czech telephone database Speechdat(E) for use in a selected noisy environment. The SCRS is based on hidden Markov models of context-dependent phones (triphones) and mel-frequency cepstral coefficient (MFCC) analysis of speech. The main aim is to analyze and find the optimal settings of the SCRS with respect to robustness against additive noise, without using additional noise reduction techniques. The analysis focuses on the appropriate setting of the MFCC computation, the adjustment of the silence model, and the grammar selection possibilities. It is shown that correct performance of the SCRS depends strictly on an appropriate adjustment of the silence model. The ability to adapt the silence model is confirmed. When the SNR is higher than 15 dB, suitable performance of the SCRS can be guaranteed without any modification of the triphone speech models by (1) the optimal setting of the MFCC computation and (2) proper adaptation of the silence model. The assumption that a speech command recognition system is used in an environment where the SNR is higher than 15 dB is fulfilled in many applications.
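To make the 15 dB threshold concrete, the following minimal sketch shows one common way to prepare noisy test material: scaling a noise recording so that the speech-to-noise power ratio hits a chosen SNR. The abstract does not say how the noisy data were obtained, so the function names, the synthetic sine "speech" and the Gaussian noise are all illustrative assumptions.

```python
import math
import random


def mean_power(signal):
    """Average power of a sampled signal."""
    return sum(x * x for x in signal) / len(signal)


def snr_db(speech, noise):
    """Signal-to-noise ratio in dB from the two separate components."""
    return 10.0 * math.log10(mean_power(speech) / mean_power(noise))


def mix_at_snr(speech, noise, target_snr_db):
    """Scale the noise so that speech + noise has the requested SNR.

    Returns the noisy mixture and the scaled noise component.
    """
    wanted_noise_power = mean_power(speech) / 10.0 ** (target_snr_db / 10.0)
    gain = math.sqrt(wanted_noise_power / mean_power(noise))
    scaled = [gain * n for n in noise]
    noisy = [s + n for s, n in zip(speech, scaled)]
    return noisy, scaled


# Hypothetical test material: a 200 Hz tone at 8 kHz sampling and white noise.
rng = random.Random(0)
speech = [math.sin(2 * math.pi * 200 * t / 8000) for t in range(8000)]
noise = [rng.gauss(0.0, 1.0) for _ in range(8000)]

noisy, scaled_noise = mix_at_snr(speech, noise, 15.0)
print(round(snr_db(speech, scaled_noise), 6))
```

The SNR here is a global average over the whole signal; a segmental or speech-active SNR would give different values for the same recording.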
- BELLEGARDA, J. R. Statistical techniques for robust ASR: review and perspectives. Eurospeech'97, 1997, p. 33 - 36.
- MILNER, B. P., VASEGHI, S. V. Comparison of some noise compensation methods for speech recognition in adverse environments. IEEE Proc.-Vis. Image Signal Processing, 1994.
- HERMANSKY, H., MORGAN, N. RASTA processing of speech. IEEE Trans. Speech Audio Processing, 1994, vol. 2, pp. 578-589.
- GREZL, F. Effect of normalization on TRAP based systems in ASR. In Radioelektronika, conference proceedings, 2003.
- KREISINGER, T., POLLAK, P., SOVKA, P., UHLIR, J. Experimental study of speech recognition in noisy environments. Signal Analysis and Prediction, 1998, Birkhauser, Boston.
- CERNOCKY, J., POLLAK, P., HANZL, V. Czech recordings and annotations on CD's - Documentation on the Czech Database and Database Access. Research Report, 2000, Prague, CTU, ED2.3.2.
- GALES, M. J. F., YOUNG, S. J. HMM recognition in noise using parallel model combination. Eurospeech'93, 1993, pp. 837-840.
- GALES, M. J. F., YOUNG, S. J. The application of parallel model combination to a large vocabulary dictation task. Eurospeech'95, 1995, pp. 1983-1986.
- HUNG, J., SHEN, J., LEE, L. New approaches for domain transformation and parameter combination for improved accuracy in parallel model combination (PMC) techniques. IEEE Trans. on Speech and Audio Processing, 2001, vol. 9, pp. 842-855.
- HILGER, F., NEY, H. Noise level normalization and reference adaptation for robust speech recognition. Proc. ASR-2000, 2000, pp. 64-68.
- VETH, J., MAUUARY, L., NOE, B., WET, F., SIENEL, J., BOVES, L., JOUVET, D. Feature vector selection to improve ASR robustness in noisy conditions. Eurospeech'01, 2001.
- ACCAINO, S., TSIPORKOVA, E., HAMME, H. Modelling of extra events for telephony. In workshop proceedings Voice operated telecom services: Do they have a bright future?. Ghent, Belgium, 2000, pp. 75-78.
- EALEY, D., KELLEHER, H., PEARCE, D. Harmonic tunnelling: tracking non-stationary noises during speech. Eurospeech'01, 2001.
- ANDRASSY, B., VLAJ, D., BEAUGEANT, CH. Recognition performance of the Siemens front-end with and without frame dropping on the Aurora 2 database, Eurospeech'01, 2001.
- YOUNG, S. The HTK Book (for HTK Version 3.1), Cambridge University Engineering Department, 2001.
This article compares various estimators of the m parameter of the Nakagami distribution. This distribution has been used in many engineering applications, and we present another possible application in biomedical engineering, namely ultrasound tissue characterization in echocardiography. Matlab 6.5 was used as a suitable tool for fast and efficient scientific research.
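As an illustration of what estimating m involves, here is a minimal Python sketch of one widely used moment-based estimator, the inverse normalized variance of the squared envelope, m = (E[R^2])^2 / Var(R^2). The abstract does not list which estimators the article compares, so this choice, the function names and the simulation parameters are assumptions. Test samples are drawn using the fact that the square of a Nakagami variable is gamma distributed with shape m and mean Omega.

```python
import random


def nakagami_samples(m, omega, n, seed=0):
    """Draw n Nakagami(m, omega) envelope samples.

    R^2 is gamma distributed with shape m and scale omega / m,
    so R is the square root of a gamma variate.
    """
    rng = random.Random(seed)
    return [rng.gammavariate(m, omega / m) ** 0.5 for _ in range(n)]


def estimate_m_inv(samples):
    """Inverse-normalized-variance estimate of m from envelope samples."""
    n = len(samples)
    power = [r * r for r in samples]          # squared envelope R^2
    mu2 = sum(power) / n                      # sample estimate of E[R^2]
    mu4 = sum(p * p for p in power) / n       # sample estimate of E[R^4]
    return mu2 * mu2 / (mu4 - mu2 * mu2)      # (E[R^2])^2 / Var(R^2)


# Hypothetical check: simulate with a known m and see how close the estimate is.
envelope = nakagami_samples(m=2.0, omega=1.0, n=20000, seed=1)
m_hat = estimate_m_inv(envelope)
print(m_hat)
```

Moment estimators like this one are cheap but can be noisy for small sample sizes, which is one reason to compare them against maximum-likelihood-based alternatives.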
- ABDI, A., KAVEH, M. Performance comparison of three different estimators for the Nakagami-m parameter using Monte Carlo Simulation. Technical Report.
- CHENG, J., BEAULIEU, N. C. Maximum-Likelihood Based Estimation of the Nakagami m Parameter. IEEE Communications Letters, 2001, vol. 5, no. 3, p. 101-103.
- CHENG, J., BEAULIEU, N. C. Generalized Moment Estimators for the Nakagami Fading Parameter. IEEE Communications Letters, 2002, vol. 6, no. 4, pp. 144-146.
- JAN, J. Digital Signal Filtering, Analysis and Restoration. IEE Telecommunications Series 44, 2000.
- SHANKAR, P. M. Ultrasonic Tissue Characterization Using a Generalized Nakagami Model. IEEE Trans. on Ultrasonics, Ferroelectrics and Frequency Control, 2001, vol. 48, no. 6, pp. 1716-1720.
- SHANKAR, P. M. A General Statistical Model for Ultrasonic Backscattering from Tissues. IEEE Trans. on Ultrasonics, Ferroelectrics and Frequency Control, 2000, vol. 47, no. 3, p. 727-736.
- SHANKAR, P. M. A Compound Scattering pdf for the Ultrasonic Echo Envelope and Its Relationship to K and Nakagami Distributions. IEEE Trans. on Ultrasonics, Ferroelectrics and Frequency Control, 2003, vol. 50, no. 3, pp. 339-343.
- SMOLIKOVA, R., WACHOWIAK, M. P., ZURADA, J. M., ELMAGHRABY, A. S. Neural Network Modeling of Ultrasound Speckle. Technical Report, University of Louisville, 2002.
- SMOLIKOVA, R., WACHOWIAK, M. P., ZURADA, J. M., ELMAGHRABY, A. S. Segmentation of Ultrasound Images with Speckle Modeling. In Int. Conf. BIOSIGNAL, 2002, p. 316-319.
- ZHANG, Q. T. A Note on the Estimation of Nakagami-m Fading Parameter. IEEE Communications Letters, 2002, vol. 6, no. 6, pp. 237-238.
- ZAPLETAL, J. Fundamentals of probability and statistics. VUT Brno, 1995 (in Czech).
An improvement of Watson's DVQ (Digital Video Quality) metric is introduced. This metric was chosen because it is easy to implement: it uses the DCT (Discrete Cosine Transform) to decompose video into spatial channels. The metric is extended with a segmentation tool, which is used for weighting the masked differences.
- WATSON, A. B. Toward a perceptual video quality metric. In Proc. SPIE. San Jose, CA, 1998, vol. 3299, p. 139-147.
- PETERSON, H., AHUMADA, A. J., WATSON, A. An Improved Detection Model for DCT Coefficient Quantization. In Proc. SPIE. 1993, vol. 1913, p. 191-201.
- EGGER, O., LI, W., KUNT, M. High Compression Image Coding Using an Adaptive Morphological Subband Decomposition. In Proceedings of the IEEE, Special Issue on Advances in Image and Video Compression, 1995, vol. 83, no. 2, p. 272-287.
- van den BRANDEN LAMBRECHT, Ch. J. Perceptual Models and Architectures for Video Coding Applications. Ph.D. Thesis, Ecole Polytechnique Federale de Lausanne, Lausanne, EPFL, 1996.
- NADENAU, M. J., WINKLER, S. et al. Human Vision Models for Perceptually Optimized Image Processing - A Review. In Proc. of the IEEE. 2000.
- WINKLER, S. Vision Models and Quality Metrics for Image Processing Applications. Ph.D. Thesis, Ecole Polytechnique Federale de Lausanne, Lausanne, EPFL, 2000.
- WATSON, A. B. DCT quantization matrices visually optimized for individual images. In Proc. SPIE. 1993, vol. 1913, p. 202-216.
- WATSON, A. B. et al. Design and performance of a digital video quality metric. In Proc. SPIE. San Jose, CA, 1999, vol. 3644, p. 168-174.
- FREDERICKSEN, R. E., HESS, R. F. Estimating multiple temporal mechanisms in human vision. In Vision Research. 1998, vol. 7, no. 38, p. 1023-1040.
- CCIR. Method for the Subjective Assessment of the Quality of Television Pictures. In 13th Plenary Assembly, Recommendation 500. 1974, vol. 11, p. 65-68.
- COMES, S., MACQ, B. Human Vision Quality Criterion. In SPIE Visual Communications and Image Processing. 1990, vol. 1360, p. 2-7.
This contribution describes a program package for one part of automatic Text-to-Speech (TTS) synthesis. Some experiments have documented a considerable improvement in the naturalness of synthetic speech, but that approach requires the input feature values to be completed by hand, which takes a lot of time for large files. The prosody therefore needs to be improved by other approaches that use only automatically classified features (input parameters). An artificial neural network (ANN) is used to model the prosody parameters. The program package contains all modules necessary for text and speech signal pre-processing, neural network training, sensitivity analysis and result processing, as well as a module that creates the input data protocol for the Czech speech synthesizer ARTIC.
- MATOUSEK, J., PSUTKA, J., KRUTA, J. Design of Speech Corpus for Text-to-Speech Synthesis. In Proc. of Eurospeech 2001. Denmark (Aalborg), 2001, vol. 3, pp. 2047-2050.
- TUCKOVA, J., SEBESTA, V. Influence of Language Parameters Selection on the Coarticulation of the Phonemes for Prosody Training in TTS by Neural Networks. In Proc. of the Int. Conf. on Artificial Neural Nets and Genetic Algorithms (ICANNGA 2003). France (Roanne), 2003, pp. 85-90, ISBN: 3-211-00743-1, Springer-Verlag Wien-New York.
- TUCKOVA, J. Uvod do teorie a aplikaci umelych neuronovych siti. Skripta FEL CVUT v Praze, vydavatelstvi CVUT, 2003, ISBN 80-01-02800-3.
- TUCKOVA, J., MATOUSEK, J. Czech Language Features Selection and Prosody Modelling for Text-to-Speech Synthesis. In ECMS 2003. Czech Republic (Liberec), 2003, vol. 1, p. 98-102. ISBN 80-7083-708-X.
- SANTARIUS, J. PPL - Nastroj pro detekci zakladniho hlasivkoveho tonu. In Moderni smery vyuky elektrotechniky a elektroniky. Brno, 2002, vol. 1, p. 70-73. ISBN 80-214-2190-8.
- SANTARIUS, J. Statistical Methods for Optimalization of Neural Nets. In Proc. of Workshop 2002. Prague: CTU, 2002, vol. A, p. 486-487. ISBN 80-01-02511-X.
- SANTARIUS, J. Special Algorithms Used in Prosody Modelling. In Proc. of the Polish-Hungarian-Czech Workshop on Circuit Theory, Signal Processing, and Applications. Prague, 2003, vol. 1, p. 43-48. ISBN 80-01-02825-9.
- SANTARIUS, J., TIHELKA, J. Prosody Modelling of Synthetic Speech. In ECMS 2003. Czech Republic (Liberec), 2003, vol. 1, p. 89-92. ISBN 80-7083-708-X.
- SLEZAK, J. Automatic Phonetic Transcription. In ECMS 2003. Czech Republic (Liberec), 2003, vol. 1, p. 93-97. ISBN 80-7083-708-X.
- PSUTKA, J. Speech communication with a computer (in Czech-Komunikace s pocitacem mluvenou reci). Academia, Praha, 1995, ISBN 80-200-0203-0.
- TRABER, Ch. The implementation of the Text-to-Speech System for German. PhD dissertation, ETH Zurich, Switzerland, 1995.
- PALKOVA, Z. Phonetics and phonology of the Czech language (in Czech: Fonetika a fonologie cestiny). Univerzita Karlova-Praha, 1994, ISBN: 80-7066-843-1.
- DEMUTH, H., BEALE, M. Neural Network Toolbox. For Use with MATLAB. User's Guide, ver. 4, The MathWorks, Inc., MA 01760-2098.
- TUCKOVA, J., SEBESTA, V. Data Mining Approach for Prosody Modelling by ANN in Text-to-Speech Synthesis. In Proc. of the Int. Conf. IASTED AIA2001. Spain (Marbella), 2001, pp. 161-166, ISBN: 0-88986-301-6.
- GEHER, K. Theory of Network Tolerances. Akademiai Kiado, Budapest, 1971.
In recent years, access to multimedia data has become much easier due to the rapid growth of the Internet. While this is usually considered an improvement to everyday life, it also makes unauthorized copying and distribution of multimedia data much easier, and therefore presents a challenge for copyright protection. Digital watermarking, that is, inserting copyright information into the data, has been proposed to solve this problem. In this paper, two original DCT-based watermarking schemes for ownership verification and authentication of color images are proposed. Several color models used in the watermark embedding and extraction process are also described.
- LEVICKY, D., HOVANCAK, R., KLENOVICOVA, Z. Digital watermarking. Principles, systems and applications. (in Slovak) Slaboproudy obzor, Czech Republic, p. 1-5.
- HOVANCAK, R. DCT Watermarking Algorithm without Using Original Image for Extraction. In II. PhD conference and SVOS, Kosice, 2002, p. 33-34.
- HOVANCAK, R., LEVICKY, D. Comparison of watermarking methods using DCT transformation. RADIOELEKTRONIKA 2003, 13th International Czech - Slovak Scientific Conference, Brno, Czech Republic, 2003, p. 403-406.
- PETITCOLAS, F. A. P., ANDERSON, R. J., KUHN, M. G. Attacks on copyright marking systems. In Proceedings of the Second International Workshop on Information Hiding. Portland, 1998, p. 218-238.
This paper deals with 3D motion estimation of a wire-frame head model, based on analysis of the parameters of the 3D global motion of the real human head in each frame of a video sequence. The proposed algorithm for 3D global motion estimation is given by the solution of 6 linear equations for three feature points extracted from the real human head in each frame. An algorithm for texturing the 3D wire-frame model of the human head after its estimated global motion is then presented. Texturing is carried out by a two-dimensional affine transform directly in the synthesized frames. Both proposed algorithms can achieve a very low bit rate in model-based image coding.
- MIHALIK, J. Image Coding in Videocommunications. Mercury-Smekal, ISBN 80-89061-47-8, Kosice, 2001. (In Slovak)
- MIHALIK, J. Adaptive Hybrid Coding of Images. Journal of Electrical Engineering, 1993, vol. 44, no. 3, p. 85-89.
- FORCHHEIMER, R., KROMANDER, T. Image Coding - from Waveforms to Animation. IEEE Trans. Acoust., Speech and Signal Proc. 1989, vol. ASSP-37, no. 12, p. 2008-2023.
- AIZAWA, K., HUANG, T. S. Model-Based Image Coding: Advanced Video Coding Techniques for Very Low Bit-Rate Applications. Proc. IEEE, 1995, vol. 83, no. 2, p. 259-271.
- PEARSON, D. E. Development in Model-Based Video Coding. Proc. IEEE, 1995, vol. 83, no. 6, p. 892-906.
- MUSMANN, H. G., HOTTER, M., OSTERMANN, J. Object-Oriented Analysis-Synthesis Coding of Moving Images. Signal Processing: Image Communication, 1989, vol. 1, no. 2, p. 117-139.
- WELSH, W. J. Model-Based Video Coding of Videophone Images. Electronics & Commun. Engineering J., 1991, p. 29-36.
- MIHALIK, J. Adaptive Transform Coding of Image. Electronic Horizon, 1991, vol. 52, no. 11-12, p. 253-257.
- MIHALIK, J., GLADISOVA, I., MICHALCIN, V. Two Layer Vector Quantization of Images. Radioengineering, 2001, vol. 10, no. 2, p. 15-19.
- RYDFALK, M. CANDIDE: A Parameterised Face. Dep. Elec. Eng. Rep. LiTH-ISY-I-0866, Linkoping Univ., 1987.
- TSAI, C. J., EISERT, P., GIROD, B., KATSAGGELOS, A. K. Model-Based Synthetic View Generation from a Monocular Video Sequence. In Int. Conf. on Image Proc. Santa Barbara, 1997, vol. 1, p. 444-447.
- ZHANG, L. Estimation of Eye and Mouth Corner Point Position in a Knowledge-Based Coding System. Proc. SPIE, 1996, vol. 2952, p. 21-28.
- FOLEY, J. D., VAN DAM, A., FEINER, S. K., HUGHES, J. F. Computer Graphics, Principles and Practice. Addison-Wesley, 2nd edition, 1990.
- ANTOSZCZYSZYN, P. M., HANNAH, J. M., GRANT, P. M. A New Approach to Wire-Frame Tracking for Semantic Model-Based Moving Image Coding. Signal Processing: Image Communication, 2000, vol. 15, p. 567-580.
The standard approximation algorithms are well described in the literature, but some equiripple approximations are described with deficiencies. The Chebyshev and inverse Chebyshev approximations in particular are often wrongly interpreted or implemented. In this paper, we present all formulas for computing Chebyshev approximations in a standard form. The transformations necessary for circuit implementation are also given in analytical form.
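For orientation, the textbook formulas for the (type I) Chebyshev low-pass approximation can be sketched in a few lines of Python. This is not the paper's formula set, which the abstract does not reproduce; it assumes a passband edge normalized to 1 rad/s, a passband ripple `ap_db` and a minimum stopband attenuation `as_db` at the normalized frequency `omega_s`, all of which are illustrative names.

```python
import math


def ripple_factor(ap_db):
    """Ripple factor epsilon for a passband ripple given in dB."""
    return math.sqrt(10.0 ** (ap_db / 10.0) - 1.0)


def chebyshev_order(ap_db, as_db, omega_s):
    """Smallest order meeting as_db at omega_s (passband edge = 1)."""
    ratio = (10.0 ** (as_db / 10.0) - 1.0) / (10.0 ** (ap_db / 10.0) - 1.0)
    return math.ceil(math.acosh(math.sqrt(ratio)) / math.acosh(omega_s))


def chebyshev_poles(n, ap_db):
    """Left-half-plane poles of the normalized type I Chebyshev low-pass.

    s_k = -sinh(a) sin(theta_k) + j cosh(a) cos(theta_k),
    a = asinh(1 / epsilon) / n, theta_k = (2k - 1) pi / (2n).
    """
    a = math.asinh(1.0 / ripple_factor(ap_db)) / n
    poles = []
    for k in range(1, n + 1):
        theta = (2 * k - 1) * math.pi / (2 * n)
        poles.append(complex(-math.sinh(a) * math.sin(theta),
                             math.cosh(a) * math.cos(theta)))
    return poles


# Hypothetical specification: 1 dB ripple, 40 dB attenuation at twice the band edge.
order = chebyshev_order(ap_db=1.0, as_db=40.0, omega_s=2.0)
poles = chebyshev_poles(order, ap_db=1.0)
print(order, poles)
```

The poles lie on an ellipse in the left half-plane; the inverse Chebyshev case needs a separate set of formulas (reciprocated poles plus imaginary-axis zeros) and is not covered by this sketch.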
- SAAL, R. Handbuch zum Filterentwurf. Berlin: AEG-Telefunken, 1979.
- DAVIDEK, V., LAIPERT, M., VLCEK, M. Analogove a cislicove filtry (Analog and Digital Filters). Praha: CVUT Publishing, 2000.
- WILLIAMS, A. B. Electronic Filter Design Handbook. New York: McGraw-Hill Book Company, 1981.
- WEINBERG, L. Network Analysis and Synthesis. New York: McGraw-Hill Book Company, 1962.
- ANTONIOU, A. Digital Filters: Analysis and Design. New York: McGraw-Hill Book Company, 1979.
- HERPY, A., BERKA, J. C. Active RC Filter Design. Budapest: Akademiai Kiado, 1986.
- KENDAL, L. S. Analog Filters. London: Chapman & Hall, 1997.
B. Taha-Ahmed, M. Calvo-Ramon, L. de Haro-Ariet
The Capacity and Interference Statistics of High Car Traffic W-CDMA Street Cross-Shaped Micro-Cells (Uplink Analysis)
Since interference determines the capacity and performance of a W-CDMA system, it is necessary to investigate its characteristics (the mean value and the variance). The uplink capacity and the interference statistics of the sectors of a cross-shaped W-CDMA microcell have therefore been analyzed using a geometry with 17 microcells. A single-slope propagation model with a lognormal shadowing factor has been used in the analysis. The cells have been assumed to lie in city streets with high car traffic. The capacity and the interference statistics of the sectors have been studied for different sector ranges and different side-lobe levels. The results show that the capacity increases as the sector range increases and as the side-lobe level of the antennas decreases.
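The propagation model named above can be illustrated with a minimal Monte Carlo sketch: path loss grows as 10·γ·log10(d/d0) dB with a zero-mean Gaussian shadowing term in dB, and the mean and variance of the total interference power are estimated from repeated draws. All numerical values (reference loss, exponent, shadowing deviation, distances, transmit power) are hypothetical, and the sketch ignores power control, antenna patterns and the 17-cell geometry that the actual analysis accounts for.

```python
import math
import random


def path_loss_db(d, d0=1.0, l0_db=40.0, exponent=2.0, sigma_db=0.0, rng=None):
    """Single-slope path loss in dB with optional lognormal shadowing."""
    shadow = 0.0
    if rng is not None and sigma_db > 0.0:
        shadow = rng.gauss(0.0, sigma_db)     # shadowing is Gaussian in dB
    return l0_db + 10.0 * exponent * math.log10(d / d0) + shadow


def interference_stats(distances, tx_dbm, sigma_db, trials, seed=0):
    """Mean and variance (in mW and mW^2) of the summed received power."""
    rng = random.Random(seed)
    totals = []
    for _ in range(trials):
        total_mw = 0.0
        for d in distances:
            rx_dbm = tx_dbm - path_loss_db(d, sigma_db=sigma_db, rng=rng)
            total_mw += 10.0 ** (rx_dbm / 10.0)   # dBm -> mW
        totals.append(total_mw)
    mean = sum(totals) / trials
    var = sum((t - mean) ** 2 for t in totals) / trials
    return mean, var


# Hypothetical scenario: four interferers at fixed distances, 6 dB shadowing.
mean_mw, var_mw2 = interference_stats([100.0, 150.0, 200.0, 300.0],
                                      tx_dbm=20.0, sigma_db=6.0, trials=5000)
print(mean_mw, var_mw2)
```

Because the shadowing is lognormal in linear units, the mean interference rises with the shadowing deviation even though the shadowing has zero mean in dB.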
- CHO, H. S., CHUNG, M. Y., KANG, S. H., SUNG, D. K. Performance analysis of cross- and cigar-shaped urban microcells considering user mobility characteristics. IEEE Transactions on Vehicular Technology. 2000, vol. 49, no. 1, p. 105 - 115.
- MIN, S., BERTONI, H. L. Effect of path loss model on CDMA system design for highway microcells. In Proceedings of the 48th Vehicular Technology Conference. Ottawa (Canada), 1998, p. 1009 - 1013.
- HASHEM, B., SOUSA, E. S. Reverse link capacity and interference statistics of a fixed-step power-controlled DS/CDMA system under slow multipath fading. IEEE Transactions on Communication. 1999, vol. 47, no. 12, p. 1905 - 1912.
- AHMED, B. T., RAMON, M. C., ARIET, L. H. Capacity and interference statistics of highways W-CDMA cigar-shaped microcells (uplink analysis). IEEE Communication Letters. 2002, vol. 6, no. 5, p. 172 - 174.
- AHMED, B. T., RAMON, M. C., ARIET, L. H. Capacity and interference statistics of street W-CDMA cross-shaped micro cells (uplink analysis). In Proceedings of TELEC 2002 Conference. Santiago de Cuba (Cuba), p. 95 - 99.
- MASUI, H., KOBAYASHI, T., AKAIKE, M. Microwave path-loss modeling in urban line-of-sight environments. IEEE Journal on Selected Areas in Communication. 2002, vol. 20, no. 6, p. 1151 - 1155.
- MELIS, B., ROMANO, G. UMTS W-CDMA: Evaluation of radio performance by means of link level simulations. IEEE Transactions on Personal Communications. 2000, vol. 7, no. 3, p. 42 - 49.
The expected value of the signal-to-noise ratio of W-CDMA infostations is derived. A model of 5 cells is used to analyze the system performance. The infostations are assumed to lie in rural zones. The performance of the infostations is studied for different breakpoint distances, different infostation separations, different numbers of users per infostation and different bit rates.
- FRENKIEL, R. H., BADRINATH, B. R., BORRAS, J., YATES, R. D. The infostations challenge: balancing cost and ubiquity in delivering wireless data. IEEE Transactions on Personal Communications. 2000, vol. 11, no. 4, p. 66 - 71.
- BORRAS, J., YATES, R. D. Cellular excess capacity for infostations. In Proceedings of the Conference EUNIC 1999.
- IRVINE, J., PESCH, D., ROBERTSON, D., GIRMA, D. Efficient UMTS data service provision using infostations. In Proceedings of the Vehicular Technology Conference 1998, p. 2199 - 2123.
- TSAI, Y. R., CHANG, J. F. Feasibility of adding a personal communications network to an existing fixed-service microwave system. IEEE Transactions on Communication. 1996, vol. 44, no. 1, p. 76 - 83.