Sumy State University
On the binomial method for calculation of entropy
Borysenko Oleksiy
doctor of technical sciences, professor
Ukraine
Summary
The article solves the problem of calculating the entropy value of binary messages of increased length generated by Bernoulli information sources. Such a problem arises in compression, error-correcting coding, combinatorial optimization, etc. In the usual solution, the probabilities of the messages are first found by means of statistical tests, and then logarithm, multiplication, and addition operations are performed on them, the number of which grows exponentially with the message length. In this work, it is proposed to reduce the number of calculations by replacing a number of addition operations with corresponding multiplication operations. To do this, the set of all messages generated by the Bernoulli source is divided into equivalence classes based on the equality of the number of ones included in the messages. Then, based on the Bernoulli rule, the generation probabilities of the messages of each class are found, the probabilities of occurrence of the classes and their logarithms are calculated, and these are multiplied by the class probabilities and the corresponding binomial coefficients. The resulting product is added to the analogous products calculated for the other equivalence classes, which ultimately yields the desired entropy. As a result of these operations, the entropy calculation is accelerated by several times, and the statistical studies are simplified.
Keywords: entropy, message, binomial coefficient, source of information, probabilities.
Introduction
The basis of modern information theory is the formula for the entropy of a source of probabilistic messages proposed by Shannon in 1948:

H(A) = -∑_{j=1}^{N} P_j log₂ P_j, (1)

where N is the number of messages j generated by the source A and P_j are their probabilities. It is proved that the entropy takes its minimum value H(A) = 0 when the source of information generates a single message, and its maximum value H(A) = log₂ N when the probabilities of the generated messages are equal.
The generation of unequal probabilities P_j by the source A leads to a decrease in the entropy H(A) and to the appearance of redundant information in its messages, the amount of which is determined by the difference log₂ N - H(A). This value must be known when compressing information, performing error-correcting coding, encrypting data, and in other similar tasks. Solving them requires finding the value of the entropy H(A), having first determined, in accordance with formula (1), the probabilities P_j of the generated messages. They are found by statistical studies of the arrays of generated information, which in some cases may require a large number of computational operations. In addition, the calculation of the entropy H(A) by formula (1) itself requires additional computational operations.
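As a point of reference, formula (1) and the redundancy log₂ N - H(A) can be written as a minimal Python sketch; the function names here are illustrative, not taken from the article:

import math

def shannon_entropy(probs):
    """Formula (1): H(A) = -sum_j P_j * log2(P_j)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def redundancy(probs):
    """Redundant information: log2(N) - H(A), with N = len(probs)."""
    return math.log2(len(probs)) - shannon_entropy(probs)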
The collection of statistical data and their processing involve significant computational difficulties when messages are represented by binary sequences of large length n, since their number N = 2^n grows exponentially. Such binary messages are encountered in the formation of binary images, television lines, drawings, and the like. They also appear in the binary encoding of texts, words, and various characters, such as letters or hieroglyphs.
One way to reduce the complexity of the statistical tests is universal coding. In this method, a single code is chosen so as to give minimal redundancy when encoding several sources of information with different, but not very different, entropies. However, although such coding reduces the volume of statistical research and computational operations, it does not solve the underlying problem of reducing their complexity, since they remain essentially unchanged: statistics on the appearance of individual messages are collected, estimates of the probabilities P_j are found from them, and the operations of formula (1) are performed. One should therefore look for methods of finding the entropy H(A) that reduce the number of statistical tests and the corresponding calculations. The type of information source plays a special role here; the simplest are Bernoulli sources, in which the probabilities P of the generated ones and 1 - P of the generated zeros do not depend on the values of the previous digits. This paper develops a method for calculating the entropy H(A) of such a source of information.
Formulation of the problem. A Bernoulli source A is given that generates binary messages of length n with probability of occurrence of ones P and of zeros 1 - P. The task is to find the entropy H(A) of the source A on the basis of these data.
The solution of the problem
In [1], for a Bernoulli source of information, a combinatorial-probabilistic form of entropy was presented:

H(A) = -∑_{k=0}^{n} C_n^k P_k log₂ P_k, (2)

where P_k = p^k (1 - p)^(n-k) is the probability with which the Bernoulli source generates a message j containing k ones, k = 0, 1, ..., n, and C_n^k is the binomial coefficient.

Accordingly, at p = 0.5 every message has the same probability P_k = 2^(-n), so that

H(A) = -∑_{k=0}^{n} C_n^k 2^(-n) log₂ 2^(-n) = n.
Let us prove that the entropy H(A) of the Bernoulli source of binary messages given by (2) coincides with its general expression (1).

Theorem 1. The entropy of a Bernoulli source of information is

H(A) = -∑_{k=0}^{n} C_n^k P_k log₂ P_k = -∑_{j=1}^{2^n} P_j log₂ P_j.
Proof. From (2) it follows that the number of messages generated by the source A that contain exactly k ones is equal to the binomial coefficient C_n^k = n!/(k!(n - k)!). Since k = 0, 1, 2, ..., n, there are n + 1 such binomial coefficients.

As is known, their sum is ∑_{k=0}^{n} C_n^k = 2^n. This means that the original binary messages j = 1, 2, ..., 2^n of the Bernoulli source are divided by the binomial coefficients into n + 1 equivalence classes, and their total number is N = 2^n. All messages of class k have the same probability P_k = p^k (1 - p)^(n-k), so grouping the terms of (1) by classes replaces the 2^n summands P_j log₂ P_j with n + 1 summands C_n^k P_k log₂ P_k, while the sum of the products of the probabilities P_j and their logarithms still gives the entropy in its original form (1). Consequently,

H(A) = -∑_{k=0}^{n} C_n^k P_k log₂ P_k = -∑_{j=1}^{2^n} P_j log₂ P_j,

which was to be proved.
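For small n, Theorem 1 can be checked numerically by comparing the direct enumeration of formula (1) with the class-wise sum of formula (2). The sketch below uses only the formulas above; the function names are illustrative:

import math
from itertools import product

def entropy_direct(n, p):
    """Formula (1): enumerate all 2^n binary messages."""
    h = 0.0
    for bits in product((0, 1), repeat=n):
        k = sum(bits)
        pj = p**k * (1 - p)**(n - k)  # probability of this particular message
        h -= pj * math.log2(pj)
    return h

def entropy_binomial(n, p):
    """Formula (2): sum over the n + 1 equivalence classes."""
    h = 0.0
    for k in range(n + 1):
        pk = p**k * (1 - p)**(n - k)
        h -= math.comb(n, k) * pk * math.log2(pk)
    return h

assert abs(entropy_direct(10, 0.1) - entropy_binomial(10, 0.1)) < 1e-9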
Formula (2) yields a method for calculating the Bernoulli entropy H(A). At step 1, the original messages are split into n + 1 equivalence classes whose messages contain k = 0, 1, ..., n ones. At step 2, the binomial coefficient C_n^k is calculated for each class k. At step 3, the probability p of the occurrence of a 1 in the transmitted messages j is found statistically. At step 4, the probabilities P_k = p^k (1 - p)^(n-k) are determined using the Bernoulli formula. At step 5, the logarithms of the probabilities P_k are calculated for all n + 1 values of k. At step 6, the binomial coefficients C_n^k are multiplied by the probabilities P_k and their logarithms. At step 7, the n + 1 products obtained are added together. End. A sketch of these steps in code is given below.
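A minimal Python sketch of steps 1-7, assuming the observed messages arrive as strings of '0'/'1'; the helper names are illustrative:

import math

def estimate_p(messages):
    """Step 3: estimate the probability p of a 1 from the observed messages."""
    total_bits = sum(len(m) for m in messages)
    return sum(m.count('1') for m in messages) / total_bits

def bernoulli_entropy(n, p):
    """Steps 1-7: binomial method for the entropy H(A) of a Bernoulli source."""
    h = 0.0
    for k in range(n + 1):               # step 1: one term per equivalence class
        c = math.comb(n, k)              # step 2: binomial coefficient C(n, k)
        pk = p**k * (1 - p)**(n - k)     # step 4: Bernoulli formula for P_k
        if pk > 0:                       # guard the p = 0 and p = 1 edge cases
            h -= c * pk * math.log2(pk)  # steps 5-7: log, multiply, accumulate
    return h

# Example: p estimated from an illustrative sample, then the entropy for n = 3.
p = estimate_p(["000", "100", "000", "001"])
print(bernoulli_entropy(3, p))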
Table 1 shows an example of calculating the entropy H(A) of a Bernoulli source. It implements the above method in tabular form for a given probability P = 0.1 of the occurrence of a 1 and, accordingly, a probability 0.9 of the occurrence of a 0, in the 8 binary messages of length 3 bits.
Table 1

Calculation of entropy H(A) at P = 0.1

Class | Messages       | k | P_k   | log₂ P_k | -C_3^k P_k log₂ P_k
1     | 000            | 0 | 0.729 | -0.455   | 0.332
2     | 001, 010, 100  | 1 | 0.081 | -3.626   | 0.881
3     | 011, 101, 110  | 2 | 0.009 | -6.796   | 0.184
4     | 111            | 3 | 0.001 | -9.966   | 0.010

(Author's work)
As follows from Table 1, the eight binary messages in column 2 form 4 equivalence classes, the first of which contains zero ones, the second 1 one, the third 2 ones, and the fourth 3 ones.

Then, from the number of ones k in column 3 of the table, the values of the binomial coefficients C_3^k are obtained as the numbers of messages in the classes. Columns 4 and 5 give the values of P_k and log₂ P_k for each value of k. Column 6 contains the products -C_3^k P_k log₂ P_k. They are then added together, forming the entropy value.
The entropies of the equivalence classes in column 6 are 0.332, 0.881, 0.184, and 0.010.

Accordingly, the entropy of the Bernoulli source is

H(A) = 0.332 + 0.881 + 0.184 + 0.010 ≈ 1.4 bits.

The code redundancy is log₂ N - H(A) = 3 - 1.4 = 1.6 bits.
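The numbers in Table 1 can be reproduced with a few lines of Python, under the same assumptions as the sketches above:

import math

n, p = 3, 0.1
h = 0.0
for k in range(n + 1):
    pk = p**k * (1 - p)**(n - k)
    term = -math.comb(n, k) * pk * math.log2(pk)
    h += term
    print(f"k={k}: C={math.comb(n, k)}, P_k={pk:.3f}, term={term:.3f}")
print(f"H(A) = {h:.3f} bits, redundancy = {n - h:.3f} bits")
# prints H(A) ~ 1.407 bits and redundancy ~ 1.593 bits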
Note that 6 operations of adding entropy terms are replaced by 2 operations of multiplying the products P_k log₂ P_k by the binomial coefficients C_3^1 and C_3^2. It is this replacement that speeds up the calculation of the entropy H(A). For large lengths n the effect is especially noticeable, since with increasing n the value of the binomial coefficient grows very quickly for the same value of k. For example, for the binomial coefficient C_100^1, with n = 100 and k = 1, one operation of multiplying the binomial coefficient by the probability term replaces 100 addition operations compared to using formula (1) to calculate the entropy H(A).
The advantage of the above method for calculating the entropy H(A) is that statistics are collected only for the probability P of the occurrence of a 1 in the messages of the source A, rather than for estimating the probability P_j of each message j separately out of their total number of 2^n messages. In addition, the entropy H(A) is computed from the n + 1 probabilities P_k, rather than by summing the statistically obtained probabilities of all 2^n messages j, as formula (1) requires. The binomial coefficients are then calculated by one of the methods described in [2].
The disadvantage of the method is that it relies on the assumption that the probability P of the occurrence of a 1 in a message does not depend on the values of 0 and 1 in the previous message bits. If such a dependence exists, the calculated value of the Bernoulli entropy H(A) will be overestimated, and the corresponding value of redundant information will be underestimated. However, such an error in the values of the entropy and the redundant information is quite acceptable in many cases. Therefore, a slight excess of the calculated Bernoulli entropy H(A) over its real value will not greatly distort the final result, while the entropy calculation time can be significantly reduced.
Another disadvantage of the method is that the binomial coefficients C_n^k for large message lengths n can reach values that are difficult to calculate. However, the time to calculate the entropy H(A) using them is still much less than the time needed to add the statistical probabilities P_j of the messages according to formula (1). In addition, there are many methods for quickly calculating binomial coefficients. Based on the foregoing, this method of calculating the entropy H(A) can be recommended not only for Bernoulli sources of information, but also for Markov sources, as a first step in estimating their entropy.
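One well-known way to compute C_n^k exactly without forming the huge factorials of n!/(k!(n - k)!) is the multiplicative formula; Python's math.comb computes the same quantity, so the sketch below is only an illustration of the idea:

def binomial(n, k):
    """C(n, k) via the multiplicative formula, one exact integer step at a time."""
    k = min(k, n - k)                 # use the symmetry C(n, k) = C(n, n - k)
    c = 1
    for i in range(1, k + 1):
        c = c * (n - k + i) // i      # c stays an exact integer at every step
    return c

assert binomial(100, 1) == 100 and binomial(100, 2) == 4950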
The proposed method is designed for a generated number of messages N = 2^n. If fewer messages are generated, then in some equivalence classes the binomial coefficients will decrease in value by the number of missing messages; the entropy formula does not lose its original meaning because of this. Some message equivalence classes may also disappear altogether, or they may be left out of account because of the small weight -C_n^k P_k log₂ P_k they contribute to the total value of the Bernoulli entropy H(A). This weight depends both on the probability P_k of the class messages and on the value of the binomial coefficient C_n^k.
Conclusions
The proposed method for calculating the entropy of a Bernoulli source of information makes it possible to significantly speed up the calculation by dividing the initial set of messages into equivalence classes, the defining attribute of each class being the number of ones contained in its messages. Along with this, the method reduces the time of statistical testing, since it requires collecting statistics only to estimate the probability of the occurrence of a 1, and not of whole messages, as the general form (1) of the entropy H(A) requires.
References
[1] Bedenko O. A. (1995). On the decomposition of Bernoulli sources of information. Bulletin of Sumy State University, (3), 57-59.
[2] Borysenko O. A. (2019). Discrete Math. Sumy: University Book.