• Title: Cumulative Distribution Function

  • Series: Probability Theory

  • YouTube-Title: Probability Theory 12 | Cumulative Distribution Function

  • Bright video: https://youtu.be/N3iXUaRhyt8

  • Dark video: https://youtu.be/DxEbvbGUp_g

  • Ad-free video: Watch Vimeo video

  • Quiz: Test your knowledge

  • PDF: Download PDF version of the bright video

  • Dark-PDF: Download PDF version of the dark video

  • Print-PDF: Download printable PDF version

  • Exercise: Download PDF sheets

  • Thumbnail (bright): Download PNG

  • Thumbnail (dark): Download PNG

  • Subtitle on GitHub: pt12_sub_eng.srt

  • Timestamps (n/a)
  • Subtitle in English

    1 00:00:00,409 –> 00:00:02,390 Hello and welcome back to

    2 00:00:02,400 –> 00:00:04,019 probability theory

    3 00:00:04,699 –> 00:00:05,920 and as you already know,

    4 00:00:05,929 –> 00:00:07,289 first, I want to thank all

    5 00:00:07,300 –> 00:00:08,520 the nice people that support

    6 00:00:08,529 –> 00:00:09,859 this channel on Steady or

    7 00:00:09,869 –> 00:00:10,479 PayPal.

    8 00:00:11,149 –> 00:00:12,649 Now, in today’s part 12,

    9 00:00:12,659 –> 00:00:14,350 we will talk about the important

    10 00:00:14,359 –> 00:00:16,010 cumulative distribution

    11 00:00:16,020 –> 00:00:16,489 function.

    12 00:00:17,280 –> 00:00:19,040 So here you already see, this

    13 00:00:19,049 –> 00:00:20,659 is such a long word that

    14 00:00:20,670 –> 00:00:22,239 often one just speaks of

    15 00:00:22,250 –> 00:00:23,379 the CDF

    16 00:00:23,840 –> 00:00:24,379 .

    17 00:00:25,260 –> 00:00:26,680 Now, soon we will see that

    18 00:00:26,690 –> 00:00:28,540 every random variable has

    19 00:00:28,549 –> 00:00:29,840 such a CDF.

    20 00:00:30,100 –> 00:00:32,040 For this, please recall

    21 00:00:32,049 –> 00:00:33,680 that a random variable just

    22 00:00:33,689 –> 00:00:35,250 translates from an abstract

    23 00:00:35,259 –> 00:00:37,220 probability space into a

    24 00:00:37,229 –> 00:00:39,049 very concrete one given in

    25 00:00:39,060 –> 00:00:40,200 the real number line.

    26 00:00:40,939 –> 00:00:42,290 In particular, this could

    27 00:00:42,299 –> 00:00:43,689 be a discrete probability

    28 00:00:43,700 –> 00:00:45,529 space where only some

    29 00:00:45,540 –> 00:00:47,319 values in R are relevant

    30 00:00:47,330 –> 00:00:48,430 for the probability measure

    31 00:00:48,439 –> 00:00:49,110 P_X.

    32 00:00:49,979 –> 00:00:51,419 In other words, you can easily

    33 00:00:51,430 –> 00:00:52,990 embed a discrete set of

    34 00:00:53,000 –> 00:00:54,599 numbers into R.

    35 00:00:55,369 –> 00:00:56,889 This means that if you think

    36 00:00:56,900 –> 00:00:58,750 of both cases we had, so the

    37 00:00:58,759 –> 00:01:00,590 discrete one and the absolutely

    38 00:01:00,599 –> 00:01:02,509 continuous one, both are

    39 00:01:02,520 –> 00:01:03,389 included here.

    40 00:01:04,379 –> 00:01:05,989 This helps because then the

    41 00:01:06,000 –> 00:01:07,629 notion of a CDF is

    42 00:01:07,639 –> 00:01:09,239 exactly the same in both

    43 00:01:09,250 –> 00:01:09,830 cases.

    44 00:01:10,580 –> 00:01:12,129 Therefore, I would say let’s

    45 00:01:12,139 –> 00:01:13,830 define this cumulative

    46 00:01:13,839 –> 00:01:15,230 distribution function.

    47 00:01:16,099 –> 00:01:16,519 Here,

    48 00:01:16,529 –> 00:01:18,019 I can already tell you we

    49 00:01:18,029 –> 00:01:19,279 will always denote it with

    50 00:01:19,290 –> 00:01:20,980 capital F where in the

    51 00:01:20,989 –> 00:01:22,269 index we put in the

    52 00:01:22,279 –> 00:01:23,639 random variable X.

    53 00:01:24,459 –> 00:01:26,120 And indeed, often we will

    54 00:01:26,129 –> 00:01:27,319 plot the function on the

    55 00:01:27,330 –> 00:01:28,959 whole real number line.

    56 00:01:29,830 –> 00:01:31,339 A typical graph could look

    57 00:01:31,349 –> 00:01:32,250 like this.

    58 00:01:32,260 –> 00:01:34,169 So it’s always increasing.

    59 00:01:34,860 –> 00:01:36,250 However, it’s also

    60 00:01:36,260 –> 00:01:37,690 possible that it stays

    61 00:01:37,699 –> 00:01:38,440 constant.

    62 00:01:39,300 –> 00:01:40,599 Therefore, the first warning

    63 00:01:40,610 –> 00:01:42,389 here, you should never confuse

    64 00:01:42,400 –> 00:01:44,150 this with a probability

    65 00:01:44,160 –> 00:01:45,290 density function.

    66 00:01:45,300 –> 00:01:46,279 A PDF.

    67 00:01:47,230 –> 00:01:47,730 OK.

    68 00:01:47,739 –> 00:01:49,169 I think that was enough talking

    69 00:01:49,180 –> 00:01:49,529 here.

    70 00:01:49,629 –> 00:01:51,239 Let’s go to the definition

    71 00:01:51,250 –> 00:01:53,129 now. The assumptions

    72 00:01:53,139 –> 00:01:54,410 are the same as in the last

    73 00:01:54,419 –> 00:01:54,900 video.

    74 00:01:54,910 –> 00:01:56,680 So we have Omega, A, P as a

    75 00:01:56,690 –> 00:01:58,459 probability space and

    76 00:01:58,470 –> 00:01:59,940 X as a random variable.

    77 00:02:00,690 –> 00:02:02,160 It’s not an abstract random

    78 00:02:02,169 –> 00:02:03,980 variable, but a real valued

    79 00:02:03,989 –> 00:02:05,949 one. Implicitly

    80 00:02:05,959 –> 00:02:07,190 this means we have the Borel

    81 00:02:07,379 –> 00:02:08,949 Sigma algebra for

    82 00:02:08,960 –> 00:02:09,490 R.

    83 00:02:10,429 –> 00:02:11,949 Now, in fact, this is all

    84 00:02:11,960 –> 00:02:13,399 we need and now we can

    85 00:02:13,410 –> 00:02:14,699 define the function

    86 00:02:14,710 –> 00:02:15,570 F_X.

    87 00:02:16,399 –> 00:02:17,740 It’s important to remember

    88 00:02:17,750 –> 00:02:19,149 here that the domain for

    89 00:02:19,160 –> 00:02:20,889 F_X is always the

    90 00:02:20,899 –> 00:02:22,369 whole real number line

    91 00:02:23,110 –> 00:02:24,949 and the possible values lie

    92 00:02:24,960 –> 00:02:26,490 in the unit interval.

    93 00:02:27,350 –> 00:02:28,699 So what you can keep in mind

    94 00:02:28,710 –> 00:02:30,690 is the codomain of X

    95 00:02:30,869 –> 00:02:32,850 is the domain of F_X.

    96 00:02:33,850 –> 00:02:34,190 OK.

    97 00:02:34,199 –> 00:02:35,550 Now, as you can see in the

    98 00:02:35,559 –> 00:02:37,479 unit interval F_X

    99 00:02:37,490 –> 00:02:39,429 is defined as a probability,

    100 00:02:40,419 –> 00:02:42,169 namely putting in a lower

    101 00:02:42,179 –> 00:02:44,070 case x into the function

    102 00:02:44,100 –> 00:02:45,550 is defined as

    103 00:02:46,179 –> 00:02:47,679 the probability of the

    104 00:02:47,690 –> 00:02:48,970 interval minus

    105 00:02:48,979 –> 00:02:50,919 infinity to x.

    106 00:02:51,639 –> 00:02:53,220 And it’s measured with

    107 00:02:53,229 –> 00:02:55,220 P_X, the distribution.

    108 00:02:55,229 –> 00:02:56,860 The probability distribution

    109 00:02:56,869 –> 00:02:58,500 of the random variable X

    110 00:02:59,309 –> 00:03:01,110 and with this, you see that’s

    111 00:03:01,119 –> 00:03:02,970 the reason it’s called cumulative

    112 00:03:02,979 –> 00:03:04,300 distribution function.

    113 00:03:05,050 –> 00:03:06,899 So we include here all possible

    114 00:03:06,910 –> 00:03:08,710 values until we reach the

    115 00:03:08,720 –> 00:03:09,949 given point x

    116 00:03:10,729 –> 00:03:12,309 and here, maybe it’s helpful

    117 00:03:12,320 –> 00:03:14,050 as a reminder that this can

    118 00:03:14,059 –> 00:03:15,399 be written with the original

    119 00:03:15,410 –> 00:03:16,789 probability measure P.

    120 00:03:17,619 –> 00:03:19,210 So we have P of the

    121 00:03:19,220 –> 00:03:21,000 random variable capital X

    122 00:03:21,009 –> 00:03:22,869 is less than or equal to the

    123 00:03:22,880 –> 00:03:23,990 lower case x

    124 00:03:24,919 –> 00:03:26,300 and we have learned this

    125 00:03:26,309 –> 00:03:27,860 is exactly the same thing

    126 00:03:27,869 –> 00:03:28,539 as this.

    127 00:03:29,630 –> 00:03:29,929 OK.

    128 00:03:29,940 –> 00:03:31,820 Then this nice well defined

    129 00:03:31,830 –> 00:03:33,610 function F_X is called the

    130 00:03:33,619 –> 00:03:35,130 cumulative distribution

    131 00:03:35,139 –> 00:03:36,589 function of the random variable

    132 00:03:36,919 –> 00:03:38,119 capital X.

    133 00:03:38,880 –> 00:03:40,690 Indeed some people simply

    134 00:03:40,699 –> 00:03:42,399 call it distribution function

    135 00:03:42,410 –> 00:03:42,940 of X.

    136 00:03:43,710 –> 00:03:45,210 Or in short, as I already

    137 00:03:45,220 –> 00:03:46,809 told you, we just call it

    138 00:03:46,820 –> 00:03:48,410 the CDF of X.
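
    Written out as a formula, the definition just described reads as follows. This is only a compact restatement of the spoken definition, using the same symbols as the quiz below:

    $$F_X \colon \mathbb{R} \rightarrow [0,1], \qquad F_X(x) := \mathbb{P}_X\big((-\infty, x]\big) = \mathbb{P}(X \leq x)$$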

    139 00:03:49,529 –> 00:03:51,029 Now, this definition

    140 00:03:51,039 –> 00:03:52,710 immediately implies some

    141 00:03:52,720 –> 00:03:54,610 nice characteristic properties

    142 00:03:54,619 –> 00:03:55,550 for F_X.

    143 00:03:56,289 –> 00:03:57,869 Therefore, let’s use a minute

    144 00:03:57,880 –> 00:03:59,270 to talk about these.

    145 00:04:00,179 –> 00:04:01,940 They immediately follow, because

    146 00:04:01,949 –> 00:04:03,520 we use a probability measure

    147 00:04:03,649 –> 00:04:05,240 to define F_X.

    148 00:04:05,919 –> 00:04:07,619 For example, if we make this

    149 00:04:07,630 –> 00:04:09,369 lowercase x here smaller

    150 00:04:09,380 –> 00:04:11,139 and smaller in the limit

    151 00:04:11,149 –> 00:04:12,880 x to minus infinity,

    152 00:04:12,949 –> 00:04:14,929 we would get P_X of

    153 00:04:14,940 –> 00:04:16,010 the empty set.

    154 00:04:16,640 –> 00:04:18,089 And then we have our important

    155 00:04:18,100 –> 00:04:19,329 property of a probability

    156 00:04:19,339 –> 00:04:19,820 measure.

    157 00:04:19,970 –> 00:04:21,660 The empty set is always

    158 00:04:21,670 –> 00:04:22,730 sent to zero.

    159 00:04:23,549 –> 00:04:25,410 Hence F_X has

    160 00:04:25,420 –> 00:04:27,049 this property in the limit

    161 00:04:27,059 –> 00:04:28,410 to minus infinity.

    162 00:04:29,309 –> 00:04:30,779 Therefore, in the same way,

    163 00:04:30,790 –> 00:04:32,279 we can ask what happens when

    164 00:04:32,290 –> 00:04:34,230 x goes to plus infinity.

    165 00:04:35,119 –> 00:04:36,399 There, it’s also not hard

    166 00:04:36,410 –> 00:04:37,660 to see, when x

    167 00:04:37,670 –> 00:04:38,540 increases

    168 00:04:38,549 –> 00:04:40,250 we include more and more

    169 00:04:40,260 –> 00:04:41,579 in this probability measure.

    170 00:04:42,359 –> 00:04:43,660 Hence, in the limit we get

    171 00:04:43,670 –> 00:04:44,890 here the whole real number

    172 00:04:44,899 –> 00:04:45,410 line.

    173 00:04:45,429 –> 00:04:47,410 So we have P_X of R

    174 00:04:48,339 –> 00:04:50,109 and this is the next important

    175 00:04:50,119 –> 00:04:51,459 property of a probability

    176 00:04:51,470 –> 00:04:52,000 measure.

    177 00:04:52,149 –> 00:04:53,570 The whole sample space is

    178 00:04:53,579 –> 00:04:55,109 always sent to one.

    179 00:04:56,010 –> 00:04:56,510 OK.

    180 00:04:56,519 –> 00:04:57,950 So it’s not hard to see that

    181 00:04:57,959 –> 00:04:59,369 we have these two limits

    182 00:04:59,380 –> 00:05:00,350 for F_X.

    183 00:05:01,369 –> 00:05:03,179 Moreover, we also see

    184 00:05:03,190 –> 00:05:04,720 immediately that the function

    185 00:05:04,730 –> 00:05:06,380 F_X is monotonically

    186 00:05:06,390 –> 00:05:07,140 increasing.

    187 00:05:08,049 –> 00:05:08,359 Here

    188 00:05:08,369 –> 00:05:09,470 please recall

    189 00:05:09,480 –> 00:05:11,160 this means if we go from

    190 00:05:11,170 –> 00:05:13,160 one point x_1 to a larger

    191 00:05:13,170 –> 00:05:15,130 point x_2, the

    192 00:05:15,140 –> 00:05:16,350 value of the function

    193 00:05:16,359 –> 00:05:18,220 increases or it stays the

    194 00:05:18,230 –> 00:05:18,649 same.

    195 00:05:19,529 –> 00:05:20,950 This follows because the

    196 00:05:20,959 –> 00:05:22,589 probability measure is

    197 00:05:22,600 –> 00:05:23,869 always monotonic.

    198 00:05:24,649 –> 00:05:25,989 This means if we measure

    199 00:05:26,000 –> 00:05:27,450 a set with the probability

    200 00:05:27,459 –> 00:05:29,390 measure, then all subsets

    201 00:05:29,399 –> 00:05:31,119 of this set have a smaller

    202 00:05:31,130 –> 00:05:32,470 measure or the same.

    203 00:05:33,380 –> 00:05:34,970 So you see also this

    204 00:05:34,980 –> 00:05:36,640 property is not hard to

    205 00:05:36,649 –> 00:05:37,399 understand.

    206 00:05:38,200 –> 00:05:40,059 However, now the third and

    207 00:05:40,070 –> 00:05:41,799 the last property here is

    208 00:05:41,809 –> 00:05:43,329 a little bit more technical.

    209 00:05:44,059 –> 00:05:45,739 I say this, because it tells

    210 00:05:45,750 –> 00:05:47,690 us that F_X is right-

    211 00:05:47,700 –> 00:05:48,589 continuous.

    212 00:05:49,459 –> 00:05:51,079 This means that in general

    213 00:05:51,089 –> 00:05:53,040 it’s not a continuous function

    214 00:05:53,230 –> 00:05:54,649 but it is continuous,

    215 00:05:54,660 –> 00:05:56,190 if you just look from the

    216 00:05:56,200 –> 00:05:58,010 right-hand side. This

    217 00:05:58,019 –> 00:05:59,369 means we have to look at

    218 00:05:59,380 –> 00:06:01,019 the right limit here, where

    219 00:06:01,029 –> 00:06:02,679 x is always larger than the

    220 00:06:02,690 –> 00:06:03,739 point x_0

    221 00:06:04,390 –> 00:06:05,890 and then right-continuity

    222 00:06:05,899 –> 00:06:07,540 means that this limit is

    223 00:06:07,549 –> 00:06:09,209 always F_X at the

    224 00:06:09,220 –> 00:06:10,369 point x_0

    225 00:06:11,220 –> 00:06:12,670 and this is the property

    226 00:06:12,679 –> 00:06:14,010 we have for the CDF

    227 00:06:14,119 –> 00:06:15,040 F_X.

    228 00:06:15,850 –> 00:06:17,570 Now the meaning of this in

    229 00:06:17,579 –> 00:06:19,000 the graph, you can see

    230 00:06:19,010 –> 00:06:20,839 above. At each

    231 00:06:20,850 –> 00:06:22,510 jump point as this one

    232 00:06:22,630 –> 00:06:24,279 the filled in circle has to

    233 00:06:24,290 –> 00:06:25,579 lie at the upper part on

    234 00:06:25,589 –> 00:06:26,149 the right.

    235 00:06:27,239 –> 00:06:27,690 OK.

    236 00:06:27,700 –> 00:06:28,700 I don’t want to prove the

    237 00:06:28,709 –> 00:06:30,420 fact here, but you should

    238 00:06:30,429 –> 00:06:31,640 see the reason for it.

    239 00:06:31,859 –> 00:06:33,649 We include x here in

    240 00:06:33,660 –> 00:06:34,570 this interval

    241 00:06:35,230 –> 00:06:36,600 and the jump in the graph

    242 00:06:36,609 –> 00:06:38,390 would mean that the singleton,

    243 00:06:38,399 –> 00:06:40,170 just with the point x, has

    244 00:06:40,179 –> 00:06:41,609 a non-zero probability

    245 00:06:42,299 –> 00:06:43,519 and this non-zero

    246 00:06:43,540 –> 00:06:45,070 probability is then

    247 00:06:45,079 –> 00:06:46,450 included in the whole

    248 00:06:46,459 –> 00:06:47,540 probability here.
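
    To make the jump behaviour concrete, here is a minimal R sketch. The fair six-sided die is not part of the video; it is an assumed example, and `die_cdf` is a hypothetical helper name chosen for illustration only.

    ```r
    # Assumed example (not from the video): CDF of a fair six-sided die.
    # F(x) = (number of faces <= x) / 6, clipped to the range [0, 1].
    die_cdf <- function(x) pmin(pmax(floor(x), 0), 6) / 6

    # Limits: 0 far to the left, 1 far to the right
    die_cdf(c(-100, 100))

    # Right-continuity at the jump point x0 = 3:
    # approaching from the right gives the same value as at x0 ...
    die_cdf(3 + 1e-9) == die_cdf(3)

    # ... while approaching from the left gives a smaller value.
    # The jump height is the probability of the singleton {3}, namely 1/6.
    jump <- die_cdf(3) - die_cdf(3 - 1e-9)
    jump
    ```

    The value at the jump point belongs to the upper branch, which is what the filled-in circle in the graph expresses.
    
    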

    249 00:06:48,440 –> 00:06:48,970 OK.

    250 00:06:48,980 –> 00:06:50,640 Then I think we can look

    251 00:06:50,649 –> 00:06:51,829 at an example.

    252 00:06:52,670 –> 00:06:54,040 In fact, this will be one

    253 00:06:54,049 –> 00:06:55,760 of the most important examples

    254 00:06:55,769 –> 00:06:57,750 we have. We take the

    255 00:06:57,760 –> 00:06:59,579 random variable X which should

    256 00:06:59,589 –> 00:07:01,359 have the distribution given

    257 00:07:01,369 –> 00:07:03,119 by the normal distribution.

    258 00:07:03,649 –> 00:07:04,920 If you have already heard

    259 00:07:04,929 –> 00:07:06,320 of the normal distribution,

    260 00:07:06,380 –> 00:07:07,950 then you know it has two

    261 00:07:07,959 –> 00:07:08,880 parameters

    262 00:07:09,579 –> 00:07:10,799 and the simplest case would

    263 00:07:10,809 –> 00:07:12,230 be that we set the mean to

    264 00:07:12,239 –> 00:07:14,040 be 0 and the standard

    265 00:07:14,049 –> 00:07:15,720 deviation to be 1.

    266 00:07:16,459 –> 00:07:17,959 If you have never seen this,

    267 00:07:17,970 –> 00:07:19,640 this is not a problem, because

    268 00:07:19,649 –> 00:07:20,940 now we discuss it.

    269 00:07:21,779 –> 00:07:23,359 It’s an absolutely continuous

    270 00:07:23,369 –> 00:07:24,970 case, which means the whole

    271 00:07:24,980 –> 00:07:26,799 probability measure is given

    272 00:07:26,809 –> 00:07:28,269 by a probability density

    273 00:07:28,279 –> 00:07:28,799 function

    274 00:07:29,640 –> 00:07:31,339 and indeed, this one is

    275 00:07:31,350 –> 00:07:32,859 given by the famous bell

    276 00:07:32,869 –> 00:07:33,500 curve.

    277 00:07:34,329 –> 00:07:35,799 It’s not just any bell

    278 00:07:35,809 –> 00:07:37,429 curve, it’s given by the

    279 00:07:37,440 –> 00:07:38,630 Gaussian function.

    280 00:07:39,390 –> 00:07:40,980 Which means it’s given by

    281 00:07:40,989 –> 00:07:42,739 e to the power minus

    282 00:07:42,750 –> 00:07:44,299 1/2 * x

    283 00:07:44,309 –> 00:07:44,850 squared

    284 00:07:45,980 –> 00:07:47,339 and the correct normalization

    285 00:07:47,350 –> 00:07:48,829 in front would be given by

    286 00:07:48,839 –> 00:07:50,679 one divided by the square

    287 00:07:50,690 –> 00:07:52,480 root of two times pi.

    288 00:07:53,260 –> 00:07:54,839 Now, in general, in the case,

    289 00:07:54,850 –> 00:07:56,220 when we have the probability

    290 00:07:56,230 –> 00:07:57,519 density function, the

    291 00:07:57,529 –> 00:07:59,480 PDF, we can easily

    292 00:07:59,489 –> 00:08:01,140 calculate the CDF.

    293 00:08:01,929 –> 00:08:03,209 In fact, the only thing we

    294 00:08:03,220 –> 00:08:05,170 have to do is to integrate

    295 00:08:05,179 –> 00:08:06,529 over the PDF,

    296 00:08:07,089 –> 00:08:08,690 which means we go from minus

    297 00:08:08,700 –> 00:08:10,209 infinity to the point

    298 00:08:10,220 –> 00:08:10,809 x

    299 00:08:11,549 –> 00:08:12,779 and then we only have to

    300 00:08:12,790 –> 00:08:14,529 put in the PDF where the

    301 00:08:14,540 –> 00:08:16,350 constant is written in front

    302 00:08:16,359 –> 00:08:17,190 of the integral.

    303 00:08:17,730 –> 00:08:19,500 However, here, because x

    304 00:08:19,510 –> 00:08:21,410 is already chosen as a variable,

    305 00:08:21,420 –> 00:08:22,570 we need another one.

    306 00:08:22,579 –> 00:08:23,850 So let’s choose t.
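
    As a formula, the integral just described looks like this. The symbol $f_X$ for the PDF is an assumed notation, since the video only speaks of "the PDF":

    $$F_X(x) = \int_{-\infty}^{x} f_X(t) \, dt = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-\frac{1}{2} t^2} \, dt$$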

    307 00:08:24,640 –> 00:08:26,500 Now, for the normal distribution,

    308 00:08:26,510 –> 00:08:28,279 we cannot simplify this

    309 00:08:28,290 –> 00:08:29,720 integral immediately,

    310 00:08:29,799 –> 00:08:31,100 but of course, we can draw

    311 00:08:31,160 –> 00:08:31,839 the graph.

    312 00:08:32,640 –> 00:08:34,099 So we would start with minus

    313 00:08:34,109 –> 00:08:35,859 infinity, then we would increase

    314 00:08:35,869 –> 00:08:36,700 and increase

    315 00:08:36,849 –> 00:08:38,419 and in the end in the limit,

    316 00:08:38,429 –> 00:08:39,200 we would go to

    317 00:08:39,210 –> 00:08:41,179 +1. One important

    318 00:08:41,190 –> 00:08:42,500 property you see here is,

    319 00:08:42,510 –> 00:08:44,380 because the PDF is

    320 00:08:44,390 –> 00:08:46,179 symmetric, we will hit

    321 00:08:46,190 –> 00:08:47,219 one half here.

    322 00:08:48,219 –> 00:08:48,679 OK.

    323 00:08:48,690 –> 00:08:50,280 Because my drawing here is

    324 00:08:50,289 –> 00:08:51,770 not perfect at all.

    325 00:08:51,780 –> 00:08:53,320 I would suggest that we now

    326 00:08:53,330 –> 00:08:54,150 go into

    327 00:08:54,160 –> 00:08:55,359 RStudio.

    328 00:08:56,650 –> 00:08:57,140 Here

    329 00:08:57,150 –> 00:08:58,130 the first thing we should

    330 00:08:58,140 –> 00:09:00,010 check is the help function.

    331 00:09:00,020 –> 00:09:01,219 So what is the normal

    332 00:09:01,229 –> 00:09:02,780 distribution in R?

    333 00:09:03,750 –> 00:09:05,450 Indeed here you see we have

    334 00:09:05,460 –> 00:09:06,229 everything.

    335 00:09:06,299 –> 00:09:07,679 So the whole explanation

    336 00:09:07,690 –> 00:09:09,309 what the normal distribution

    337 00:09:09,320 –> 00:09:09,809 is.

    338 00:09:10,469 –> 00:09:12,239 Most importantly, we see

    339 00:09:12,250 –> 00:09:13,630 the probability density

    340 00:09:13,640 –> 00:09:15,330 function is here given

    341 00:09:15,340 –> 00:09:16,570 with dnorm.

    342 00:09:17,239 –> 00:09:18,940 On the other hand the CDF,

    343 00:09:18,950 –> 00:09:20,400 the distribution

    344 00:09:20,409 –> 00:09:22,020 function is given by

    345 00:09:22,030 –> 00:09:22,890 pnorm

    346 00:09:23,809 –> 00:09:25,649 and the last one here,

    347 00:09:25,655 –> 00:09:27,179 rnorm will just give us a

    348 00:09:27,190 –> 00:09:28,869 sample with respect to the

    349 00:09:28,880 –> 00:09:29,989 normal distribution.

    350 00:09:30,919 –> 00:09:32,460 Of course, in addition, you

    351 00:09:32,469 –> 00:09:34,380 also get a lot of explanations

    352 00:09:34,390 –> 00:09:34,950 here.

    353 00:09:34,989 –> 00:09:36,609 Most importantly, you

    354 00:09:36,619 –> 00:09:38,419 see the definition

    355 00:09:38,429 –> 00:09:39,789 of the probability density

    356 00:09:39,799 –> 00:09:41,549 function is also given

    357 00:09:41,559 –> 00:09:41,900 here.

    358 00:09:42,570 –> 00:09:44,299 However, you see the parameter

    359 00:09:44,309 –> 00:09:45,979 sigma and the parameter

    360 00:09:45,989 –> 00:09:47,590 mu are included here.

    361 00:09:48,320 –> 00:09:49,830 Nevertheless, the default

    362 00:09:49,840 –> 00:09:51,460 values are the same as we

    363 00:09:51,469 –> 00:09:52,159 have chosen.
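
    A short sketch of how these functions relate to the formulas from before, assuming the default parameters (mean 0, standard deviation 1). The variable names are chosen for illustration, and the numerical integration with `integrate` is an addition that is not shown in the video.

    ```r
    # Density at 0 should match the normalization constant 1 / sqrt(2 * pi)
    density_at_zero <- dnorm(0)
    normalization   <- 1 / sqrt(2 * pi)

    # CDF at 0 is one half, because the PDF is symmetric around 0
    cdf_at_zero <- pnorm(0)

    # The CDF is the integral of the PDF from -Inf up to x (here x = 1)
    area   <- integrate(dnorm, lower = -Inf, upper = 1)$value
    cdf_at_one <- pnorm(1)

    print(c(density_at_zero, normalization))
    print(cdf_at_zero)
    print(c(area, cdf_at_one))
    ```

    The last two printed numbers should agree up to the numerical error of the integration.
    
    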

    364 00:09:52,940 –> 00:09:53,440 OK.

    365 00:09:53,450 –> 00:09:54,880 Then I would say we can play

    366 00:09:54,890 –> 00:09:56,640 around and plot the CDF

    367 00:09:56,650 –> 00:09:58,190 and the PDF.

    368 00:09:58,849 –> 00:10:00,229 Now for plotting, we need

    369 00:10:00,239 –> 00:10:01,309 the whole sequence.

    370 00:10:01,320 –> 00:10:03,150 So let’s set x to the

    371 00:10:03,159 –> 00:10:04,849 sequence where we start

    372 00:10:04,859 –> 00:10:06,690 maybe with minus 10

    373 00:10:06,799 –> 00:10:08,750 and go to plus 10 with

    374 00:10:08,760 –> 00:10:09,390 step size

    375 00:10:09,400 –> 00:10:11,390 0.01.

    376 00:10:12,390 –> 00:10:13,489 Let’s hit enter

    377 00:10:13,500 –> 00:10:14,909 and then we see the result

    378 00:10:14,919 –> 00:10:15,390 here.

    379 00:10:16,340 –> 00:10:18,010 Indeed, we recognize that

    380 00:10:18,020 –> 00:10:19,710 we have 2001

    381 00:10:19,719 –> 00:10:20,409 points.

    382 00:10:21,169 –> 00:10:22,549 Then in the next step, we

    383 00:10:22,559 –> 00:10:24,070 can put all these points

    384 00:10:24,080 –> 00:10:25,669 into the density function

    385 00:10:25,719 –> 00:10:27,439 which was given by

    386 00:10:27,449 –> 00:10:28,190 dnorm.

    387 00:10:31,039 –> 00:10:31,489 OK.

    388 00:10:31,500 –> 00:10:32,739 So here you see the best

    389 00:10:32,750 –> 00:10:34,250 thing would be to enlarge

    390 00:10:34,260 –> 00:10:35,140 the whole picture.

    391 00:10:36,309 –> 00:10:38,140 Now, in this picture, we

    392 00:10:38,150 –> 00:10:40,080 recognize our bell curve.

    393 00:10:40,849 –> 00:10:42,250 However, maybe it’s better

    394 00:10:42,260 –> 00:10:43,450 to not just plot the

    395 00:10:43,460 –> 00:10:45,200 values but also the

    396 00:10:45,210 –> 00:10:46,359 points x.

    397 00:10:46,369 –> 00:10:47,809 So we put in x

    398 00:10:47,820 –> 00:10:49,750 comma dnorm of x

    399 00:10:51,090 –> 00:10:52,409 and there you see, we have

    400 00:10:52,419 –> 00:10:54,309 our peak exactly at the point

    401 00:10:54,320 –> 00:10:55,940 x is equal to zero.

    402 00:10:56,729 –> 00:10:57,309 OK.

    403 00:10:57,330 –> 00:10:58,960 Then I would say in the same

    404 00:10:58,969 –> 00:11:00,030 way we can plot the

    405 00:11:00,040 –> 00:11:01,289 CDF.

    406 00:11:01,340 –> 00:11:02,630 So we take x

    407 00:11:02,640 –> 00:11:04,510 comma pnorm

    408 00:11:04,520 –> 00:11:05,429 of x.

    409 00:11:07,880 –> 00:11:09,460 So here you see this is like

    410 00:11:09,469 –> 00:11:10,809 our picture from before,

    411 00:11:10,820 –> 00:11:12,739 but now from minus 10 to

    412 00:11:12,750 –> 00:11:13,500 plus 10.

    413 00:11:14,340 –> 00:11:15,679 Hence, you can see in the

    414 00:11:15,690 –> 00:11:17,679 limits we go from zero

    415 00:11:17,809 –> 00:11:18,900 to plus one.

    416 00:11:19,770 –> 00:11:21,270 Now I would suggest that

    417 00:11:21,280 –> 00:11:22,460 you play around with all

    418 00:11:22,469 –> 00:11:24,010 these functions to get a

    419 00:11:24,020 –> 00:11:25,429 good visualization for the

    420 00:11:25,440 –> 00:11:26,750 PDF and for the

    421 00:11:26,760 –> 00:11:27,549 CDF.
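
    The plotting steps described above can be reproduced roughly like this. The exact plot options used in the video are not visible in the subtitles, so `type = "l"` and the titles are assumptions.

    ```r
    # Grid from -10 to 10 with step size 0.01
    x <- seq(-10, 10, by = 0.01)
    length(x)  # number of grid points

    # PDF: the bell curve, with its peak at x = 0
    plot(x, dnorm(x), type = "l", main = "PDF of N(0, 1)")

    # CDF: increases from 0 to 1 and passes through 1/2 at x = 0
    plot(x, pnorm(x), type = "l", main = "CDF of N(0, 1)")

    # Location of the peak of the PDF on the grid
    peak <- x[which.max(dnorm(x))]
    peak
    ```
    
    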

    422 00:11:28,479 –> 00:11:28,840 OK.

    423 00:11:28,849 –> 00:11:30,380 Now, at the end of the video,

    424 00:11:30,390 –> 00:11:32,030 we can also look at the function

    425 00:11:32,039 –> 00:11:33,099 rnorm

    426 00:11:34,440 –> 00:11:36,059 and if we put in a number,

    427 00:11:36,070 –> 00:11:37,940 we get as many samples as

    428 00:11:37,950 –> 00:11:38,619 we want.

    429 00:11:39,979 –> 00:11:41,159 So in this case, we took

    430 00:11:41,169 –> 00:11:42,530 10 but of course, we could

    431 00:11:42,539 –> 00:11:44,479 also take 6000 for

    432 00:11:44,489 –> 00:11:45,080 example.

    433 00:11:46,250 –> 00:11:47,950 So showing all of them is

    434 00:11:47,960 –> 00:11:49,469 not so exciting, but

    435 00:11:49,479 –> 00:11:51,059 maybe a histogram is.

    436 00:11:51,799 –> 00:11:53,659 So let’s put in histogram

    437 00:11:53,669 –> 00:11:55,359 of rnorm

    438 00:11:55,510 –> 00:11:56,820 6000

    439 00:11:59,219 –> 00:12:00,739 and here again, we

    440 00:12:00,750 –> 00:12:02,090 recognize some bell

    441 00:12:02,099 –> 00:12:02,760 curve.

    442 00:12:03,450 –> 00:12:05,109 So maybe we can just repeat

    443 00:12:05,119 –> 00:12:06,200 the whole thing again

    444 00:12:07,380 –> 00:12:09,159 and as you can see, we don’t

    445 00:12:09,169 –> 00:12:10,109 change so much.

    446 00:12:11,109 –> 00:12:12,729 However, maybe we can

    447 00:12:12,739 –> 00:12:14,500 also increase the number.

    448 00:12:16,429 –> 00:12:17,789 In another video, I already

    449 00:12:17,799 –> 00:12:19,289 showed you how you can make

    450 00:12:19,299 –> 00:12:20,489 a histogram a little bit

    451 00:12:20,500 –> 00:12:22,349 nicer and maybe

    452 00:12:22,359 –> 00:12:23,390 this is something you should

    453 00:12:23,400 –> 00:12:24,500 do here as well.
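
    One possible way to make the histogram nicer is sketched here. The seed, the number of bins, and the overlay of the density curve are assumptions and not taken from the video; in R, the histogram function is called `hist`.

    ```r
    set.seed(1)               # assumed seed, only to make the run reproducible
    samples <- rnorm(6000)    # 6000 samples from N(0, 1)

    # Histogram on the density scale, so it is comparable to the PDF
    hist(samples, breaks = 50, freq = FALSE,
         main = "6000 samples from N(0, 1)", xlab = "x")

    # Overlay the theoretical bell curve
    curve(dnorm(x), add = TRUE)

    # The fraction of samples <= 0 estimates the CDF at 0, which is 1/2
    empirical_cdf_at_zero <- mean(samples <= 0)
    empirical_cdf_at_zero
    ```

    With more samples, the histogram gets closer to the bell curve, and the empirical fraction gets closer to `pnorm(0)`.
    
    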

    454 00:12:25,349 –> 00:12:25,849 OK.

    455 00:12:25,859 –> 00:12:27,169 Now, I think that’s good

    456 00:12:27,179 –> 00:12:28,530 enough for this video.

    457 00:12:28,700 –> 00:12:30,289 I really hope you now know

    458 00:12:30,299 –> 00:12:31,830 what a normal distribution

    459 00:12:31,840 –> 00:12:33,450 is and also what a

    460 00:12:33,460 –> 00:12:35,010 general CDF is.

    461 00:12:35,789 –> 00:12:37,169 Of course, all questions

    462 00:12:37,179 –> 00:12:38,640 you can put into the comments

    463 00:12:38,780 –> 00:12:40,429 and then I hope I see you

    464 00:12:40,440 –> 00:12:41,619 in the next video.

    465 00:12:41,780 –> 00:12:42,559 Bye.

  • Quiz Content

    Q1: Let $(\Omega, \mathcal{A}, \mathbb{P})$ be a probability space and $X \colon \Omega \rightarrow \mathbb{R}$ be a random variable. What is the definition of $F_X$?

    A1: $F_X(x) = \mathbb{P}(X \leq x)$

    A2: $F_X(x) = \mathbb{P}(X(x))$

    A3: $F_X(x) = \mathbb{P}(X \geq x)$

    A4: $F_X(x) = \mathbb{P}(X \in x)$

    A5: $F_X(x) = \mathbb{P}(X \notin x)$

    Q2: Let $(\Omega, \mathcal{A}, \mathbb{P})$ be a probability space and $X \colon \Omega \rightarrow \mathbb{R}$ be a random variable. What do we call $F_X$?

    A1: Probability density function of $X$.

    A2: Probability distribution of $X$.

    A3: Cumulative distribution function of $X$.

    A4: Measure space of $X$.

    Q3: Let $(\Omega, \mathcal{A}, \mathbb{P})$ be a probability space and $X \colon \Omega \rightarrow \mathbb{R}$ be a random variable. What is not a property of the CDF $F_X$ in general?

    A1: It is a real-valued function.

    A2: It is a function with codomain $[0,1]$.

    A3: It is a right-continuous function.

    A4: It is a bounded function.

    A5: It is a monotonically increasing function.

    A6: It is a continuous function.

    Q4: What is a possible outcome of the following R code? $$\texttt{rnorm(3)}$$

    A1: $\texttt{3, 4}$

    A2: $\texttt{1, 2, 3}$

    A3: $\texttt{-0.09301465, 0.27320039, -1.6674600111}$

    A4: $\texttt{3.0, 3.0}$

    A5: $\texttt{3, 3, 3, 3}$

  • Last update: 2024-10
