-
Title: Cumulative Distribution Function
-
Series: Probability Theory
-
YouTube-Title: Probability Theory 12 | Cumulative Distribution Function
-
Bright video: https://youtu.be/N3iXUaRhyt8
-
Dark video: https://youtu.be/DxEbvbGUp_g
-
Ad-free video: Watch Vimeo video
-
Quiz: Test your knowledge
-
Dark-PDF: Download PDF version of the dark video
-
Print-PDF: Download printable PDF version
-
Exercise: Download PDF sheets
-
Thumbnail (bright): Download PNG
-
Thumbnail (dark): Download PNG
-
Subtitle on GitHub: pt12_sub_eng.srt
-
Timestamps (n/a)
-
Subtitle in English
1 00:00:00,409 –> 00:00:02,390 Hello and welcome back to
2 00:00:02,400 –> 00:00:04,019 probability theory
3 00:00:04,699 –> 00:00:05,920 and as you already know,
4 00:00:05,929 –> 00:00:07,289 first, I want to thank all
5 00:00:07,300 –> 00:00:08,520 the nice people that support
6 00:00:08,529 –> 00:00:09,859 this channel on Steady or
7 00:00:09,869 –> 00:00:10,479 PayPal.
8 00:00:11,149 –> 00:00:12,649 Now, in today’s part 12,
9 00:00:12,659 –> 00:00:14,350 we will talk about the important
10 00:00:14,359 –> 00:00:16,010 cumulative distribution
11 00:00:16,020 –> 00:00:16,489 function.
12 00:00:17,280 –> 00:00:19,040 So here you already see, this
13 00:00:19,049 –> 00:00:20,659 is such a long word that
14 00:00:20,670 –> 00:00:22,239 often one just speaks of
15 00:00:22,250 –> 00:00:23,379 the CDF
16 00:00:23,840 –> 00:00:24,379 .
17 00:00:25,260 –> 00:00:26,680 Now, soon we will see that
18 00:00:26,690 –> 00:00:28,540 every random variable has
19 00:00:28,549 –> 00:00:29,840 such a CDF.
20 00:00:30,100 –> 00:00:32,040 For this, please recall
21 00:00:32,049 –> 00:00:33,680 that a random variable just
22 00:00:33,689 –> 00:00:35,250 translates from an abstract
23 00:00:35,259 –> 00:00:37,220 probability space into a
24 00:00:37,229 –> 00:00:39,049 very concrete one given in
25 00:00:39,060 –> 00:00:40,200 the real number line.
26 00:00:40,939 –> 00:00:42,290 In particular, this could
27 00:00:42,299 –> 00:00:43,689 be a discrete probability
28 00:00:43,700 –> 00:00:45,529 space where only some
29 00:00:45,540 –> 00:00:47,319 values in R are relevant
30 00:00:47,330 –> 00:00:48,430 for the probability measure
31 00:00:48,439 –> 00:00:49,110 P_X.
32 00:00:49,979 –> 00:00:51,419 In other words, you can easily
33 00:00:51,430 –> 00:00:52,990 embed a discrete set of
34 00:00:53,000 –> 00:00:54,599 numbers into R.
35 00:00:55,369 –> 00:00:56,889 This means that if you think
36 00:00:56,900 –> 00:00:58,750 of both cases we had, so the
37 00:00:58,759 –> 00:01:00,590 discrete one and the absolutely
38 00:01:00,599 –> 00:01:02,509 continuous one, both are
39 00:01:02,520 –> 00:01:03,389 included here.
40 00:01:04,379 –> 00:01:05,989 This helps because then the
41 00:01:06,000 –> 00:01:07,629 notion of a CDF is
42 00:01:07,639 –> 00:01:09,239 exactly the same in both
43 00:01:09,250 –> 00:01:09,830 cases.
44 00:01:10,580 –> 00:01:12,129 Therefore, I would say let’s
45 00:01:12,139 –> 00:01:13,830 define this cumulative
46 00:01:13,839 –> 00:01:15,230 distribution function.
47 00:01:16,099 –> 00:01:16,519 Here,
48 00:01:16,529 –> 00:01:18,019 I can already tell you we
49 00:01:18,029 –> 00:01:19,279 will always denote it with
50 00:01:19,290 –> 00:01:20,980 capital F where in the
51 00:01:20,989 –> 00:01:22,269 index we put in the
52 00:01:22,279 –> 00:01:23,639 random variable X.
53 00:01:24,459 –> 00:01:26,120 And indeed, often we will
54 00:01:26,129 –> 00:01:27,319 plot the function on the
55 00:01:27,330 –> 00:01:28,959 whole real number line.
56 00:01:29,830 –> 00:01:31,339 A typical graph could look
57 00:01:31,349 –> 00:01:32,250 like this.
58 00:01:32,260 –> 00:01:34,169 So it’s always increasing.
59 00:01:34,860 –> 00:01:36,250 However, it’s also
60 00:01:36,260 –> 00:01:37,690 possible that it stays
61 00:01:37,699 –> 00:01:38,440 constant.
62 00:01:39,300 –> 00:01:40,599 Therefore, the first warning
63 00:01:40,610 –> 00:01:42,389 here, you should never confuse
64 00:01:42,400 –> 00:01:44,150 this with a probability
65 00:01:44,160 –> 00:01:45,290 density function.
66 00:01:45,300 –> 00:01:46,279 A PDF.
67 00:01:47,230 –> 00:01:47,730 OK.
68 00:01:47,739 –> 00:01:49,169 I think that was enough talking
69 00:01:49,180 –> 00:01:49,529 here.
70 00:01:49,629 –> 00:01:51,239 Let’s go to the definition
71 00:01:51,250 –> 00:01:53,129 now. The assumptions
72 00:01:53,139 –> 00:01:54,410 are the same as in the last
73 00:01:54,419 –> 00:01:54,900 video.
74 00:01:54,910 –> 00:01:56,680 So we have Omega, A, P as a
75 00:01:56,690 –> 00:01:58,459 probability space and
76 00:01:58,470 –> 00:01:59,940 X as a random variable.
77 00:02:00,690 –> 00:02:02,160 It’s not an abstract random
78 00:02:02,169 –> 00:02:03,980 variable, but a real valued
79 00:02:03,989 –> 00:02:05,949 one. Implicitly
80 00:02:05,959 –> 00:02:07,190 this means we have the Borel
81 00:02:07,379 –> 00:02:08,949 Sigma algebra for
82 00:02:08,960 –> 00:02:09,490 R.
83 00:02:10,429 –> 00:02:11,949 Now, in fact, this is all
84 00:02:11,960 –> 00:02:13,399 we need and now we can
85 00:02:13,410 –> 00:02:14,699 define the function
86 00:02:14,710 –> 00:02:15,570 F_X.
87 00:02:16,399 –> 00:02:17,740 It’s important to remember
88 00:02:17,750 –> 00:02:19,149 here that the domain for
89 00:02:19,160 –> 00:02:20,889 F_X is always the
90 00:02:20,899 –> 00:02:22,369 whole real number line
91 00:02:23,110 –> 00:02:24,949 and the possible values lie
92 00:02:24,960 –> 00:02:26,490 in the unit interval.
93 00:02:27,350 –> 00:02:28,699 So what you can keep in mind
94 00:02:28,710 –> 00:02:30,690 is the codomain of X
95 00:02:30,869 –> 00:02:32,850 is the domain of F_X.
96 00:02:33,850 –> 00:02:34,190 OK.
97 00:02:34,199 –> 00:02:35,550 Now, as you can see in the
98 00:02:35,559 –> 00:02:37,479 unit interval F_X
99 00:02:37,490 –> 00:02:39,429 is defined as a probability,
100 00:02:40,419 –> 00:02:42,169 namely putting in a lower
101 00:02:42,179 –> 00:02:44,070 case x into the function
102 00:02:44,100 –> 00:02:45,550 is defined as
103 00:02:46,179 –> 00:02:47,679 the probability of the
104 00:02:47,690 –> 00:02:48,970 interval minus
105 00:02:48,979 –> 00:02:50,919 infinity to x.
106 00:02:51,639 –> 00:02:53,220 And it’s measured with
107 00:02:53,229 –> 00:02:55,220 P_X, the distribution.
108 00:02:55,229 –> 00:02:56,860 The probability distribution
109 00:02:56,869 –> 00:02:58,500 of the random variable X
110 00:02:59,309 –> 00:03:01,110 and with this, you see that’s
111 00:03:01,119 –> 00:03:02,970 the reason it’s called cumulative
112 00:03:02,979 –> 00:03:04,300 distribution function.
113 00:03:05,050 –> 00:03:06,899 So we include here all possible
114 00:03:06,910 –> 00:03:08,710 values until we reach the
115 00:03:08,720 –> 00:03:09,949 given point x
116 00:03:10,729 –> 00:03:12,309 and here, maybe it’s helpful
117 00:03:12,320 –> 00:03:14,050 as a reminder that this can
118 00:03:14,059 –> 00:03:15,399 be written with the original
119 00:03:15,410 –> 00:03:16,789 probability measure P.
120 00:03:17,619 –> 00:03:19,210 So we have P of the
121 00:03:19,220 –> 00:03:21,000 random variable capital X
122 00:03:21,009 –> 00:03:22,869 is less than or equal to the
123 00:03:22,880 –> 00:03:23,990 lower case x
124 00:03:24,919 –> 00:03:26,300 and we have learned this
125 00:03:26,309 –> 00:03:27,860 is exactly the same thing
126 00:03:27,869 –> 00:03:28,539 as this.
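Written out, the definition spoken above can be summarized as follows. This is a sketch in the notation of the quiz below; the half-open interval $(-\infty, x]$ is inferred from the "less than or equal" formulation.

```latex
F_X \colon \mathbb{R} \to [0,1], \qquad
F_X(x) := \mathbb{P}_X\big( (-\infty, x] \big) = \mathbb{P}(X \leq x)
```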
127 00:03:29,630 –> 00:03:29,929 OK.
128 00:03:29,940 –> 00:03:31,820 Then this nice well defined
129 00:03:31,830 –> 00:03:33,610 function F_X is called the
130 00:03:33,619 –> 00:03:35,130 cumulative distribution
131 00:03:35,139 –> 00:03:36,589 function of the random variable
132 00:03:36,919 –> 00:03:38,119 capital X.
133 00:03:38,880 –> 00:03:40,690 Indeed some people simply
134 00:03:40,699 –> 00:03:42,399 call it distribution function
135 00:03:42,410 –> 00:03:42,940 of X.
136 00:03:43,710 –> 00:03:45,210 Or in short, as I already
137 00:03:45,220 –> 00:03:46,809 told you, we just call it
138 00:03:46,820 –> 00:03:48,410 the CDF of X.
139 00:03:49,529 –> 00:03:51,029 Now, this definition
140 00:03:51,039 –> 00:03:52,710 immediately implies some
141 00:03:52,720 –> 00:03:54,610 nice characteristic properties
142 00:03:54,619 –> 00:03:55,550 for F_X.
143 00:03:56,289 –> 00:03:57,869 Therefore, let’s use a minute
144 00:03:57,880 –> 00:03:59,270 to talk about these.
145 00:04:00,179 –> 00:04:01,940 They immediately follow, because
146 00:04:01,949 –> 00:04:03,520 we use a probability measure
147 00:04:03,649 –> 00:04:05,240 to define F_X.
148 00:04:05,919 –> 00:04:07,619 For example, if we make this
149 00:04:07,630 –> 00:04:09,369 lowercase x here smaller
150 00:04:09,380 –> 00:04:11,139 and smaller in the limit
151 00:04:11,149 –> 00:04:12,880 x to minus infinity,
152 00:04:12,949 –> 00:04:14,929 we would get P_X of
153 00:04:14,940 –> 00:04:16,010 the empty set.
154 00:04:16,640 –> 00:04:18,089 And then we have our important
155 00:04:18,100 –> 00:04:19,329 property of a probability
156 00:04:19,339 –> 00:04:19,820 measure.
157 00:04:19,970 –> 00:04:21,660 The empty set is always
158 00:04:21,670 –> 00:04:22,730 sent to zero.
159 00:04:23,549 –> 00:04:25,410 Hence F_X has
160 00:04:25,420 –> 00:04:27,049 this property in the limit
161 00:04:27,059 –> 00:04:28,410 to minus infinity.
162 00:04:29,309 –> 00:04:30,779 Therefore, in the same way,
163 00:04:30,790 –> 00:04:32,279 we can ask what happens when
164 00:04:32,290 –> 00:04:34,230 x goes to plus infinity.
165 00:04:35,119 –> 00:04:36,399 There, it’s also not hard
166 00:04:36,410 –> 00:04:37,660 to see, when x
167 00:04:37,670 –> 00:04:38,540 increases
168 00:04:38,549 –> 00:04:40,250 we include more and more
169 00:04:40,260 –> 00:04:41,579 in this probability measure.
170 00:04:42,359 –> 00:04:43,660 Hence, in the limit we get
171 00:04:43,670 –> 00:04:44,890 here the whole real number
172 00:04:44,899 –> 00:04:45,410 line.
173 00:04:45,429 –> 00:04:47,410 So we have P_X of R
174 00:04:48,339 –> 00:04:50,109 and this is the next important
175 00:04:50,119 –> 00:04:51,459 property of a probability
176 00:04:51,470 –> 00:04:52,000 measure.
177 00:04:52,149 –> 00:04:53,570 The whole sample space is
178 00:04:53,579 –> 00:04:55,109 always sent to one.
179 00:04:56,010 –> 00:04:56,510 OK.
180 00:04:56,519 –> 00:04:57,950 So it’s not hard to see that
181 00:04:57,959 –> 00:04:59,369 we have these two limits
182 00:04:59,380 –> 00:05:00,350 for F_X.
183 00:05:01,369 –> 00:05:03,179 Moreover, we also see
184 00:05:03,190 –> 00:05:04,720 immediately that the function
185 00:05:04,730 –> 00:05:06,380 F_X is monotonically
186 00:05:06,390 –> 00:05:07,140 increasing.
187 00:05:08,049 –> 00:05:08,359 Here
188 00:05:08,369 –> 00:05:09,470 please recall
189 00:05:09,480 –> 00:05:11,160 this means if we go from
190 00:05:11,170 –> 00:05:13,160 one point x_1 to a larger
191 00:05:13,170 –> 00:05:15,130 point x_2, the
192 00:05:15,140 –> 00:05:16,350 value of the function
193 00:05:16,359 –> 00:05:18,220 increases or it stays the
194 00:05:18,230 –> 00:05:18,649 same.
195 00:05:19,529 –> 00:05:20,950 This follows because the
196 00:05:20,959 –> 00:05:22,589 probability measure is
197 00:05:22,600 –> 00:05:23,869 always monotonic.
198 00:05:24,649 –> 00:05:25,989 This means if we measure
199 00:05:26,000 –> 00:05:27,450 a set with the probability
200 00:05:27,459 –> 00:05:29,390 measure, then all subsets
201 00:05:29,399 –> 00:05:31,119 of this set have a smaller
202 00:05:31,130 –> 00:05:32,470 measure or the same.
203 00:05:33,380 –> 00:05:34,970 So you see also this
204 00:05:34,980 –> 00:05:36,640 property is not hard to
205 00:05:36,649 –> 00:05:37,399 understand.
206 00:05:38,200 –> 00:05:40,059 However, now the third and
207 00:05:40,070 –> 00:05:41,799 the last property here is
208 00:05:41,809 –> 00:05:43,329 a little bit more technical.
209 00:05:44,059 –> 00:05:45,739 I say this, because it tells
210 00:05:45,750 –> 00:05:47,690 us that F_X is right-
211 00:05:47,700 –> 00:05:48,589 continuous.
212 00:05:49,459 –> 00:05:51,079 This means that in general
213 00:05:51,089 –> 00:05:53,040 it’s not a continuous function
214 00:05:53,230 –> 00:05:54,649 but it is continuous,
215 00:05:54,660 –> 00:05:56,190 if you just look from the
216 00:05:56,200 –> 00:05:58,010 right-hand side. This
217 00:05:58,019 –> 00:05:59,369 means we have to look at
218 00:05:59,380 –> 00:06:01,019 the right limit here, where
219 00:06:01,029 –> 00:06:02,679 x is always larger than the
220 00:06:02,690 –> 00:06:03,739 point x_0
221 00:06:04,390 –> 00:06:05,890 and then right-continuity
222 00:06:05,899 –> 00:06:07,540 means that this limit is
223 00:06:07,549 –> 00:06:09,209 always F_X at the
224 00:06:09,220 –> 00:06:10,369 point x_0
225 00:06:11,220 –> 00:06:12,670 and this is the property
226 00:06:12,679 –> 00:06:14,010 we have for the CDF
227 00:06:14,119 –> 00:06:15,040 F_X.
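For reference, the three properties just described can be collected in formulas. This is a summary sketch; the symbols $x_1$, $x_2$, $x_0$ are the ones named in the subtitles.

```latex
\begin{align*}
&\text{(1)} \quad \lim_{x \to -\infty} F_X(x) = 0
  \quad \text{and} \quad \lim_{x \to +\infty} F_X(x) = 1 \\
&\text{(2)} \quad x_1 \leq x_2 \;\Longrightarrow\; F_X(x_1) \leq F_X(x_2) \\
&\text{(3)} \quad \lim_{x \searrow x_0} F_X(x) = F_X(x_0)
\end{align*}
```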
228 00:06:15,850 –> 00:06:17,570 Now the meaning of this in
229 00:06:17,579 –> 00:06:19,000 the graph, you can see
230 00:06:19,010 –> 00:06:20,839 above. At each
231 00:06:20,850 –> 00:06:22,510 jump point as this one
232 00:06:22,630 –> 00:06:24,279 the filled in circle has to
233 00:06:24,290 –> 00:06:25,579 lie at the upper part on
234 00:06:25,589 –> 00:06:26,149 the right.
235 00:06:27,239 –> 00:06:27,690 OK.
236 00:06:27,700 –> 00:06:28,700 I don’t want to prove the
237 00:06:28,709 –> 00:06:30,420 fact here, but you should
238 00:06:30,429 –> 00:06:31,640 see the reason for it.
239 00:06:31,859 –> 00:06:33,649 We include x here in
240 00:06:33,660 –> 00:06:34,570 this interval
241 00:06:35,230 –> 00:06:36,600 and the jump in the graph
242 00:06:36,609 –> 00:06:38,390 would mean that the singleton,
243 00:06:38,399 –> 00:06:40,170 just with the point x, has
244 00:06:40,179 –> 00:06:41,609 a non-zero probability
245 00:06:42,299 –> 00:06:43,519 and this non-zero
246 00:06:43,540 –> 00:06:45,070 probability is then
247 00:06:45,079 –> 00:06:46,450 included in the whole
248 00:06:46,459 –> 00:06:47,540 probability here.
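To see the jump behaviour in numbers, here is a minimal R sketch. The fair six-sided die and the helper name `cdf_die` are assumptions chosen for illustration; they do not appear in the video.

```r
# Hypothetical example (not from the video): CDF of a fair six-sided die.
values <- 1:6
probs  <- rep(1 / 6, 6)

# F_X(x) = P(X <= x): add up the probabilities of all values not exceeding x.
cdf_die <- function(x) sapply(x, function(t) sum(probs[values <= t]))

cdf_die(2.999)  # just left of the jump at 3
cdf_die(3)      # at the jump: the mass P(X = 3) is already included
cdf_die(3.001)  # just right of the jump: same value as at 3

# The height of the jump is the probability of the singleton {3}.
jump <- cdf_die(3) - cdf_die(2.999)
```

The value at the jump point agrees with the values slightly to the right, not with those to the left, which is what right-continuity says.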
249 00:06:48,440 –> 00:06:48,970 OK.
250 00:06:48,980 –> 00:06:50,640 Then I think we can look
251 00:06:50,649 –> 00:06:51,829 at an example.
252 00:06:52,670 –> 00:06:54,040 In fact, this will be one
253 00:06:54,049 –> 00:06:55,760 of the most important examples
254 00:06:55,769 –> 00:06:57,750 we have. We take the
255 00:06:57,760 –> 00:06:59,579 random variable X which should
256 00:06:59,589 –> 00:07:01,359 have the distribution given
257 00:07:01,369 –> 00:07:03,119 by the normal distribution.
258 00:07:03,649 –> 00:07:04,920 If you have already heard
259 00:07:04,929 –> 00:07:06,320 of the normal distribution,
260 00:07:06,380 –> 00:07:07,950 then you know it has two
261 00:07:07,959 –> 00:07:08,880 parameters
262 00:07:09,579 –> 00:07:10,799 and the simplest case would
263 00:07:10,809 –> 00:07:12,230 be that we set the mean to
264 00:07:12,239 –> 00:07:14,040 be 0 and the standard
265 00:07:14,049 –> 00:07:15,720 deviation to be 1.
266 00:07:16,459 –> 00:07:17,959 If you have never seen this,
267 00:07:17,970 –> 00:07:19,640 this is not a problem, because
268 00:07:19,649 –> 00:07:20,940 now we discuss it.
269 00:07:21,779 –> 00:07:23,359 It’s an absolutely continuous
270 00:07:23,369 –> 00:07:24,970 case, which means the whole
271 00:07:24,980 –> 00:07:26,799 probability measure is given
272 00:07:26,809 –> 00:07:28,269 by a probability density
273 00:07:28,279 –> 00:07:28,799 function
274 00:07:29,640 –> 00:07:31,339 and indeed, this one is
275 00:07:31,350 –> 00:07:32,859 given by the famous bell
276 00:07:32,869 –> 00:07:33,500 curve.
277 00:07:34,329 –> 00:07:35,799 It’s not just any bell
278 00:07:35,809 –> 00:07:37,429 curve, it’s given by the
279 00:07:37,440 –> 00:07:38,630 Gaussian function.
280 00:07:39,390 –> 00:07:40,980 Which means it’s given by
281 00:07:40,989 –> 00:07:42,739 e to the power minus
282 00:07:42,750 –> 00:07:44,299 1/2 * x
283 00:07:44,309 –> 00:07:44,850 squared
284 00:07:45,980 –> 00:07:47,339 and the correct normalization
285 00:07:47,350 –> 00:07:48,829 in front would be given by
286 00:07:48,839 –> 00:07:50,679 one divided by the square
287 00:07:50,690 –> 00:07:52,480 root of two times pi.
288 00:07:53,260 –> 00:07:54,839 Now, in general, in the case,
289 00:07:54,850 –> 00:07:56,220 when we have the probability
290 00:07:56,230 –> 00:07:57,519 density function, the
291 00:07:57,529 –> 00:07:59,480 PDF, we can easily
292 00:07:59,489 –> 00:08:01,140 calculate the CDF.
293 00:08:01,929 –> 00:08:03,209 In fact, the only thing we
294 00:08:03,220 –> 00:08:05,170 have to do is to integrate
295 00:08:05,179 –> 00:08:06,529 over the PDF,
296 00:08:07,089 –> 00:08:08,690 which means we go from minus
297 00:08:08,700 –> 00:08:10,209 infinity to the point
298 00:08:10,220 –> 00:08:10,809 x
299 00:08:11,549 –> 00:08:12,779 and then we only have to
300 00:08:12,790 –> 00:08:14,529 put in the PDF where the
301 00:08:14,540 –> 00:08:16,350 constant is written in front
302 00:08:16,359 –> 00:08:17,190 of the integral.
303 00:08:17,730 –> 00:08:19,500 However, here, because x
304 00:08:19,510 –> 00:08:21,410 is already chosen as a variable,
305 00:08:21,420 –> 00:08:22,570 we need another one.
306 00:08:22,579 –> 00:08:23,850 So let’s choose t.
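Put together, the CDF of the standard normal distribution described above would read as follows (a sketch assembled from the spoken formula, with $t$ as the integration variable):

```latex
F_X(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-\frac{1}{2} t^2} \, dt
```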
307 00:08:24,640 –> 00:08:26,500 Now, for the normal distribution,
308 00:08:26,510 –> 00:08:28,279 we cannot simplify this
309 00:08:28,290 –> 00:08:29,720 integral immediately,
310 00:08:29,799 –> 00:08:31,100 but of course, we can draw
311 00:08:31,160 –> 00:08:31,839 the graph.
312 00:08:32,640 –> 00:08:34,099 So we would start with minus
313 00:08:34,109 –> 00:08:35,859 infinity, then we would increase
314 00:08:35,869 –> 00:08:36,700 and increase
315 00:08:36,849 –> 00:08:38,419 and in the end in the limit,
316 00:08:38,429 –> 00:08:39,200 we would go to
317 00:08:39,210 –> 00:08:41,179 +1. One important
318 00:08:41,190 –> 00:08:42,500 property you see here is,
319 00:08:42,510 –> 00:08:44,380 because the PDF is
320 00:08:44,390 –> 00:08:46,179 symmetric, we will hit
321 00:08:46,190 –> 00:08:47,219 one half here.
322 00:08:48,219 –> 00:08:48,679 OK.
323 00:08:48,690 –> 00:08:50,280 Because my drawing here is
324 00:08:50,289 –> 00:08:51,770 not perfect at all.
325 00:08:51,780 –> 00:08:53,320 I would suggest that we now
326 00:08:53,330 –> 00:08:54,150 go into
327 00:08:54,160 –> 00:08:55,359 RStudio.
328 00:08:56,650 –> 00:08:57,140 Here
329 00:08:57,150 –> 00:08:58,130 the first thing we should
330 00:08:58,140 –> 00:09:00,010 check is the help function.
331 00:09:00,020 –> 00:09:01,219 So what is the normal
332 00:09:01,229 –> 00:09:02,780 distribution in R?
333 00:09:03,750 –> 00:09:05,450 Indeed here you see we have
334 00:09:05,460 –> 00:09:06,229 everything.
335 00:09:06,299 –> 00:09:07,679 So the whole explanation
336 00:09:07,690 –> 00:09:09,309 what the normal distribution
337 00:09:09,320 –> 00:09:09,809 is.
338 00:09:10,469 –> 00:09:12,239 Most importantly, we see
339 00:09:12,250 –> 00:09:13,630 the probability density
340 00:09:13,640 –> 00:09:15,330 function is here given
341 00:09:15,340 –> 00:09:16,570 with dnorm.
342 00:09:17,239 –> 00:09:18,940 On the other hand the CDF,
343 00:09:18,950 –> 00:09:20,400 the distribution
344 00:09:20,409 –> 00:09:22,020 function is given by
345 00:09:22,030 –> 00:09:22,890 pnorm
346 00:09:23,809 –> 00:09:25,649 and the last one here,
347 00:09:25,655 –> 00:09:27,179 rnorm will just give us a
348 00:09:27,190 –> 00:09:28,869 sample with respect to the
349 00:09:28,880 –> 00:09:29,989 normal distribution.
350 00:09:30,919 –> 00:09:32,460 Of course, in addition, you
351 00:09:32,469 –> 00:09:34,380 also get a lot of explanations
352 00:09:34,390 –> 00:09:34,950 here.
353 00:09:34,989 –> 00:09:36,609 Most importantly, you
354 00:09:36,619 –> 00:09:38,419 see the definition
355 00:09:38,429 –> 00:09:39,789 of the probability density
356 00:09:39,799 –> 00:09:41,549 function is also given
357 00:09:41,559 –> 00:09:41,900 here.
358 00:09:42,570 –> 00:09:44,299 However, you see the parameter
359 00:09:44,309 –> 00:09:45,979 sigma and the parameter
360 00:09:45,989 –> 00:09:47,590 mu are included here.
361 00:09:48,320 –> 00:09:49,830 Nevertheless, the default
362 00:09:49,840 –> 00:09:51,460 values are the same as we
363 00:09:51,469 –> 00:09:52,159 have chosen.
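As a quick sanity check, one could compare the built-in functions with the formulas from earlier in the video. The test points in `x` and the use of `integrate` are assumptions made for this sketch; the video itself only calls `dnorm`, `pnorm` and `rnorm`.

```r
# Arbitrary test points, chosen only for illustration.
x <- c(-1, 0, 1.5)

# Density written out by hand, compared with dnorm at its default
# parameters (mean = 0, sd = 1).
pdf_manual <- exp(-x^2 / 2) / sqrt(2 * pi)
all.equal(pdf_manual, dnorm(x))

# CDF as the numerical integral of the PDF from -Inf up to each point,
# compared with pnorm.
cdf_numeric <- sapply(x, function(b) integrate(dnorm, -Inf, b)$value)
all.equal(cdf_numeric, pnorm(x), tolerance = 1e-4)

pnorm(0)  # 0.5, because the PDF is symmetric around zero
```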
364 00:09:52,940 –> 00:09:53,440 OK.
365 00:09:53,450 –> 00:09:54,880 Then I would say we can play
366 00:09:54,890 –> 00:09:56,640 around and plot the CDF
367 00:09:56,650 –> 00:09:58,190 and the PDF.
368 00:09:58,849 –> 00:10:00,229 Now for plotting, we need
369 00:10:00,239 –> 00:10:01,309 the whole sequence.
370 00:10:01,320 –> 00:10:03,150 So let’s set x to the
371 00:10:03,159 –> 00:10:04,849 sequence where we start
372 00:10:04,859 –> 00:10:06,690 maybe with minus 10
373 00:10:06,799 –> 00:10:08,750 and go to plus 10 with
374 00:10:08,760 –> 00:10:09,390 step size
375 00:10:09,400 –> 00:10:11,390 0.01.
376 00:10:12,390 –> 00:10:13,489 Let’s hit enter
377 00:10:13,500 –> 00:10:14,909 and then we see the result
378 00:10:14,919 –> 00:10:15,390 here.
379 00:10:16,340 –> 00:10:18,010 Indeed, we recognize that
380 00:10:18,020 –> 00:10:19,710 we have 2001
381 00:10:19,719 –> 00:10:20,409 points.
382 00:10:21,169 –> 00:10:22,549 Then in the next step, we
383 00:10:22,559 –> 00:10:24,070 can put all these points
384 00:10:24,080 –> 00:10:25,669 into the density function
385 00:10:25,719 –> 00:10:27,439 which was given by
386 00:10:27,449 –> 00:10:28,190 dnorm.
387 00:10:31,039 –> 00:10:31,489 OK.
388 00:10:31,500 –> 00:10:32,739 So here you see the best
389 00:10:32,750 –> 00:10:34,250 thing would be to enlarge
390 00:10:34,260 –> 00:10:35,140 the whole picture.
391 00:10:36,309 –> 00:10:38,140 Now, in this picture, we
392 00:10:38,150 –> 00:10:40,080 recognize our bell curve.
393 00:10:40,849 –> 00:10:42,250 However, maybe it’s better
394 00:10:42,260 –> 00:10:43,450 to not just plot the
395 00:10:43,460 –> 00:10:45,200 values but also the
396 00:10:45,210 –> 00:10:46,359 points x.
397 00:10:46,369 –> 00:10:47,809 So we put in x
398 00:10:47,820 –> 00:10:49,750 comma dnorm of x
399 00:10:51,090 –> 00:10:52,409 and there you see, we have
400 00:10:52,419 –> 00:10:54,309 our peak exactly at the point
401 00:10:54,320 –> 00:10:55,940 x is equal to zero.
402 00:10:56,729 –> 00:10:57,309 OK.
403 00:10:57,330 –> 00:10:58,960 Then I would say in the same
404 00:10:58,969 –> 00:11:00,030 way we can plot the
405 00:11:00,040 –> 00:11:01,289 CDF.
406 00:11:01,340 –> 00:11:02,630 So we take x
407 00:11:02,640 –> 00:11:04,510 comma pnorm
408 00:11:04,520 –> 00:11:05,429 of x.
409 00:11:07,880 –> 00:11:09,460 So here you see this is like
410 00:11:09,469 –> 00:11:10,809 our picture from before,
411 00:11:10,820 –> 00:11:12,739 but now from minus 10 to
412 00:11:12,750 –> 00:11:13,500 plus 10.
413 00:11:14,340 –> 00:11:15,679 Hence, you can see in the
414 00:11:15,690 –> 00:11:17,679 limits we go from zero
415 00:11:17,809 –> 00:11:18,900 to plus one.
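The plotting steps from this part of the video can be reproduced roughly like this. The `type = "l"` argument and the titles are additions for readability and are not taken from the video.

```r
# Grid from -10 to 10 with step size 0.01, as in the video.
x <- seq(-10, 10, by = 0.01)
length(x)  # 2001

# PDF: the bell curve with its peak at x = 0.
plot(x, dnorm(x), type = "l", main = "PDF of the standard normal distribution")

# CDF: increases from 0 to 1 and passes through 1/2 at x = 0.
plot(x, pnorm(x), type = "l", main = "CDF of the standard normal distribution")
```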
416 00:11:19,770 –> 00:11:21,270 Now I would suggest that
417 00:11:21,280 –> 00:11:22,460 you play around with all
418 00:11:22,469 –> 00:11:24,010 these functions to get a
419 00:11:24,020 –> 00:11:25,429 good visualization for the
420 00:11:25,440 –> 00:11:26,750 PDF and for the
421 00:11:26,760 –> 00:11:27,549 CDF.
422 00:11:28,479 –> 00:11:28,840 OK.
423 00:11:28,849 –> 00:11:30,380 Now, at the end of the video,
424 00:11:30,390 –> 00:11:32,030 we can also look at the function
425 00:11:32,039 –> 00:11:33,099 rnorm
426 00:11:34,440 –> 00:11:36,059 and if we put in a number,
427 00:11:36,070 –> 00:11:37,940 we get as many samples as
428 00:11:37,950 –> 00:11:38,619 we want.
429 00:11:39,979 –> 00:11:41,159 So in this case, we took
430 00:11:41,169 –> 00:11:42,530 10 but of course, we could
431 00:11:42,539 –> 00:11:44,479 also take 6000 for
432 00:11:44,489 –> 00:11:45,080 example.
433 00:11:46,250 –> 00:11:47,950 So showing all of them is
434 00:11:47,960 –> 00:11:49,469 not so exciting, but
435 00:11:49,479 –> 00:11:51,059 maybe a histogram is.
436 00:11:51,799 –> 00:11:53,659 So let’s put in histogram
437 00:11:53,669 –> 00:11:55,359 of rnorm
438 00:11:55,510 –> 00:11:56,820 6000
439 00:11:59,219 –> 00:12:00,739 and here again, we
440 00:12:00,750 –> 00:12:02,090 recognize some bell
441 00:12:02,099 –> 00:12:02,760 curve.
442 00:12:03,450 –> 00:12:05,109 So maybe we can just repeat
443 00:12:05,119 –> 00:12:06,200 the whole thing again
444 00:12:07,380 –> 00:12:09,159 and as you can see, we don’t
445 00:12:09,169 –> 00:12:10,109 change so much.
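A possible version of the histogram experiment is sketched below. The fixed seed, the number of bins and the overlaid density curve are assumptions added here; the video only calls the histogram function on `rnorm(6000)`.

```r
set.seed(1)  # assumption: fixed seed so that the run is repeatable

# 6000 samples from the standard normal distribution.
samples <- rnorm(6000)

# Histogram on the density scale, so it is comparable with the PDF.
hist(samples, breaks = 50, freq = FALSE)

# Overlay the theoretical PDF for comparison.
curve(dnorm(x), add = TRUE)
```

Rerunning without `set.seed` gives a different sample each time, but the overall shape of the histogram stays close to the bell curve.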
446 00:12:11,109 –> 00:12:12,729 However, maybe we can
447 00:12:12,739 –> 00:12:14,500 also increase the number.
448 00:12:16,429 –> 00:12:17,789 In another video, I already
449 00:12:17,799 –> 00:12:19,289 showed you how you can make
450 00:12:19,299 –> 00:12:20,489 a histogram a little bit
451 00:12:20,500 –> 00:12:22,349 nicer and maybe
452 00:12:22,359 –> 00:12:23,390 this is something you should
453 00:12:23,400 –> 00:12:24,500 do here as well.
454 00:12:25,349 –> 00:12:25,849 OK.
455 00:12:25,859 –> 00:12:27,169 Now, I think that’s good
456 00:12:27,179 –> 00:12:28,530 enough for this video.
457 00:12:28,700 –> 00:12:30,289 I really hope you now know
458 00:12:30,299 –> 00:12:31,830 what a normal distribution
459 00:12:31,840 –> 00:12:33,450 is and also what a
460 00:12:33,460 –> 00:12:35,010 general CDF is.
461 00:12:35,789 –> 00:12:37,169 Of course, all questions
462 00:12:37,179 –> 00:12:38,640 you can put into the comments
463 00:12:38,780 –> 00:12:40,429 and then I hope I see you
464 00:12:40,440 –> 00:12:41,619 in the next video.
465 00:12:41,780 –> 00:12:42,559 Bye.
-
Quiz Content
Q1: Let $(\Omega, \mathcal{A}, \mathbb{P})$ be a probability space and $X \colon \Omega \rightarrow \mathbb{R}$ be a random variable. What is the definition of $F_X$?
A1: $F_X(x) = \mathbb{P}(X \leq x)$
A2: $F_X(x) = \mathbb{P}(X(x))$
A3: $F_X(x) = \mathbb{P}(X \geq x)$
A4: $F_X(x) = \mathbb{P}(X \in x)$
A5: $F_X(x) = \mathbb{P}(X \notin x)$
Q2: Let $(\Omega, \mathcal{A}, \mathbb{P})$ be a probability space and $X \colon \Omega \rightarrow \mathbb{R}$ be a random variable. What do we call $F_X$?
A1: Probability density function of $X$.
A2: Probability distribution of $X$.
A3: Cumulative distribution function of $X$.
A4: Measure space of $X$.
Q3: Let $(\Omega, \mathcal{A}, \mathbb{P})$ be a probability space and $X \colon \Omega \rightarrow \mathbb{R}$ be a random variable. What is not a property of the CDF $F_X$ in general?
A1: It is a real-valued function.
A2: It is a function with codomain $[0,1]$.
A3: It is a right-continuous function.
A4: It is a bounded function.
A5: It is a monotonically increasing function.
A6: It is a continuous function.
Q4: What is a possible outcome of the following R code? $$\texttt{rnorm(3)}$$
A1: $\texttt{3, 4}$
A2: $\texttt{1, 2, 3}$
A3: $\texttt{-0.09301465, 0.27320039, -1.6674600111}$
A4: $\texttt{3.0, 3.0}$
A5: $\texttt{3, 3, 3, 3}$
-
Last update: 2024-10