Monte Carlo Integration
Digital Image Synthesis
Yung-Yu Chuang
with slides by Pat Hanrahan and Torsten Möller

Introduction
• The integral equations generally don't have analytic solutions, so we must turn to numerical methods.
• Standard methods like trapezoidal integration or Gaussian quadrature are not effective for high-dimensional and discontinuous integrals.

$L_o(\mathrm{p},\omega_o) = L_e(\mathrm{p},\omega_o) + \int_{S^2} f(\mathrm{p},\omega_o,\omega_i)\,L_i(\mathrm{p},\omega_i)\,|\cos\theta_i|\,d\omega_i$

Numerical quadrature
• Suppose we want to calculate $I = \int_a^b f(x)\,dx$ but can't solve it analytically. Approximations through quadrature rules have the form
$\hat{I} = \sum_{i=1}^{n} w_i\, f(x_i)$
which is essentially a weighted sum of samples of the function at various points.

Midpoint rule (convergence plot)

Trapezoid rule (convergence plot)

Simpson's rule
• Similar to the trapezoid rule, but using a quadratic polynomial approximation. It converges as $O(n^{-4})$, assuming f has a continuous fourth derivative.

Curse of dimensionality and discontinuity
• Consider an s-dimensional function f.
• If the 1-D rule has a convergence rate of $O(n^{-r})$, the s-dimensional rule requires a much larger number of samples ($n^s$) to work as well as the 1-D one. Thus, the convergence rate is only $O(n^{-r/s})$.
• If f is discontinuous, convergence is only $O(n^{-1/s})$ in s dimensions.

Randomized algorithms
• Las Vegas vs. Monte Carlo
• Las Vegas: always gives the right answer by using randomness.
• Monte Carlo: gives the right answer on average. Results depend on the random numbers used, but are statistically likely to be close to the right answer.

Monte Carlo integration
• Monte Carlo integration uses sampling to estimate the values of integrals. It only requires the ability to evaluate the integrand at arbitrary points, making it easy to implement and applicable to many problems.
• If n samples are used, it converges at the rate of $O(n^{-1/2})$. That is, to cut the error in half, it is necessary to evaluate four times as many samples.
• Images rendered by Monte Carlo methods are often noisy. Most current methods try to reduce noise.

Monte Carlo methods
• Advantages
– Easy to implement
– Easy to think about (but be careful of statistical bias)
– Robust when used with complex integrands and domains (shapes, lights, …)
– Efficient for high-dimensional integrals
• Disadvantages
– Noisy
– Slow (many samples needed for convergence)
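As a concrete illustration of the weighted-sum quadrature form above, here is a minimal C++ sketch of my own (not part of the original slides; the integrand and names are illustrative): the midpoint and trapezoid rules differ only in their sample points and weights.

    #include <cstdio>
    #include <vector>

    // f(x) = x^2; the exact integral over [0,1] is 1/3.
    static double f(double x) { return x * x; }

    // Generic quadrature rule: a weighted sum of samples,
    // I_hat = sum_i w_i * f(x_i).
    static double quadrature(const std::vector<double>& x,
                             const std::vector<double>& w) {
        double sum = 0.0;
        for (size_t i = 0; i < x.size(); ++i) sum += w[i] * f(x[i]);
        return sum;
    }

    int main() {
        const int n = 100;
        const double h = 1.0 / n;

        // Midpoint rule: samples at interval centers, equal weights h.
        std::vector<double> xm(n), wm(n, h);
        for (int i = 0; i < n; ++i) xm[i] = (i + 0.5) * h;

        // Trapezoid rule: endpoint samples; weights h/2 at the two
        // ends and h in the interior.
        std::vector<double> xt(n + 1), wt(n + 1, h);
        for (int i = 0; i <= n; ++i) xt[i] = i * h;
        wt[0] = wt[n] = 0.5 * h;

        std::printf("midpoint  = %.8f\n", quadrature(xm, wm));
        std::printf("trapezoid = %.8f\n", quadrature(xt, wt));
        return 0;
    }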
Basic concepts
• X is a random variable.
• Applying a function to a random variable gives another random variable, Y = f(X).
• CDF (cumulative distribution function): $P(x) = \Pr\{X \le x\}$
• PDF (probability density function): nonnegative and integrates to 1; $p(x) = \frac{dP(x)}{dx}$
• The canonical uniform random variable ξ is provided by the standard library and is easy to transform to other distributions.

Discrete probability distributions
• Discrete events $X_i$ with probability $p_i$: $p_i \ge 0$ and $\sum_{i=1}^{n} p_i = 1$
• Cumulative PDF (distribution): $P_j = \sum_{i=1}^{j} p_i$
• Construction of samples: to randomly select an event, draw a uniform random variable U and select $X_i$ if $P_{i-1} < U \le P_i$.

Continuous probability distributions
• PDF p(x); for the uniform distribution on [0,1], p(x) = 1 there and 0 elsewhere.
• CDF: $P(x) = \int_0^x p(x')\,dx'$, with $P(1) = 1$
• $P(x) = \Pr(X \le x)$, and $\Pr(\alpha \le X \le \beta) = \int_\alpha^\beta p(x)\,dx = P(\beta) - P(\alpha)$

Expected values
• Average value of a function f(x) over some distribution of values p(x) over its domain D:
$E_p[f(x)] = \int_D f(x)\,p(x)\,dx$
• Example: the cosine function over [0, π] with p uniform, i.e. $p = 1/\pi$:
$E_p[\cos(x)] = \int_0^{\pi} \cos x \cdot \frac{1}{\pi}\,dx = 0$

Variance
• Expected deviation from the expected value; the fundamental concept for quantifying the error in Monte Carlo methods:
$V[f(x)] = E\big[(f(x) - E[f(x)])^2\big]$

Properties
$E[a\,f(x)] = a\,E[f(x)]$
$E\big[\sum_i f(X_i)\big] = \sum_i E[f(X_i)]$
$V[a\,f(x)] = a^2\,V[f(x)]$
$V[f(x)] = E\big[f(x)^2\big] - E[f(x)]^2$

Monte Carlo estimator
• Assume that we want to evaluate the integral of f(x) over [a,b].
• Given uniform random variables $X_i$ over [a,b], the Monte Carlo estimator
$F_N = \frac{b-a}{N}\sum_{i=1}^{N} f(X_i)$
has expected value $E[F_N]$ equal to the integral.

General Monte Carlo estimator
• Given random variables $X_i$ drawn from an arbitrary PDF p(x), the estimator is
$F_N = \frac{1}{N}\sum_{i=1}^{N} \frac{f(X_i)}{p(X_i)}$
• Although the convergence rate of the MC estimator, $O(N^{-1/2})$, is slower than that of other integration methods, it is independent of the dimension, making Monte Carlo the only practical method for high-dimensional integrals.

Convergence of Monte Carlo
• Chebyshev's inequality: let X be a random variable with expected value μ and variance σ². For any real number k > 0,
$\Pr\{|X - \mu| \ge k\sigma\} \le \frac{1}{k^2}$
• For example, for $k = \sqrt{2}$, it shows that at least half of the values lie in the interval $(\mu - \sqrt{2}\sigma,\ \mu + \sqrt{2}\sigma)$.
• Let $Y_i = f(X_i)/p(X_i)$; the MC estimate becomes $F_N = \frac{1}{N}\sum_{i=1}^{N} Y_i$

Convergence of Monte Carlo
• According to Chebyshev's inequality,
$\Pr\left\{|F_N - E[F_N]| \ge \sqrt{\frac{V[F_N]}{\delta}}\right\} \le \delta$
$V[F_N] = V\left[\frac{1}{N}\sum_{i=1}^{N} Y_i\right] = \frac{1}{N^2}\sum_{i=1}^{N} V[Y_i] = \frac{1}{N}V[Y]$
• Plugging into Chebyshev's inequality,
$\Pr\left\{|F_N - I| \ge \frac{1}{\sqrt{N}}\sqrt{\frac{V[Y]}{\delta}}\right\} \le \delta$
So, for a fixed threshold δ, the error decreases at the rate $N^{-1/2}$.
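The two estimators above can be demonstrated with a short, self-contained C++ sketch of mine (the integrand x² and the importance PDF p(x) = 2x are illustrative choices, not from the slides):

    #include <cstdio>
    #include <cmath>
    #include <random>

    // Integrand: f(x) = x^2 over [0,1]; the exact value is 1/3.
    static double f(double x) { return x * x; }

    int main() {
        std::mt19937 rng(0);
        std::uniform_real_distribution<double> uniform01(0.0, 1.0);
        const int N = 100000;
        const double a = 0.0, b = 1.0;

        // Basic estimator: F_N = (b-a)/N * sum f(X_i), X_i uniform on [a,b].
        double sumUniform = 0.0;
        // General estimator: F_N = 1/N * sum f(X_i)/p(X_i), here with
        // p(x) = 2x, roughly proportional to f so that f/p has low variance.
        double sumImportance = 0.0;
        for (int i = 0; i < N; ++i) {
            double u = uniform01(rng);
            sumUniform += f(a + (b - a) * u);

            // Inversion of P(x) = x^2; 1-u keeps x in (0,1] (no divide by 0).
            double x = std::sqrt(1.0 - uniform01(rng));
            sumImportance += f(x) / (2.0 * x);  // divide by p(x) = 2x
        }
        std::printf("uniform    = %.6f\n", (b - a) * sumUniform / N);
        std::printf("importance = %.6f\n", sumImportance / N);
        return 0;
    }

With these choices the importance-sampled estimate typically shows visibly lower variance at equal N, since f/p = x/2 varies far less over [0,1] than f itself.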
Properties of estimators
• An estimator $F_N$ is called unbiased if, for all N,
$E[F_N] = Q$
That is, the expected value is independent of N.
• Otherwise, the bias of the estimator is defined as
$\beta[F_N] = E[F_N] - Q$
• If the bias goes to zero as N increases, the estimator is called consistent:
$\lim_{N\to\infty} \beta[F_N] = 0$, i.e. $\lim_{N\to\infty} E[F_N] = Q$

Example of a biased consistent estimator
• Suppose we are doing antialiasing on a 1-D pixel. To determine the pixel value, we need to evaluate $I = \int_0^1 w(x)\,f(x)\,dx$, where w(x) is the filter function with $\int_0^1 w(x)\,dx = 1$.
• A common way to evaluate this is
$F_N = \frac{\sum_{i=1}^{N} w(X_i)\,f(X_i)}{\sum_{i=1}^{N} w(X_i)}$
• When N = 1, we have
$E[F_1] = E\left[\frac{w(X_1)\,f(X_1)}{w(X_1)}\right] = E[f(X_1)] = \int_0^1 f(x)\,dx \ne I$ in general.

Example of a biased consistent estimator
• When N = 2, we have
$E[F_2] = \int_0^1\!\!\int_0^1 \frac{w(x_1)f(x_1) + w(x_2)f(x_2)}{w(x_1) + w(x_2)}\,dx_1\,dx_2 \ne I$ in general.
• However, when N is very large, the bias approaches zero:
$\lim_{N\to\infty} E[F_N] = \lim_{N\to\infty} \frac{\frac{1}{N}\sum_{i=1}^{N} w(X_i)f(X_i)}{\frac{1}{N}\sum_{i=1}^{N} w(X_i)} = \frac{\int_0^1 w(x)f(x)\,dx}{\int_0^1 w(x)\,dx} = I$

Choosing samples
$F_N = \frac{1}{N}\sum_{i=1}^{N}\frac{f(X_i)}{p(X_i)}$
• Carefully choosing the PDF from which samples are drawn is an important technique for reducing variance: we want f/p to have a low variance. Hence, it is necessary to be able to draw samples from the chosen PDF.
• How do we sample an arbitrary distribution starting from a uniform random variable?
– Inversion
– Rejection
– Transformation

Inversion method
• Cumulative probability distribution function: $P(x) = \Pr(X \le x)$
• Construction of samples: solve for $X = P^{-1}(U)$
• Must know:
1. The integral of p(x), i.e. the CDF
2. The inverse function $P^{-1}(x)$

Proof for the inversion method
• Let U be a uniform random variable; its CDF is $P_u(x) = x$. We will show that $Y = P^{-1}(U)$ has the CDF P(x):
$\Pr\{Y \le x\} = \Pr\{P^{-1}(U) \le x\} = \Pr\{U \le P(x)\} = P_u(P(x)) = P(x)$
because P is monotonic: $P(x_1) \le P(x_2)$ if $x_1 \le x_2$.
Thus, Y's CDF is exactly P(x).

Inversion method
• Compute the CDF P(x)
• Compute $P^{-1}(x)$
• Obtain ξ
• Compute $X_i = P^{-1}(\xi)$

Example: power function
It is used in sampling Blinn's microfacet model.
• Assume $p(x) \propto x^n$ on [0,1]. Since $\int_0^1 x^n\,dx = \frac{1}{n+1}$,
$p(x) = (n+1)\,x^n$, $\quad P(x) = x^{n+1}$
$X = P^{-1}(U) = \sqrt[n+1]{U}$
• Trick (it only works for sampling the power distribution):
$Y = \max(U_1, U_2, \dots, U_n, U_{n+1})$
$\Pr(Y \le x) = \prod_{i=1}^{n+1}\Pr(U_i \le x) = x^{n+1}$

Example: exponential distribution
$p(x) = c\,e^{-ax}$, useful for rendering participating media.
• Normalization over $[0,\infty)$: $\int_0^\infty c\,e^{-ax}\,dx = 1$ gives $c = a$.
• Compute the CDF: $P(x) = \int_0^x a\,e^{-as}\,ds = 1 - e^{-ax}$
• Compute the inverse: $P^{-1}(x) = -\frac{1}{a}\ln(1-x)$
• Obtain ξ and compute $X_i = P^{-1}(\xi)$:
$X = -\frac{1}{a}\ln(1-\xi) = -\frac{1}{a}\ln\xi$
(the last step uses the fact that 1−ξ is also uniform).

Rejection method
• Sometimes we can't integrate p(x) into a CDF, or can't invert the CDF.
$I = \int_0^1 f(x)\,dx = \iint_{0 \le y \le f(x)} dx\,dy$
• Algorithm: pick $U_1$ and $U_2$; accept $U_1$ if $U_2 < f(U_1)$.
• Wasteful? Efficiency = area under f / area of the rectangle.

Rejection method
• The rejection method is a dart-throwing method that requires neither integration nor inversion:
1. Find q(x) so that $p(x) < M\,q(x)$
2. Dart throwing:
a. Choose a pair (X, ξ), where X is sampled from q(x)
b. If $\xi < p(X)/(M\,q(X))$, return X
• Equivalently, we pick a point $(X,\ \xi\,M\,q(X))$; if it lies beneath p(X), then we are fine. (A sketch of this procedure follows.)
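As promised, here is a minimal C++ sketch of my own of the dart-throwing procedure just described, assuming the illustrative target p(x) = 2x on [0,1] with the constant envelope q(x) = 1 and M = 2 (none of these choices are from the slides):

    #include <cstdio>
    #include <random>

    // Target density on [0,1]; here p(x) = 2x as a simple test case.
    static double p(double x) { return 2.0 * x; }

    // Rejection sampling with envelope q(x) = 1 (uniform on [0,1]) and
    // M chosen so that p(x) <= M * q(x) everywhere; here M = 2.
    static double rejectionSample(std::mt19937& rng) {
        std::uniform_real_distribution<double> uniform01(0.0, 1.0);
        const double M = 2.0;
        for (;;) {
            double x  = uniform01(rng);  // X sampled from q
            double xi = uniform01(rng);  // independent uniform
            if (xi < p(x) / M)           // accept w.p. p(X)/(M q(X)), q = 1
                return x;
            // otherwise reject and throw another dart
        }
    }

    int main() {
        std::mt19937 rng(0);
        double sum = 0.0;
        const int N = 100000;
        for (int i = 0; i < N; ++i) sum += rejectionSample(rng);
        // For p(x) = 2x the mean is 2/3; on average 1/M = 1/2 of the
        // darts are accepted.
        std::printf("sample mean = %.4f (expected %.4f)\n", sum / N, 2.0 / 3.0);
        return 0;
    }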
Why it works
• For each iteration, we generate $X_i$ from q. The sample is returned if $\xi < p(X)/(M\,q(X))$, which happens with probability $p(X)/(M\,q(X))$.
• So, the probability of returning x is
$q(x)\,\frac{p(x)}{M\,q(x)} = \frac{p(x)}{M}$
whose integral over the domain is 1/M.
• Thus, when a sample is returned (which happens with probability 1/M per iteration), $X_i$ is distributed according to p(x).

Example: sampling a unit disk

    void RejectionSampleDisk(float *x, float *y) {
        float sx, sy;
        do {
            sx = 1.f - 2.f * RandomFloat();
            sy = 1.f - 2.f * RandomFloat();
        } while (sx*sx + sy*sy > 1.f);
        *x = sx;
        *y = sy;
    }

π/4 ≈ 78.5% of samples are accepted; the ratio gets worse in higher dimensions: for a sphere, it is π/6 ≈ 52.4%.

Transformation of variables
• Given a random variable X with distribution $p_x(x)$ and a random variable $Y = y(X)$, where y is one-to-one, i.e. monotonic, we want to derive the distribution $p_y(y)$ of Y.
• $P_y(y(x)) = \Pr\{Y \le y(x)\} = \Pr\{X \le x\} = P_x(x)$
• PDF: differentiating both sides with respect to x,
$\frac{dP_y(y)}{dy}\,\frac{dy}{dx} = p_y(y)\,\frac{dy}{dx} = p_x(x)$
so
$p_y(y) = \left(\frac{dy}{dx}\right)^{-1} p_x(x)$
(for monotonically increasing y; in general, use $|dy/dx|$).

Example
$p_x(x) = 2x, \quad Y = \sin X$
$p_y(y) = \frac{p_x(x)}{\cos x} = \frac{2x}{\cos x} = \frac{2\sin^{-1}y}{\sqrt{1-y^2}}$

Transformation method
• A problem in applying the above method is that we usually have some PDF to sample from, not a given transformation.
• Given a source random variable X with $p_x(x)$ and a target distribution $p_y(y)$, try to transform X into another random variable Y so that Y has the distribution $p_y(y)$.
• We first have to find a transformation y(x) so that $P_x(x) = P_y(y(x))$. Thus,
$y(x) = P_y^{-1}(P_x(x))$

Transformation method
• Let's prove that the above transform works. We first prove that the random variable $Z = P_x(X)$ has a uniform distribution. If so, then $P_y^{-1}(Z)$ has the distribution $P_y(y)$ by the inversion method:
$\Pr\{Z \le x\} = \Pr\{P_x(X) \le x\} = \Pr\{X \le P_x^{-1}(x)\} = P_x(P_x^{-1}(x)) = x$
Thus, Z is uniform and the transformation works.
• It is an obvious generalization of the inversion method, in which X is uniform and $P_x(x) = x$.

Example
$p_x(x) = x$ (on $[0,\sqrt{2}]$), with target $p_y(y) = e^{-y}$
$P_x(x) = \frac{x^2}{2}, \quad P_y(y) = 1 - e^{-y}, \quad P_y^{-1}(z) = -\ln(1-z)$
Since $1 - P_x(X)$ is uniform whenever $P_x(X)$ is, we may use it in place of $P_x(x)$:
$y(x) = P_y^{-1}(1 - P_x(x)) = -\ln\frac{x^2}{2} = \ln 2 - 2\ln x$
Thus, if X has the distribution $p_x(x) = x$, then the random variable $Y = \ln 2 - 2\ln X$ has the distribution $p_y(y) = e^{-y}$.

Multiple dimensions
• For a transformation T in multiple dimensions, with Y = T(X):
$p_y(T(x)) = \frac{p_x(x)}{|J_T(x)|}$
where $|J_T(x)|$ is the absolute value of the determinant of T's Jacobian.
• We often need the other way around: given a target density, construct the transform.

Spherical coordinates
• The spherical coordinate representation of directions is
$x = r\sin\theta\cos\phi$
$y = r\sin\theta\sin\phi$
$z = r\cos\theta$
• $|J_T| = r^2\sin\theta$, so
$p(r,\theta,\phi) = r^2\sin\theta\;p(x,y,z)$

Spherical coordinates
• Now, look at the relation between spherical directions and solid angles:
$d\omega = \sin\theta\,d\theta\,d\phi$, and $p(\theta,\phi)\,d\theta\,d\phi = p(\omega)\,d\omega$
• Hence, the density in terms of (θ, φ) is
$p(\theta,\phi) = \sin\theta\;p(\omega)$

Multidimensional sampling
• Separable case: independently sample X from $p_x$ and Y from $p_y$; then $p(x,y) = p_x(x)\,p_y(y)$.
• Often this is not possible. Compute the marginal density function p(x) first:
$p(x) = \int p(x,y)\,dy$
• Then, compute the conditional density function:
$p(y|x) = \frac{p(x,y)}{p(x)}$
• Use 1D sampling with p(x) and p(y|x), as in the sketch below.
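As promised above, a minimal C++ sketch of my own of the marginal/conditional recipe; the density p(x,y) = x + y on [0,1]² is an illustrative choice, not from the slides:

    #include <cstdio>
    #include <cmath>
    #include <random>

    // Marginal/conditional sampling of the (non-separable) density
    // p(x,y) = x + y on [0,1]^2, chosen here purely for illustration.
    // Marginal:    p(x)   = x + 1/2,        P(x)   = (x^2 + x)/2
    // Conditional: p(y|x) = (x+y)/(x+1/2),  P(y|x) = (xy + y^2/2)/(x+1/2)
    static void samplePxy(double xi1, double xi2, double* x, double* y) {
        // Invert P(x) = xi1:  x^2 + x - 2*xi1 = 0.
        *x = 0.5 * (-1.0 + std::sqrt(1.0 + 8.0 * xi1));
        // Invert P(y|x) = xi2:  y^2 + 2xy - xi2*(2x+1) = 0.
        *y = -(*x) + std::sqrt((*x) * (*x) + xi2 * (2.0 * (*x) + 1.0));
    }

    int main() {
        std::mt19937 rng(0);
        std::uniform_real_distribution<double> uniform01(0.0, 1.0);
        double sx = 0.0, sy = 0.0;
        const int N = 100000;
        for (int i = 0; i < N; ++i) {
            double x, y;
            samplePxy(uniform01(rng), uniform01(rng), &x, &y);
            sx += x; sy += y;
        }
        // E[x] = E[y] = integral of x*(x + 1/2) over [0,1] = 7/12.
        std::printf("E[x] = %.4f, E[y] = %.4f (expected 0.5833)\n",
                    sx / N, sy / N);
        return 0;
    }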
Sampling a hemisphere
• Sample a hemisphere uniformly, i.e. $p(\omega) = c$:
$1 = \int_{H^2} c\,d\omega \;\Rightarrow\; c = \frac{1}{2\pi} \;\Rightarrow\; p(\omega) = \frac{1}{2\pi}$
• Sample θ first:
$p(\theta,\phi) = \sin\theta\;p(\omega) = \frac{\sin\theta}{2\pi}$
$p(\theta) = \int_0^{2\pi} p(\theta,\phi)\,d\phi = \sin\theta$
• Now sample φ:
$p(\phi|\theta) = \frac{p(\theta,\phi)}{p(\theta)} = \frac{1}{2\pi}$

Sampling a hemisphere
• Now, we use the inversion technique in order to sample these PDFs:
$P(\theta) = \int_0^{\theta} \sin\theta'\,d\theta' = 1 - \cos\theta$
$P(\phi|\theta) = \int_0^{\phi} \frac{1}{2\pi}\,d\phi' = \frac{\phi}{2\pi}$
• Inverting these:
$\theta = \cos^{-1}\xi_1, \quad \phi = 2\pi\xi_2$
(setting $\cos\theta = 1-\xi_1$ and using the fact that $1-\xi_1$ is itself uniform).

Sampling a hemisphere
• Convert these to Cartesian coordinates:
$x = \sin\theta\cos\phi = \cos(2\pi\xi_2)\sqrt{1-\xi_1^2}$
$y = \sin\theta\sin\phi = \sin(2\pi\xi_2)\sqrt{1-\xi_1^2}$
$z = \cos\theta = \xi_1$
• A similar derivation works for a full sphere.

Sampling a disk
• WRONG (not equi-areal): $\theta = 2\pi U_1, \quad r = U_2$
• RIGHT (equi-areal): $\theta = 2\pi U_1, \quad r = \sqrt{U_2}$

Sampling a disk
• Uniform: $p(x,y) = \frac{1}{\pi}$, so $p(r,\theta) = r\,p(x,y) = \frac{r}{\pi}$
• Sample r first:
$p(r) = \int_0^{2\pi} p(r,\theta)\,d\theta = 2r$
• Then, sample θ:
$p(\theta|r) = \frac{p(r,\theta)}{p(r)} = \frac{1}{2\pi}$
• Invert the CDFs:
$P(r) = r^2, \quad P(\theta|r) = \frac{\theta}{2\pi}$
$r = \sqrt{\xi_1}, \quad \theta = 2\pi\xi_2$

Shirley's mapping
• A lower-distortion alternative (the concentric mapping); in the first wedge:
$r = U_1, \quad \theta = \frac{\pi}{4}\,\frac{U_2}{U_1}$

Sampling a triangle
• Consider the triangle $u \ge 0$, $v \ge 0$, $u + v \le 1$, with vertices (0,0), (1,0), (0,1); its area is A = 1/2, so the uniform density is $p(u,v) = \frac{1}{A} = 2$. Check:
$\int_0^1\!\!\int_0^{1-u} 2\,dv\,du = \int_0^1 2(1-u)\,du = 1$

Sampling a triangle
• Here u and v are not independent! $p(u,v) = 2$
• Marginal density:
$p(u) = \int_0^{1-u} p(u,v)\,dv = \int_0^{1-u} 2\,dv = 2(1-u)$
• Conditional density:
$p(v|u) = \frac{p(u,v)}{p(u)} = \frac{1}{1-u}$
• CDFs:
$P(u_0) = \int_0^{u_0} 2(1-u)\,du = 1 - (1-u_0)^2$
$P(v_0|u_0) = \int_0^{v_0} \frac{dv}{1-u_0} = \frac{v_0}{1-u_0}$
• Setting these to $U_1$ and $U_2$ and inverting:
$u_0 = 1 - \sqrt{U_1}, \quad v_0 = \sqrt{U_1}\,U_2$

Cosine-weighted hemisphere
• $p(\omega) \propto \cos\theta$. Normalizing:
$1 = \int_{H^2} c\cos\theta\,d\omega = c\int_0^{2\pi}\!\!\int_0^{\pi/2}\cos\theta\sin\theta\,d\theta\,d\phi = c\,\pi \;\Rightarrow\; c = \frac{1}{\pi}$
$p(\theta,\phi) = \frac{1}{\pi}\cos\theta\sin\theta$

Cosine-weighted hemisphere
• Marginal and conditional densities:
$p(\theta) = \int_0^{2\pi}\frac{1}{\pi}\cos\theta\sin\theta\,d\phi = 2\cos\theta\sin\theta = \sin 2\theta$
$p(\phi|\theta) = \frac{p(\theta,\phi)}{p(\theta)} = \frac{1}{2\pi}$
• CDFs and inversion:
$P(\theta) = \int_0^{\theta}\sin 2\theta'\,d\theta' = \frac{1-\cos 2\theta}{2}$
$P(\phi|\theta) = \frac{\phi}{2\pi}$
$\theta = \frac{1}{2}\cos^{-1}(1 - 2\xi_1), \quad \phi = 2\pi\xi_2$

Cosine-weighted hemisphere
• Malley's method: uniformly generate points on the unit disk, then generate directions by projecting them up to the hemisphere above it.

    Vector CosineSampleHemisphere(float u1, float u2) {
        Vector ret;
        ConcentricSampleDisk(u1, u2, &ret.x, &ret.y);
        ret.z = sqrtf(max(0.f, 1.f - ret.x*ret.x - ret.y*ret.y));
        return ret;
    }

Cosine-weighted hemisphere
• Why does Malley's method work? Unit disk sampling gives $p(r,\phi) = \frac{r}{\pi}$. Map to the hemisphere via $T: X = (\theta,\phi) \mapsto Y = (r,\phi)$ with $r = \sin\theta$, so that $p_y(T(x))\,|J_T(x)| = p_x(x)$, where
$J_T = \begin{pmatrix}\cos\theta & 0\\ 0 & 1\end{pmatrix}, \quad |J_T| = \cos\theta$
• Therefore
$p(\theta,\phi) = |J_T|\;p(r,\phi) = \cos\theta\,\frac{\sin\theta}{\pi}$
which is exactly the cosine-weighted density.

Sampling a Phong lobe
• $p(\omega) \propto \cos^n\theta$. Normalizing:
$1 = \int_{H^2} c\cos^n\theta\,d\omega = c\int_0^{2\pi}\!\!\int_0^{\pi/2}\cos^n\theta\sin\theta\,d\theta\,d\phi = \frac{2\pi c}{n+1} \;\Rightarrow\; c = \frac{n+1}{2\pi}$
$p(\theta,\phi) = \frac{n+1}{2\pi}\cos^n\theta\sin\theta$

Sampling a Phong lobe
• Marginal and conditional densities:
$p(\theta) = \int_0^{2\pi} p(\theta,\phi)\,d\phi = (n+1)\cos^n\theta\sin\theta$
$p(\phi|\theta) = \frac{p(\theta,\phi)}{p(\theta)} = \frac{1}{2\pi}$
• CDFs and inversion:
$P(\theta') = \int_0^{\theta'}(n+1)\cos^n\theta\sin\theta\,d\theta = 1 - \cos^{n+1}\theta'$
$\cos\theta' = (1-\xi_1)^{\frac{1}{n+1}}, \quad \theta' = \cos^{-1}\big((1-\xi_1)^{\frac{1}{n+1}}\big)$
$P(\phi'|\theta) = \frac{\phi'}{2\pi}, \quad \phi' = 2\pi\xi_2$

Sampling a Phong lobe
• When n = 1, it is actually equivalent to the cosine-weighted hemisphere:
$n = 1: \quad (\theta,\phi) = \big(\cos^{-1}\sqrt{1-\xi_1},\ 2\pi\xi_2\big)$, from $P(\theta) = 1 - \cos^2\theta$
while cosine-weighted sampling gave $(\theta,\phi) = \big(\frac{1}{2}\cos^{-1}(1-2\xi_1),\ 2\pi\xi_2\big)$, from $P(\theta) = \frac{1-\cos 2\theta}{2}$.
• These agree because
$\frac{1-\cos 2\theta}{2} = \frac{1-(1-2\sin^2\theta)}{2} = \sin^2\theta = 1 - \cos^2\theta$
so the two CDFs are identical.
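To close, a minimal C++ sketch of my own of the Phong-lobe sampling routine derived above (the vector type and function names are mine, not the slides'); the n = 1 check uses the cosine-weighted equivalence from the last slide, for which $E[\cos\theta] = \int \cos\theta \cdot \frac{\cos\theta}{\pi}\,d\omega = \frac{2}{3}$:

    #include <algorithm>
    #include <cstdio>
    #include <cmath>
    #include <random>

    struct Vec3 { float x, y, z; };

    // Sample p(omega) proportional to cos^n(theta) with the inverted CDFs
    // derived above: cos(theta) = (1 - xi1)^(1/(n+1)), phi = 2*pi*xi2.
    // (1 - xi1 may be replaced by xi1, since both are uniform.)
    static Vec3 samplePhongLobe(float n, float xi1, float xi2) {
        const float kPi = 3.14159265358979f;
        float cosTheta = std::pow(1.f - xi1, 1.f / (n + 1.f));
        float sinTheta = std::sqrt(std::max(0.f, 1.f - cosTheta * cosTheta));
        float phi = 2.f * kPi * xi2;
        return { sinTheta * std::cos(phi), sinTheta * std::sin(phi), cosTheta };
    }

    int main() {
        std::mt19937 rng(0);
        std::uniform_real_distribution<float> u01(0.f, 1.f);
        // Check the n = 1 case against the cosine-weighted hemisphere:
        // E[cos theta] should be 2/3 for p(omega) = cos(theta)/pi.
        double sum = 0.0;
        const int N = 100000;
        for (int i = 0; i < N; ++i)
            sum += samplePhongLobe(1.f, u01(rng), u01(rng)).z;
        std::printf("E[cos theta] = %.4f (expected 0.6667)\n", sum / N);
        return 0;
    }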