Time difference for random number generation implementation in Java vs. C++
I'm writing a Monte Carlo simulation in Java that involves generating a lot of random integers. Thinking that native code would be faster for random number generation, I figured I should write the code in C++ and return the output via JNI. But when I wrote the same method in C++, it actually takes longer to execute than the Java version. Here are the code samples:
    Random rand = new Random();
    int threshold = 5;
    int[] composition = {10, 10, 10, 10, 10};
    for (int j = 0; j < 100000000; j++) {
        rand.setSeed(System.nanoTime());
        double sum = 0;
        for (int i = 0; i < composition[0]; i++) sum += carbon(rand);
        for (int i = 0; i < composition[1]; i++) sum += hydrogen(rand);
        for (int i = 0; i < composition[2]; i++) sum += nitrogen(rand);
        for (int i = 0; i < composition[3]; i++) sum += oxygen(rand);
        for (int i = 0; i < composition[4]; i++) sum += sulfur(rand);
        if (sum < threshold) {} // execute code
        else {} // execute other code
    }
And the equivalent code in C++:
    int threshold = 5;
    int composition[5] = {10, 10, 10, 10, 10};
    for (int i = 0; i < 100000000; i++) {
        srand(time(0));
        double sum = 0;
        for (int i = 0; i < composition[0]; i++) sum += carbon();
        for (int i = 0; i < composition[1]; i++) sum += hydrogen();
        for (int i = 0; i < composition[2]; i++) sum += nitrogen();
        for (int i = 0; i < composition[3]; i++) sum += oxygen();
        for (int i = 0; i < composition[4]; i++) sum += sulfur();
        if (sum > threshold) {}
        else {}
    }
All of the element methods (carbon, hydrogen, etc.) generate a random number and return a double.
The runtimes are 77.471 sec for the Java code and 121.777 sec for the C++ code.
Admittedly I'm not very experienced in C++, so it's possible the cause is just badly written code.
I suspect the performance issue is in the bodies of your carbon(), hydrogen(), nitrogen(), oxygen(), and sulfur() functions (you should show how they produce the random data), or in the if (sum < threshold) {} else {} code.
I wanted to keep setting the seed so the results would not be deterministic (closer to being truly random).
Since you're using the result of time(0) as the seed, you're not getting particularly random results either way.
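To see why, note that time(0) only has one-second resolution, so every iteration that runs within the same second reseeds rand() with the same value and restarts the same sequence. A minimal sketch of that effect (the helper names first_draw_after_seeding and same_seed_repeats are made up for illustration):

```cpp
#include <cstdlib>

// Hypothetical helper: reseed rand() and return the first value it produces.
int first_draw_after_seeding(unsigned seed) {
    std::srand(seed);
    return std::rand();
}

// Reseeding with the same value restarts the same sequence, so two iterations
// that see the same time(0) value get identical "random" numbers.
bool same_seed_repeats(unsigned seed) {
    return first_draw_after_seeding(seed) == first_draw_after_seeding(seed);
}
```

Calling same_seed_repeats() with any seed returns true, which is the behavior the loop gets for every iteration that shares a time(0) value.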
Instead of using srand() and rand(), you should take a look at the <random> library and choose an engine with the performance/quality characteristics that meet your needs. If your implementation supports it, you can even get non-deterministic random data from std::random_device (either to generate seeds or as an engine).
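As a rough sketch of that suggestion, the engine can be seeded once from std::random_device and then reused for every draw. std::mt19937 is only an example choice here, and make_seeded_engine and draw_below are names I made up for illustration:

```cpp
#include <random>

// Build an engine seeded once from std::random_device.
// Assumption: std::mt19937 is just an example engine; pick whichever engine
// has the speed/quality trade-off you need.
std::mt19937 make_seeded_engine() {
    std::random_device rd;      // may be non-deterministic if the platform supports it
    return std::mt19937(rd());  // seed once, outside the hot loop
}

// Draw an integer in [0, bound) from an engine that was seeded up front.
int draw_below(std::mt19937& eng, int bound) {
    std::uniform_int_distribution<int> dist(0, bound - 1);
    return dist(eng);
}
```

The point of the design is that seeding happens once, before the simulation loop, rather than on every iteration.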
Additionally, <random> provides pre-made distributions such as std::uniform_real_distribution<double>, which are likely to be better than the average programmer's method of manually computing the distribution they want from the results of rand().
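For a two-outcome draw like the element functions, std::bernoulli_distribution is another of the pre-made distributions that could stand in for the manual rand() % 10000 < 107 test. A minimal sketch, assuming the 107-in-10000 probability and the two values from the question's carbon() function:

```cpp
#include <random>

// Return the heavier value with probability 107/10000, otherwise 12.0.
// The engine type is a template parameter so any <random> engine can be used.
template <class Engine>
double carbon(Engine& eng) {
    std::bernoulli_distribution is_heavy(107.0 / 10000.0);
    return is_heavy(eng) ? 13.0033548378 : 12.0;
}
```

Unlike the modulo approach, the distribution takes care of mapping the engine's output range onto the requested probability.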
Okay, here's how you can eliminate the inner loops from your code and drastically speed it up (in Java or C++).
Your code:
    double carbon() {
        if (rand() % 10000 < 107)
            return 13.0033548378;
        else
            return 12.0;
    }
picks one of two values with a particular probability. Presumably you intended the first value to be picked about 107 times out of 10000 (although using % with rand() doesn't quite give you that). When you run this in a loop and sum the results, as in:
    for (int i = 0; i < composition[0]; i++) sum += carbon();
you'll essentially get sum += X*13.0033548378 + Y*12.0;
where X is the number of times the random number stays under the threshold and Y is (trials - X). It just so happens that you can simulate running a bunch of trials and calculating the number of successes using a binomial distribution, and <random> happens to provide a binomial distribution.
Given a function sum_trials():
    #include <random>

    std::minstd_rand0 eng; // global random engine

    double sum_trials(int trials, double probability, double A, double B) {
        std::binomial_distribution<> dist(trials, probability);
        int successes = dist(eng);
        return successes*A + (trials-successes)*B;
    }
you can replace your carbon() loop with:
    sum += sum_trials(composition[0], 107.0/10000.0, 13.0033548378, 12.0); // carbon trials
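If you want to convince yourself that the single binomial draw is a fair replacement for the per-trial loop, one option is to compare their averages over many repetitions. The sketch below is only a sanity check under the same assumptions as above; sum_by_loop, sum_by_binomial, and average are illustrative names, and the engine is passed in rather than global so the block stands on its own:

```cpp
#include <random>

// Sum `trials` independent draws, each worth `a` with probability p, else `b`.
template <class Engine>
double sum_by_loop(Engine& eng, int trials, double p, double a, double b) {
    std::bernoulli_distribution pick_a(p);
    double sum = 0.0;
    for (int i = 0; i < trials; ++i) sum += pick_a(eng) ? a : b;
    return sum;
}

// Same quantity from a single binomial draw of the number of successes.
template <class Engine>
double sum_by_binomial(Engine& eng, int trials, double p, double a, double b) {
    std::binomial_distribution<> successes_dist(trials, p);
    int successes = successes_dist(eng);
    return successes * a + (trials - successes) * b;
}

// Average a sampler over many repetitions; both versions should approach
// trials * (p * a + (1 - p) * b).
template <class Sampler>
double average(Sampler sample, int repetitions) {
    double total = 0.0;
    for (int i = 0; i < repetitions; ++i) total += sample();
    return total / repetitions;
}
```

Both averages should settle near trials * (p*a + (1 - p)*b), which is the expected value of the sum in either formulation.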
I don't have the actual values you're using for the other elements, so the carbon values stand in for them here, but your whole loop will look something like:
    for (int i = 0; i < 100000000; i++) {
        double sum = 0;
        sum += sum_trials(composition[0], 107.0/10000.0, 13.0033548378, 12.0); // carbon trials
        sum += sum_trials(composition[1], 107.0/10000.0, 13.0033548378, 12.0); // hydrogen trials
        sum += sum_trials(composition[2], 107.0/10000.0, 13.0033548378, 12.0); // nitrogen trials
        sum += sum_trials(composition[3], 107.0/10000.0, 13.0033548378, 12.0); // oxygen trials
        sum += sum_trials(composition[4], 107.0/10000.0, 13.0033548378, 12.0); // sulfur trials
        if (sum > threshold) {
        } else {
        }
    }
Now one thing to note is that inside the function we're constructing distributions over and over with the same data. We can extract that by replacing the function sum_trials() with a function object, which we construct with the appropriate data once before the loop, and then just use the functor repeatedly:
    struct sum_trials {
        std::binomial_distribution<> dist;
        double A;
        double B;
        int trials;

        sum_trials(int t, double p, double a, double b)
            : dist{t, p}, A{a}, B{b}, trials{t} {}

        double operator()() {
            int successes = dist(eng); // eng is the global engine defined earlier
            return successes * A + (trials - successes) * B;
        }
    };

    int main() {
        int threshold = 5;
        int composition[5] = { 10, 10, 10, 10, 10 };

        // Construct each distribution once, before the loop.
        sum_trials carbon   = { composition[0], 107.0/10000.0, 13.0033548378, 12.0 };
        sum_trials hydrogen = { composition[1], 107.0/10000.0, 13.0033548378, 12.0 };
        sum_trials nitrogen = { composition[2], 107.0/10000.0, 13.0033548378, 12.0 };
        sum_trials oxygen   = { composition[3], 107.0/10000.0, 13.0033548378, 12.0 };
        sum_trials sulfur   = { composition[4], 107.0/10000.0, 13.0033548378, 12.0 };

        for (int i = 0; i < 100000000; i++) {
            double sum = 0;
            sum += carbon();
            sum += hydrogen();
            sum += nitrogen();
            sum += oxygen();
            sum += sulfur();
            if (sum > threshold) {
            } else {
            }
        }
    }
The original version of the code took my system about one minute 30 seconds. The last version here takes 11 seconds.
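If you want to reproduce this kind of comparison on your own machine, a small timing helper built on std::chrono::steady_clock is one way to do it. This is just a sketch (seconds_taken is a name I made up), and the numbers you get will depend on your compiler, optimization flags, and hardware; an optimizer may also remove work whose result is never used:

```cpp
#include <chrono>

// Run `work` once and return the elapsed wall-clock time in seconds.
template <class Work>
double seconds_taken(Work work) {
    auto start = std::chrono::steady_clock::now();
    work();
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}
```

You would pass each version of the simulation loop as a lambda and compare the returned values.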
Here's a functor to generate the oxygen sums using two binomial_distributions. Maybe one of the other distributions can do this in one shot, but I don't know.
    struct sum_trials2 {
        std::binomial_distribution<> d1;
        std::binomial_distribution<> d2;
        double A;
        double B;
        double C;
        int trials;
        double probability2;

        sum_trials2(int t, double p1, double p2, double a, double b, double c)
            : d1{t, p1}, A{a}, B{b}, C{c}, trials{t}, probability2{p2} {}

        double operator()() {
            int X = d1(eng);
            // Re-parameterize d2 for the trials left over after the first draw.
            d2.param(std::binomial_distribution<>{trials - X, probability2}.param());
            int Y = d2(eng);
            return X*A + Y*B + (trials - X - Y)*C;
        }
    };

    sum_trials2 oxygen{composition[3], 17.0/1000.0, (47.0-17.0)/(1000.0-17.0), 17.9999, 16.999, 15.999};
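The second probability passed above, (47.0-17.0)/(1000.0-17.0), is a conditional one: the chance of the second outcome given that the trial was not the first outcome. The sketch below spells that step out, taking the unconditional per-trial probabilities and doing the conversion itself. Counts and split_three_ways are illustrative names, and the distributions are rebuilt on every call for clarity rather than speed:

```cpp
#include <random>

// Counts of three mutually exclusive outcomes over a fixed number of trials.
struct Counts {
    int first;
    int second;
    int rest;
};

// Draw a three-way split with two binomial draws: first the number of
// "first" outcomes, then the number of "second" outcomes among what is left.
// p_first and p_second are unconditional per-trial probabilities (p_first < 1).
template <class Engine>
Counts split_three_ways(Engine& eng, int trials, double p_first, double p_second) {
    std::binomial_distribution<> d1(trials, p_first);
    int x = d1(eng);
    // Given that a trial was not "first", it is "second" with this probability.
    double p_second_given_not_first = p_second / (1.0 - p_first);
    std::binomial_distribution<> d2(trials - x, p_second_given_not_first);
    int y = d2(eng);
    return Counts{x, y, trials - x - y};
}
```

With p_first = 17.0/1000.0 and p_second = 30.0/1000.0, the conditional probability works out to 30/983, the same value as (47.0-17.0)/(1000.0-17.0).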
You can further speed this up if you can just calculate the probability that the sum is over the threshold:
    int main() {
        std::minstd_rand0 eng;
        std::bernoulli_distribution dist(probability_sum_is_over_threshold);
        for (int i = 0; i < 100000000; ++i) {
            if (dist(eng)) {
            } else {
            }
        }
    }
Unless the values for the other elements can be negative, the probability that the sum is greater than 5 is 100%. In that case you don't even need to generate random data; just execute the 'if' branch of your code 100,000,000 times.
    int main() {
        for (int i = 0; i < 100000000; ++i) {
            // execute code
        }
    }
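One way to check whether you are in that situation is to compute the smallest sum the simulation could ever produce and compare it with the threshold. This is a rough sketch; min_possible_sum and always_over_threshold are made-up names, and you would have to supply the lightest possible value for each element yourself since I don't have them:

```cpp
#include <cstddef>
#include <vector>

// Smallest sum the simulation could produce: every atom takes its lightest
// possible value. Assumes the counts are non-negative.
double min_possible_sum(const std::vector<int>& counts,
                        const std::vector<double>& lightest) {
    double total = 0.0;
    for (std::size_t i = 0; i < counts.size() && i < lightest.size(); ++i)
        total += counts[i] * lightest[i];
    return total;
}

// If even the smallest possible sum exceeds the threshold, the test
// `sum > threshold` is always true and no random numbers are needed.
bool always_over_threshold(const std::vector<int>& counts,
                           const std::vector<double>& lightest,
                           double threshold) {
    return min_possible_sum(counts, lightest) > threshold;
}
```

With the composition from the question, the ten carbon atoms alone contribute at least 10 * 12.0 = 120, which is already well above the threshold of 5 as long as no other element can contribute a negative value.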