![]() |
|
Navigation |
Synopsis Using Rascal to explore an interesting data space.
Description This point of this example is to show how you can use Rascal to explore data.
The problem we will look at comes from mathematics, and has a precise analytical solution, but
let's use Rascal to explore the state space, and see how it can help us to build intuition.
As you know, Rascal supports arbitrarily large numbers cleanly and simply, unlike more traditional languages like C or Java. For example, if you want to computer 1000!, then it's a simple matter of calling fact(1000) at the command line. Let's use this definition of factorial:
public int fact (int n) { if (n <= 1) { return 1; } else { return n * fact (n-1); } }If you compute fact(1000) at the Rascal command line, you get a large number, on the order of 4.02 x 102567. This is much, much bigger than, say a google, which is a mere 10100. (If Rascal runs out stack space, try computing 100!, then 200!, then ... then 1000!; the run-time will allocate more stack space incrementally and automatically if you sneak up to where you want to go).
rascal> fact(1000); int: 402387260077093773543702433923003985719374864210714632543799910429938512398629020592044208486969404800479988610197196058631666872994808558901323829669944590997424504087073759918823627727188732519779505950995276120874975462497043601418278094646496291056393887437886487337119181045825783647849977012476632889835955735432513185323958463075557409114262417474349347553428646576611667797396668820291207379143853719588249808126867838374559731746136085379534524221586593201928090878297308431392844403281231558611036976801357304216168747609675871348312025478589320767169132448426236131412508780208000261683151027341827977704784635868170164365024153691398281264810213092761244896359928705114964975419909342221566832572080821333186116811553615836546984046708975602900950537616475847728421889679646244945160765353408198901385442487984959953319101723355556602139450399736280750137837615307127761926849034352625200015888535147331611702103968175921510907788019393178114194545257223865541461062892187960223838971476088506276862967146674697562911234082439208160153780889893964518263243671616762179168909779911903754031274622289988005195444414282012187361745992642956581746628302955570299024324153181617210465832036786906117260158783520751516284225540265170483304226143974286933061690897968482590125458327168226458066526769958652682272807075781391858178889652208164348344825993266043367660176999612831860788386150279465955131156552036093988180612138558600301435694527224206344631797460594682573103790084024432438465657245014402821885252470935190620929023136493273497565513958720559654228749774011413346962715422845862377387538230483865688976461927383814900140767310446640259899490222221765904339901886018566526485061799702356193897017860040811889729918311021171229845901641921068884387121855646124960798722908519296819372388642614839657382291123125024186649353143970137428531926649875337218940694281434118520158014123344828015051399694290153483077644569099073152433278288269864602789864321139083506217095002597389863554277196742822248757586765752344220207573630569498825087968928162753848863396909959826280956121450994871701244516461260379029309120889086942028510640182154399457156805941872748998094254742173582401063677404595741785160829230135358081840096996372524230560855903700624271243416909004153690105933983835777939410970027753472000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000Now copy the numerical result above and paste it into an edit window to have a good look at it. Notice anything interesting? The last 249 digits are all zeros. How did this happen and what does it mean? To be honest, when I did this calculation for the first time, I thought I'd found a bug. So I looked at the values of N! for N in the range 900 to 1000 and discovered that the zeros accumulate on the end of N! as N gets bigger. Let's think about it for a bit: N! is a cumulative product, so once a zero has appeared on the end there is no way to get rid of it by multiplying by a positive integer. How do the zeros appear? Well, this isn't to hard to figure out. Obviously, each time you reach a multiple of 10, you will add (at least) one more zero to the cumulative product. But what about multiples of 5? Well, you would add one more zero if you can match the 5 to a 2 within the factors, and there are lots of lonely 2s in that list. So, to summarize, each time N is a multiple of 5, you add at least one zero onto the cumulative product N!. So here's the question we're going to solve: For an arbitrary N, can you predict exactly how many trailing zeros there will be in N!? Again, this can be solved analytically (and if you go looking on the web, you will discover that this is an old chestnut of a math problem that's sometimes used in job interviews to test analytical ability), but what I want to do here is to show how we can use Rascal to play around with the problem space a bit to help us build up our intuition. This is very much like what we do in empirical software engineering, when we have lots of data to analyze, and we're trying to look for patterns that might explain behaviours, such as why some functions are more likely to be associated with bugs than others. In that kind of situation, we typically go through two stages: first, we wade through the data, exploring relationships, looking for unusual lumps or recognizable patters; second, we build theories of how the world works, and test them out using the data. In both stages, we not only look at the data, we play with it. We build little tools to help answer our questions, see where our hunches lead us. We use this "play" to improve our understanding of the problem space, and build intuition about how it works as testable theories. In empirical software engineering, as in most other sciences, we usually don't get concrete proof of a theory; rather, we gather evidence towards ultimately accepting or rejecting the theories (tho often, we may choose to use this evidence to refine the theories and try again). In this case, however, there is a precise analytical solution, a proof, a "ground truth". But that doesn't mean that we can't use the empirical approach to help build our intuition about the problem space, and ultimately devise a theory about how to calculate the number of trailing zeros in N!. Solving analytical problems is about having enough intuition to see possible solutions. And using this empirical approach is one way to build intuition. So let's define a few helper functions and see where that leads us: public int countZeros (int n) { if (n < 10) { return 0; } else if (n % 10 == 0) { return 1 + countZeros (n / 10); } else { return countZeros (n / 10); } } rascal> int i = fact(1000); int: 402387260077093773543702433923003985719374864210714632543799910429938512398629020592044208486969404800479988610197196058631666872994808558901323829669944590997424504087073759918823627727188732519779505950995276120874975462497043601418278094646496291056393887437886487337119181045825783647849977012476632889835955735432513185323958463075557409114262417474349347553428646576611667797396668820291207379143853719588249808126867838374559731746136085379534524221586593201928090878297308431392844403281231558611036976801357304216168747609675871348312025478589320767169132448426236131412508780208000261683151027341827977704784635868170164365024153691398281264810213092761244896359928705114964975419909342221566832572080821333186116811553615836546984046708975602900950537616475847728421889679646244945160765353408198901385442487984959953319101723355556602139450399736280750137837615307127761926849034352625200015888535147331611702103968175921510907788019393178114194545257223865541461062892187960223838971476088506276862967146674697562911234082439208160153780889893964518263243671616762179168909779911903754031274622289988005195444414282012187361745992642956581746628302955570299024324153181617210465832036786906117260158783520751516284225540265170483304226143974286933061690897968482590125458327168226458066526769958652682272807075781391858178889652208164348344825993266043367660176999612831860788386150279465955131156552036093988180612138558600301435694527224206344631797460594682573103790084024432438465657245014402821885252470935190620929023136493273497565513958720559654228749774011413346962715422845862377387538230483865688976461927383814900140767310446640259899490222221765904339901886018566526485061799702356193897017860040811889729918311021171229845901641921068884387121855646124960798722908519296819372388642614839657382291123125024186649353143970137428531926649875337218940694281434118520158014123344828015051399694290153483077644569099073152433278288269864602789864321139083506217095002597389863554277196742822248757586765752344220207573630569498825087968928162753848863396909959826280956121450994871701244516461260379029309120889086942028510640182154399457156805941872748998094254742173582401063677404595741785160829230135358081840096996372524230560855903700624271243416909004153690105933983835777939410970027753472000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000 rascal> countZeros(i); int: 472This was my first try at the solution (really!), and there's a problem: 1000! has exactly 249 trailing zeros, not 472. What did I do wrong? Oh, right, trailing zeros, and the above function counts all of the zeros. Let's try again: public int countTrailingZeros (int n) { if (n < 10) { return 0; } else if (n % 10 == 0) { return 1 + countTrailingZeros (n / 10); } else { return 0 ; } } rascal> countTrailingZeros(i); int: 249OK, so we're making progress. Let's define another function to help us explore the data space: public void printLastTwenty (int n){ for(int i <- [n-19..n+1]) { println ("<i>! has <countTrailingZeros(fact(i))> trailing zeros."); } } rascal>printLastTwenty(1000); 981! has 243 trailing zeros. 982! has 243 trailing zeros. 983! has 243 trailing zeros. 984! has 243 trailing zeros. 985! has 244 trailing zeros. 986! has 244 trailing zeros. 987! has 244 trailing zeros. 988! has 244 trailing zeros. 989! has 244 trailing zeros. 990! has 245 trailing zeros. 991! has 245 trailing zeros. 992! has 245 trailing zeros. 993! has 245 trailing zeros. 994! has 245 trailing zeros. 995! has 246 trailing zeros. 996! has 246 trailing zeros. 997! has 246 trailing zeros. 998! has 246 trailing zeros. 999! has 246 trailing zeros. 1000! has 249 trailing zeros. okSo the pattern I see arising (confirmed by more playing that I won't show you) is that you add a zero every time N is divisible by 5. But sometimes you add more than one zero: 1000! adds three zeros. We defined one function above to help us look at the data more compactly; now let's create another function to look for lumps in the data: // Printout all i in [0..n] where i! has more trailing zeros than (i-1)! public void findLumps (int n) { int iMinusOneFactZeros = 0; for (int i <- [1..n+1]) { int iFactZeros = countTrailingZeros(fact(i)); int diff = iFactZeros - iMinusOneFactZeros ; if (diff >= 1) { println ("<diff> more zeros at <i>!"); } iMinusOneFactZeros = iFactZeros; } } rascal>findLumps(1000); 1 more zeros at 5! 1 more zeros at 10! 1 more zeros at 15! 1 more zeros at 20! 2 more zeros at 25! 1 more zeros at 30! 1 more zeros at 35! 1 more zeros at 40! 1 more zeros at 45! 2 more zeros at 50! 1 more zeros at 55! 1 more zeros at 60! 1 more zeros at 65! 1 more zeros at 70! 2 more zeros at 75! 1 more zeros at 80! 1 more zeros at 85! 1 more zeros at 90! 1 more zeros at 95! 2 more zeros at 100! 1 more zeros at 105! 1 more zeros at 110! 1 more zeros at 115! 1 more zeros at 120! 3 more zeros at 125! 1 more zeros at 130! ... 1 more zeros at 245! 3 more zeros at 250! 1 more zeros at 255! 1 more zeros at 495! 3 more zeros at 500! 1 more zeros at 505! ... 1 more zeros at 620! 4 more zeros at 625! 1 more zeros at 630! ... 1 more zeros at 985! 1 more zeros at 990! 1 more zeros at 995! 3 more zeros at 1000! okSo probably we're noticing some patterns here already, and maybe forming some intuition. But let's first revise our lump-finding function to produce even more concise output: // We can parameterize the threshold to look for jumps of 2, 3, or 4 zeros public void findLumps2 (int n, int tao) { int iMinusOneFactZeros = 0; for (int i <- [1..n+1]) { int iFactZeros = countTrailingZeros(fact(i)); int diff = iFactZeros - iMinusOneFactZeros ; if (diff >= tao) { println ("<diff> more zeros at <i>!"); } iMinusOneFactZeros = iFactZeros; } } rascal>findLumps2(1000,2); 2 more zeros at 25! 2 more zeros at 50! 2 more zeros at 75! 2 more zeros at 100! 3 more zeros at 125! 2 more zeros at 150! 2 more zeros at 175! 2 more zeros at 200! 2 more zeros at 225! 3 more zeros at 250! 2 more zeros at 275! ... 2 more zeros at 950! 2 more zeros at 975! 3 more zeros at 1000! ok rascal>findLumps2(1000,3); 3 more zeros at 125! 3 more zeros at 250! 3 more zeros at 375! 3 more zeros at 500! 4 more zeros at 625! 3 more zeros at 750! 3 more zeros at 875! 3 more zeros at 1000! ok rascal>findLumps2(1000,4); 4 more zeros at 625! okNotice anything yet? Here are some fun math facts to consider:
![]() |