Computer automated NES, but not how you think...

Discussion in 'Off Topic Discussion' started by Evotistical, Apr 14, 2013.

  1. Evotistical

    Evotistical Robust Member

    Joined:
    May 25, 2011
    Messages:
    261
    Likes Received:
    4
    Last edited: Apr 14, 2013
  2. -=FamilyGuy=-

    -=FamilyGuy=- Site Supporter 2049

    Joined:
    Mar 3, 2007
    Messages:
    3,034
    Likes Received:
    891
    Nice find sir, very interesting read!

    "The only winning move is not to play!"



    [Edit, as I was inspired by RetroSwim's post below]

    The amount of possible input combinations for a level of super mario bros, is 8^(300*60). (8 possible buttons, inputs are registered every 1/60 second and the timeout is 300 seconds)

    This number is astronomical (~16 kB for the number alone): http://pastebin.com/raw.php?i=AdagzbQf
    (pastebin'd to save you from scrolling down too much)

    I guess his algorithm saves time! (to simulate all that in 100 years, you'd need to be more than ~3.96*10^16248 times the regular speed...)
     
    Last edited: Apr 14, 2013
  3. RetroSwim

    RetroSwim <B>Site Supporter 2013</B><BR><B>Site Supporter 20

    Joined:
    Dec 10, 2012
    Messages:
    605
    Likes Received:
    26
    Er, not really.

    The computer isn't brute forcing the solution to the game. It has metrics for progress (points scored, X-axis position on level).

    With the monkeys on typewriters problem, the monkeys don't know if a particular permutation is more or less like the target phrase. They have no way to know if they're "getting closer", thereby no way to improve. They will literally keep typing for eternity until the correct permutation is stumbled upon.

    In the video you've linked, the program puts together combinations of actions that it knows contribute to progress, and excludes actions that result in a loss of progress. It actually learns to complete the game.

    None the less, it is a very cool video, with very cool computer science behind it.
     
    Last edited: Apr 14, 2013
  4. -=FamilyGuy=-

    -=FamilyGuy=- Site Supporter 2049

    Joined:
    Mar 3, 2007
    Messages:
    3,034
    Likes Received:
    891
    Arguably, he said "long story short" and "a sped up", which seems to indicate he knew the difference but was referring to the monkey problem so people without a big math culture would understand. Also, see my edited post above yours for a nice number, and a less nice one.
     
    Last edited: Apr 14, 2013
  5. RetroSwim

    RetroSwim <B>Site Supporter 2013</B><BR><B>Site Supporter 20

    Joined:
    Dec 10, 2012
    Messages:
    605
    Likes Received:
    26
    That's the thing, I don't think it's "long story short" or "sped up" at all. It's a smarter approach to the monkey problem.

    The monkey problem deals with the probability of them stumbling on the phrase.

    In this case, the AI doesn't have to "stumble" on to the solution for the game, because it knows the conditions for both success and failure. It doesn't have to try every single possible combination of buttons vs time. It is also aware of the outcomes of its actions, unlike the monkeys that are hitting keys at random forever, no way to refine their attack, no "lessons learned" from failed attempts, and so forth.

    You could take the monkeys route, and simulate every single combination of buttons for every single frame of the whole game, until you found one that succeeded. The point of the software in this video is that you don't have to, when the algorithm is self-refining.

    This would be more like taking the monkeys with typewriters, and offering them a reward for every character that they get in the right position, to rule out combinations of letters that will never work.

    It's not simply a "sped up" monkeys problem. It's much more refined. The computer doesn't "random chance" on the solution to the level, it learns steps that result in the conditions for success.

    Also, one small flaw in your calculation.

    The controller has 8 buttons, yes, but more than one of those can be down per retrace. That's 256 possible combinations. So it's 256[SUP](300*60)[/SUP]. :O
     
    Last edited: Apr 15, 2013
  6. -=FamilyGuy=-

    -=FamilyGuy=- Site Supporter 2049

    Joined:
    Mar 3, 2007
    Messages:
    3,034
    Likes Received:
    891
    You're right, I under-estimated the number. But you've overestimated it, you counted A+B and B+A. Also opposite directions can't be pressed at the same time and it's possible not to press any button! Using original controllers of course. I don't know where your 256 comes from, 2^8 ?

    The right number would be the amount of combination of 1 to 6 buttons you can do among 8 possibilities, plus one for no input:
    Sum[8!/(i!*(8 - i)!), {i, 1, 6}] + 1 = 247
    If we drop the start button as it's not very useful in Mario Bros:
    Sum[7!/(i!*(7 - i)!), {i, 1, 5}] +1 = 120

    So the real number of different Mario Bros input in a level is 247^(300*60) for all human-pressable buttons, or 120^(300*60) if you wanna avoid pausing all the time...

    In any case, all these numbers are far greater than the estimated amount of atoms in the universe: 2^300 ...
     
    Last edited: Apr 15, 2013
  7. HEX1GON

    HEX1GON FREEZE! Scumbag

    Joined:
    May 4, 2011
    Messages:
    9,916
    Likes Received:
    837
    Wish I understood maths, and able to be more advanced than I am. :(
    I envy those who can do and make such amazing projects like this.

    It was funny how the AI played Pac-Man and Tetris, since the AI paused right at the end LOL.
     
  8. RetroSwim

    RetroSwim <B>Site Supporter 2013</B><BR><B>Site Supporter 20

    Joined:
    Dec 10, 2012
    Messages:
    605
    Likes Received:
    26
    You're over-complicating it a bit.

    - A+B and B+A are the same, you don't need to specifically account for that. For a controller byte 00000000 to 11111111, A+B and B+A are the same bit pattern: 11. You wouldn't get 00000011 twice, for instance.

    - You don't need to "plus one for no input" because 2^8 is 256 values, from 0 to 255, 00 to FF, 00000000 to 11111111. No buttons to all buttons.

    - This isn't a human using a d-pad. It's software. It can press Up+Down or Left+Right simultaneously. It wouldn't, because of how it detects progress, but it could. That's the point, the software algorithmically rejects combinations that don't fit its criteria for progress. What "makes sense" to us with our human perception is irrelevant.

    Otherwise, yeah, it's all moot because the numbers are so stupidly large! And that's what makes this thing so cool, it can find the best way without having to consider bazillions of possibilities that will never work.
     
    Last edited: Apr 15, 2013
  9. -=FamilyGuy=-

    -=FamilyGuy=- Site Supporter 2049

    Joined:
    Mar 3, 2007
    Messages:
    3,034
    Likes Received:
    891
    I took the physicist approach, you took the computer engineer one. I think we're both right but we'd argue for hours on the subject around a beer. My approach falls back to yours if I consider all 8 or less buttons combinations, mine takes an eternity less to compute than yours though! But it's still an eternity more than the age of the universe, so it's a moot point.

    Trust me I'm a physicist!

    [Edit]
    To be clear I was estimating the amount of calculations required for a "Monkey with a gamepad" approach, not the articles heuristic, which is both interesting and infinitely more efficient than Paco the monkey.
     
    Last edited: Apr 15, 2013
  10. RetroSwim

    RetroSwim <B>Site Supporter 2013</B><BR><B>Site Supporter 20

    Joined:
    Dec 10, 2012
    Messages:
    605
    Likes Received:
    26
    It's cool too see how people's brains things when they come from from different backgrounds, the things they consider to be assumed and whatnot. :)
     
    -=FamilyGuy=- likes this.
  11. wilykat

    wilykat Site Supporter 2013

    Joined:
    Mar 25, 2012
    Messages:
    991
    Likes Received:
    45
    Screw math just get smashed on beer and day old pizza.
     
  12. smf

    smf mamedev

    Joined:
    Apr 14, 2005
    Messages:
    1,255
    Likes Received:
    88
    The only point that it learns is when you train it, that is where it identifies the lexicographic numbers. This is actually the hard part as it needs to figure out two byte (or more) numbers that increase.

    To play the game it picks the next move similar to how you play chess & the video is just a visual representation of the final list of moves that it calculated. When calculating the move list it just runs the code for a certain number of frames using random sequences of moves, finds the best outcome based on how the lexicographic numbers have changed, restores the ram & uses one/some/all of the moves from the sequence, then it starts around the loop again. This stage is what he means by time travel in the introduction.
     
    Last edited: Apr 15, 2013
  13. Evotistical

    Evotistical Robust Member

    Joined:
    May 25, 2011
    Messages:
    261
    Likes Received:
    4
    Buy a Chinese repro nes controller. The kind that don't have the middle pivot divot; and you can press opposing directions at one time, causing mario or luigi to moonwalk in smb3. Dunno what it does in smb1.
     
sonicdude10
Draft saved Draft deleted
Insert every image as a...
  1.  0%

Share This Page