Discussion of AI Algorithms for 2048


Reposted from: http://blog.codinglabs.org/articles/2048-ai-analysis.html

For the currently popular game 2048, someone has implemented an AI program that can win the game with high probability (above 90%), and the author gave a brief introduction to the AI's algorithmic framework and implementation ideas on Stack Overflow. That answer, however, focuses mainly on the choice of heuristic functions and does not explain the core algorithms in any detail. This article is divided into two parts: the first introduces the underlying algorithms, namely Minimax and alpha-beta pruning; the second analyzes the author's concrete implementation.

Basic Algorithms

2048 can essentially be abstracted as a two-player game of perfect information (the player moves in one of four directions, then the computer places a 2 or a 4 in some empty cell). Here "perfect information" means that at any moment both sides have exactly the same information about the board, and each move depends only on reasoning about the positions that follow. The core algorithm the author uses is Minimax with alpha-beta pruning, a standard choice for such games that also powers AIs for other perfect-information games such as chess.

Minimax

Let us first look at Minimax without pruning. This article will use a simple example to illustrate how the Minimax algorithm reasons and makes decisions.

The Problem

Consider the following game. There are three plates A, B and C, each holding three bills: A holds 1, 20 and 50; B holds 5, 10 and 100; C holds 1, 5 and 20 (all denominated in yuan). There are two players, Player 1 and Player 2, and both can freely inspect the three plates and the bills on them. The game has three steps:

  1. Player 1 picks one of the three plates.
  2. Player 2 takes two bills from the chosen plate and hands them to Player 1.
  3. Player 1 takes one of the two bills and keeps it.

Player 1's goal is to end up with a bill of the largest possible denomination; Player 2's goal is to make the bill Player 1 ends up with as small as possible.

Below we solve this problem with the Minimax algorithm.

Basic Idea

The natural way to tackle game problems is to organize the positions into a tree: each node represents a position, and a parent-child edge means the child position is reachable from the parent in one move. Minimax is no exception; it determines the next move by searching the game tree rooted at the current position. At the heart of every game-tree search algorithm is the evaluation of each position's value. Minimax assigns values based on the following simple idea:

  • Minimax is a pessimistic algorithm: it assumes that at every step the opponent will steer the game toward the position that is theoretically worst for us, i.e. that the opponent plays perfectly. Our strategy should therefore be to pick the best among the worst cases the opponent can force on us, minimizing the loss a perfectly playing opponent can inflict.
  • Minimax does not chase a theoretical optimum, because such an optimum usually relies on the opponent being sufficiently foolish. Under Minimax we stay fully in control: if the opponent plays perfectly at every step, we still reach the predicted minimum-loss position, and if the opponent ever plays imperfectly, we may end up better than the most pessimistic prediction. In short, we choose the best among the worst cases.

The description above is somewhat abstract; let us look at a concrete example.

Solving the Problem

The figure below shows the game tree for the example problem:

Note that because the example has very few positions, we can draw the complete game tree, and in that case Minimax finds the global optimum. In real problems the game tree is enormous; even a computer cannot enumerate it completely, so we usually search only to a certain depth and obtain a local optimum.

We take Player 1's perspective. Square nodes indicate that it is our (Player 1's) turn, and triangles indicate the opponent's (Player 2's) turn. After three rounds (us, opponent, us) the game reaches a final position; the yellow leaf nodes represent all possible outcomes. From Player 1's point of view, since the final payoff is the denomination of the bill obtained, we can naturally use that denomination as the value of each terminal position.

Now consider the second-to-last level. At these nodes it is our turn to choose, so we should steer toward the child of largest value; hence each such node's value is the maximum of its children's values:

Nodes where it is our turn are called max nodes; a max node's value is the maximum of its children's values.

At the third-to-last level it is the opponent's turn. Assuming the opponent will do their best to steer the game into the position worst for us, each of these nodes takes the minimum of its children's values. Nodes where it is the opponent's turn are called min nodes.

Finally, the root is a max node, so its value is the maximum of its children's values. The fully evaluated game tree looks like this:

To summarize, the Minimax algorithm proceeds as follows:

  1. First fix a maximum search depth D; depth D may reach terminal positions or stop at intermediate ones.
  2. At the leaves of the depth-D game tree, assign values using a predefined evaluation function.
  3. Propagate values upward to the internal nodes: a max node takes the maximum of its children's values, a min node the minimum.
  4. Whenever it is our turn (we are then at some max node of the tree), follow the child whose value equals that max node's value.

In the example above, the root's value is 20, meaning that if the opponent plays perfectly at every step, this procedure guarantees us 20 yuan; that is our best decision under Minimax. The sequence of positions follows the path shown in red in the original article's figure.
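To make the steps concrete, here is a minimal sketch in JavaScript (the language of the 2048 AI analyzed later). The position API (position.children()) and the evaluation function evaluate() are illustrative assumptions, not code from any real implementation.

// Minimal Minimax sketch; position.children() returns the positions
// reachable in one move, evaluate() is the predefined value function.
function minimax(position, depth, isMaxNode) {
  var children = position.children();
  if (depth === 0 || children.length === 0) return evaluate(position);
  var values = children.map(function (child) {
    return minimax(child, depth - 1, !isMaxNode);
  });
  // a max node takes the maximum of its children, a min node the minimum
  return isMaxNode ? Math.max.apply(null, values)
                   : Math.min.apply(null, values);
}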

For Minimax on real problems, a few points bear repeating:

  • Real problems generally cannot build the complete game tree, so a maximum depth D is fixed and each decision searches at most D plies below the current position.
  • For the same reason, Minimax generally finds a local optimum rather than a global one; greater search depth tends to yield better moves, but the computation time grows exponentially.
  • Also because the full tree cannot be built in one go, in practice Minimax recomputes a partial game tree at each move rather than searching only once, though intermediate results already computed can be cached.

Alpha-beta Pruning

Plain Minimax has one big problem: computational cost. The number of nodes to search grows exponentially with the maximum depth, and since the algorithm's strength generally correlates with depth, this severely limits its effectiveness.

Alpha-beta pruning complements and improves Minimax. With alpha-beta pruning we no longer need to build and search every node within depth D: whenever we discover during construction that continuing below the current position cannot yield a better solution, we stop searching that position and everything beneath it; this is the pruning.

Alpha-beta rests on a simple idea: always remember the best choice known so far; if searching below the current position cannot possibly produce a solution better than the best already known, stop searching this branch (prune) and backtrack to the parent.

Alpha-beta can be seen as a variant of Minimax. It builds the game tree depth-first from the root; when constructing each node it maintains two values, alpha and beta, where alpha is a lower bound on the best choice already known along the path, and beta is an upper bound on the outcome reachable from this node onward. Since we assume the opponent will steer toward the worst outcomes, once beta is less than or equal to alpha, every outcome reachable from here is capped below the best solution already known, so nothing better can be found below this node and the branch is pruned. The sketch below adds these cutoffs to the earlier Minimax sketch.
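Continuing the earlier sketch (same assumed position API; a fail-hard formulation that may return bounds rather than exact values for pruned subtrees, which still yields the same decision at the root):

// Minimax with alpha-beta cutoffs (sketch)
function alphabeta(position, depth, alpha, beta, isMaxNode) {
  var children = position.children();
  if (depth === 0 || children.length === 0) return evaluate(position);
  for (var i = 0; i < children.length; i++) {
    var v = alphabeta(children[i], depth - 1, alpha, beta, !isMaxNode);
    if (isMaxNode) alpha = Math.max(alpha, v); // raise the lower bound
    else beta = Math.min(beta, v);             // lower the upper bound
    if (beta <= alpha) break;                  // prune the remaining children
  }
  return isMaxNode ? alpha : beta;
}

// initial call: alphabeta(root, D, -Infinity, +Infinity, true)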

We now walk through alpha-beta pruning on the same example, starting from the root and detailing every step:

  1. The root's alpha and beta are initialized to −∞ and +∞, respectively.
  2. Depth-first search visits the first child. It is not a leaf, so its alpha and beta are inherited from the parent: −∞ and +∞.
  3. The first child on the third level is searched; same as above.
  4. The search reaches the fourth level, a leaf node, whose evaluation gives the value 1.

  5. The leaf's parent is a max node, so its alpha is updated to 1, meaning its value is at least 1.

  6. Looking at the other child, whose value 20 is greater than the current alpha, alpha is updated to 20.
  7. All subtrees of the leftmost third-level node have now been searched; being a max node, its final value is its current alpha: 20.
  8. Since its parent (the leftmost second-level node) is a min node, the parent's beta is updated to 20, meaning that node's value is at most 20.

  9. The second child of the leftmost second-level node and its subtree are searched by the same logic, giving the value 50 (note that the parent's beta is passed down to the child). Since 50 is greater than 20, the min node's beta is not updated.

  10. The third child of the leftmost second-level node is searched. After examining its first leaf, this child has alpha = beta, meaning no better solution can be found below it, so it is pruned.

  11. The search continues with branch B. After B's first child is searched, B's alpha is 20 and its beta is 10, meaning B's value cannot exceed 10, while branch A already yields 20. The pruning condition alpha ≥ beta is met, so B is pruned and its value is set to 10. Note that 10 is not necessarily B's true value, only an upper bound; B's true value might be 5, or 1, or anything below 10. That no longer matters: we know this branch cannot beat branch A, so it can be abandoned.

  12. Branch C runs into the same situation as branch B and is pruned as well.

The search is now complete, and we have our decision for this move: take branch A.

Note that plain Minimax would have searched 18 leaf nodes, whereas here only 9 were searched. With alpha-beta pruning, Minimax can search deeper in the same amount of time and therefore play better, and alpha-beta returns exactly the same decision as plain Minimax.

The 2048 Implementation

Now let us look at the 2048 AI implemented by ov3y. The program's GitHub repository is linked here; the main logic lives in ai.js.

Modeling

As noted above, Minimax and alpha-beta apply to turn-based games of perfect information, and the author abstracts the game as follows:

  • Our side: the game player. Each turn chooses one of the four moves up, down, left, right (some positions allow fewer than four, since some directions are blocked). After the move, the tiles slide and merge according to the game rules, completing the transition.
  • The opponent: the computer. It places a tile, valued 2 or 4, in any currently empty cell; placing the new tile completes the transition.
  • Win condition: some tile reaches the value 2048.
  • Loss condition: the grid is full and no move in any of the four directions can trigger a merge.

With this, 2048 is modeled as a two-player game of perfect information.

Position Evaluation

At the core of the algorithm, evaluating the value of the current position is paramount. In 2048, apart from terminal positions, intermediate positions have no obviously correct value metric, so heuristic indicators are needed. High-scoring "good" positions are those likely to lead to a win, while low-scoring "bad" positions are those likely to lead to a loss.

The author uses the following heuristic indicators.

Monotonicity

Monotonicity means that tile values increase or decrease consistently from left to right and from top to bottom. Generally, the more monotonic a position is, the better. The original article shows an example board with good monotonicity here.
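As a rough code illustration of the idea (a simplified stand-in, not the repository's monotonicity2 code, which also scores columns and sums the board):

// Score one row's monotonicity; 0 is best, more negative is worse.
// Uses log2 of tile values, as the real heuristic does.
function rowMonotonicity(row) { // row: array of 4 tile values, 0 = empty
  var incr = 0, decr = 0;       // penalties against each direction
  for (var i = 0; i < 3; i++) {
    var a = row[i] ? Math.log2(row[i]) : 0;
    var b = row[i + 1] ? Math.log2(row[i + 1]) : 0;
    if (a > b) incr += b - a;   // a decreasing pair hurts the increasing case
    else decr += a - b;         // an increasing pair hurts the decreasing case
  }
  return Math.max(incr, decr);  // keep the better of the two directions
}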

Smoothness

Smoothness measures the difference between each tile's value and those of its immediate neighbors; the smaller the differences, the smoother the position. For example, a 2 next to a 4 is smoother than a 2 next to a 128. Smoother positions are generally considered better. The original article shows an extremely smooth board here.
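A similarly simplified sketch of the smoothness idea (the repository's grid.smoothness() also compares vertical neighbors; this version only scans rows):

// Sum of log-value differences between horizontally adjacent tiles;
// 0 is perfectly smooth, more negative is rougher.
function smoothness(grid) { // grid: 4x4 array of tile values, 0 = empty
  var score = 0;
  for (var y = 0; y < 4; y++) {
    for (var x = 0; x < 3; x++) {
      if (grid[y][x] && grid[y][x + 1]) {
        score -= Math.abs(Math.log2(grid[y][x]) - Math.log2(grid[y][x + 1]));
      }
    }
  }
  return score;
}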

Number of Empty Cells

This one is easy to understand: generally, the fewer empty cells there are, the worse off the player is, so positions with more empty cells are considered better.

Number of Isolated Empty Cells

This indicator measures how scattered the empty cells are: the more fragmented the empty space, the worse the position.

Concretely, 2048-AI combines these heuristic indicators with a weighted sum when evaluating a position. The code is as follows:

// static evaluation function
AI.prototype.eval = function() {
  var emptyCells = this.grid.availableCells().length;

  var smoothWeight = 0.1,
      //monoWeight   = 0.0,
      //islandWeight = 0.0,
      mono2Weight  = 1.0,
      emptyWeight  = 2.7,
      maxWeight    = 1.0;

  return this.grid.smoothness() * smoothWeight
       //+ this.grid.monotonicity() * monoWeight
       //- this.grid.islands() * islandWeight
       + this.grid.monotonicity2() * mono2Weight
       + Math.log(emptyCells) * emptyWeight
       + this.grid.maxValue() * maxWeight;
};

Interested readers can adjust the weights and observe the effect.

Pruning the Opponent's Choices

Besides alpha-beta pruning, this program applies another kind of pruning at min nodes: it considers only the opponent moves that make the position worst (even though the actual computer in 2048 places tiles at random), rather than searching every possible opponent move. The reason is that the opponent has (number of empty cells) × 2 possible placements, and searching all of them would severely limit the search depth.

The relevant pruning code is as follows:

// try a 2 and 4 in each cell and measure how annoying it is
// with metrics from eval
var candidates = [];
var cells = this.grid.availableCells();
var scores = { 2: [], 4: [] };
for (var value in scores) {
  for (var i in cells) {
    scores[value].push(null);
    var cell = cells[i];
    var tile = new Tile(cell, parseInt(value, 10));
    this.grid.insertTile(tile);
    scores[value][i] = -this.grid.smoothness() + this.grid.islands();
    this.grid.removeTile(cell);
  }
}

// now just pick out the most annoying moves
var maxScore = Math.max(Math.max.apply(null, scores[2]), Math.max.apply(null, scores[4]));
for (var value in scores) { // 2 and 4
  for (var i = 0; i < scores[value].length; i++) {
    if (scores[value][i] == maxScore) {
      candidates.push({ position: cells[i], value: parseInt(value, 10) });
    }
  }
}

Search Depth

The 2048-AI implementation does not cap the maximum search depth; instead it limits the time spent "thinking" per move. A timeout is set, 100 ms by default, and within that budget the search starts shallow and iteratively deepens as far as time allows. The relevant code:

// performs iterative deepening over the alpha-beta search
AI.prototype.iterativeDeep = function() {
  var start = (new Date()).getTime();
  var depth = 0;
  var best;
  do {
    var newBest = this.search(depth, -10000, 10000, 0, 0);
    if (newBest.move == -1) {
      //console.log('BREAKING EARLY');
      break;
    } else {
      best = newBest;
    }
    depth++;
  } while ((new Date()).getTime() - start < minSearchTime);
  //console.log('depth', --depth);
  //console.log(this.translate(best.move));
  //console.log(best);
  return best;
};

The strength of this implementation therefore depends on the performance of the machine running the JavaScript engine. The timeout can of course be raised for better play, at the cost of slower moves.

Possible Improvements

The author claims this implementation reaches 2048 more than 90% of the time, but the odds of reaching 4096 or even 8192 are not high. In the project's README on GitHub, the author lists some optimization suggestions:

  • Cache results. The current implementation does not cache any part of the searched tree; every move starts the search from scratch. A sketch of one possible approach follows this list.
  • Multi-threaded search. This is hard given the JavaScript engine's single-threaded nature, but parallelization could be considered on other platforms.
  • Better heuristic functions. It may be possible to devise better heuristics for evaluating positions.
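As a sketch of the first suggestion (an assumption about how it might be done, not code from the repository), a transposition table could memoize search results keyed on the board and remaining depth:

// Hypothetical result cache; serialize() is an assumed helper that turns
// a board into a string key, and search() stands for the existing search
// (signature simplified here).
var transpositions = {};
function cachedSearch(board, depth) {
  var key = serialize(board) + ':' + depth;
  if (key in transpositions) return transpositions[key];
  var result = search(board, depth, -Infinity, +Infinity);
  transpositions[key] = result;
  return result;
}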

References

  1. 2048 Game
  2. 2048-AI github
  3. An Exhaustive Explanation of Minimax, a Staple AI Algorithm
  4. Tic Tac Toe: Understanding the Minimax Algorithm
  5. CS 161 Recitation Notes - Minimax with Alpha Beta Pruning
Related reading:
The Stack Overflow discussion about this AI

I have recently stumbled upon the game 2048. You merge similar tiles by moving them in any of the four directions to make "bigger" tiles. After each move, a new tile appears at a random empty position with a value of either 2 or 4. The game terminates when all the boxes are filled and there are no moves that can merge tiles, or you create a tile with a value of 2048.

Now, to reach the goal I need to follow a well-defined strategy. So, I thought of writing a program for it.

My current algorithm:

while (!game_over) {
    for each possible move:
        count_no_of_merges_for_2-tiles and 4-tiles
    choose the move with large number of merges
}

What I am doing is that at any point, I will try to merge the tiles with values 2 and 4, that is, I try to have as few 2 and 4 tiles as possible. If I try it this way, all other tiles automatically get merged and the strategy seems good.

But, when I actually use this algorithm, I only get around 4000 points before the game terminates. The maximum AFAIK is slightly more than 20,000 points, which is way larger than my current score. Is there a better algorithm than the above?

Comments:

  • This might help! ov3y.github.io/2048-AI – cegprakash, Mar 12 at 6:12
  • This is how they should have made the Windows 8 home screen 😉 -- Interesting game. The guy who wrote an AI solver claims that he uses alpha-beta searching with adaptive deepening. I wonder how you can apply alpha-beta (or its un-optimized ancestor, minimax) to a non-adversarial game like this one. – 500 - Internal Server Error, Mar 12 at 22:54
  • @500-InternalServerError: If I were to implement an AI with alpha-beta game tree pruning, it would be assuming that the new blocks are adversarially placed. It's a worst-case assumption, but might be useful. – Charles, Mar 14 at 20:52
  • Suggest to repost here: Programming Puzzles & Code Golf – nobar, Mar 15 at 22:47
  • Now on xkcd – ratchet freak, Mar 19 at 11:27

11 Answers

I'm the author of the AI program that others have mentioned in this thread. You can view the AI in action or read the source.

Currently, the program achieves about a 90% win rate running in javascript in the browser on my laptop given about 100 milliseconds of thinking time per move, so while not perfect (yet!) it performs pretty well.

Since the game is a discrete state space, perfect information, turn-based game like chess and checkers, I used the same methods that have been proven to work on those games, namely

minimax
search with
alpha-beta pruning
. Since there is already a lot of info on that algorithm out there, I'll just talk about the two main heuristics that I use in the
static evaluation function and which formalize many of the intuitions that other people have expressed here.

Monotonicity

This heuristic tries to ensure that the values of the tiles are all either increasing or decreasing along both the left/right and up/down directions. This heuristic alone captures the intuition that many others have mentioned, that higher valued tiles should be clustered in a corner. It will typically prevent smaller valued tiles from getting orphaned and will keep the board very organized, with smaller tiles cascading in and filling up into the larger tiles.

Here's a screenshot of a perfectly monotonic grid. I obtained this by running the algorithm with the eval function set to disregard the other heuristics and only consider monotonicity.

A perfectly monotonic 2048 board

Smoothness

The above heuristic alone tends to create structures in which adjacent tiles are decreasing in value, but of course in order to merge, adjacent tiles need to be the same value. Therefore, the smoothness heuristic just measures the value difference between neighboring tiles, trying to minimize this count.

A commenter on Hacker News gave an interesting formalization of this idea in terms of graph theory.

Here's a screenshot of a perfectly smooth grid, courtesy of this excellent parody fork.

A perfectly smooth 2048 board

Free Tiles

And finally, there is a penalty for having too few free tiles, since options can quickly run out when the game board gets too cramped.

And that's it! Searching through the game space while optimizing these criteria yields remarkably good performance. One advantage to using a generalized approach like this rather than an explicitly coded move strategy is that the algorithm can often find interesting and unexpected solutions. If you watch it run, it will often make surprising but effective moves, like suddenly switching which wall or corner it's building up against.

Edit:

Here's a demonstration of the power of this approach. I uncapped the tile values (so it kept going after reaching 2048) and here is the best result after eight trials.

4096

Yes, that's a 4096 alongside a 2048. =) That means it achieved the elusive 2048 tile three times on the same board.

Comments:

  • You can treat the computer placing the '2' and '4' tiles as the 'opponent'. – Wei Yen, Mar 15 at 2:53
  • @WeiYen Sure, but regarding it as a minmax problem is not faithful to the game logic, because the computer is placing tiles randomly with certain probabilities, rather than intentionally minimising the score. – koo, Mar 15 at 14:55
  • Even though the AI is randomly placing the tiles, the goal is not to lose. Getting unlucky is the same thing as the opponent choosing the worst move for you. The "min" part means that you try to play conservatively so that there are no awful moves that you could get unlucky from. – FryGuy, Mar 16 at 4:17
  • I had an idea to create a fork of 2048, where the computer instead of placing the 2s and 4s randomly uses your AI to determine where to put the values. The result: sheer impossibleness. Can be tried out here: sztupy.github.io/2048-Hard – SztupY, Mar 17 at 1:03
  • @SztupY Wow, this is evil. Reminds me of qntm.org/hatetris Hatetris, which also tries to place the piece that will improve your situation the least. – Patashu, Mar 17 at 2:27

I developed a 2048 AI using expectimax optimization, instead of the minimax search used by @ovolve's algorithm. The AI simply performs maximization over all possible moves, followed by expectation over all possible tile spawns (weighted by the probability of the tiles, i.e. 10% for a 4 and 90% for a 2). As far as I'm aware, it is not possible to prune expectimax optimization (except to remove branches that are exceedingly unlikely), and so the algorithm used is a carefully optimized brute force search.

Performance

The AI in its default configuration (max search depth of 8) takes anywhere from 10ms to 200ms to execute a move, depending on the complexity of the board position. In testing, the AI achieves an average move rate of 6-10 moves per second over the course of an entire game. If the search depth is limited to 6 moves, the AI can easily execute 20+ moves per second, which makes for some interesting watching.

To assess the score performance of the AI, I ran the AI 100 times (connected to the browser game via remote control). For each tile, here are the proportions of games in which that tile was achieved at least once:

2048: 100%
4096: 97%
8192: 76%
16384: 13%

The minimum score over all runs was 27536; the maximum score achieved was 377792. The median score is 157652. The AI never failed to obtain the 2048 tile (so it never lost the game even once in 100 games).

Here's the screenshot of the best run:

16384 tile, score 377792

This game took 14395 moves over 37 minutes, or an average of 6.6 moves per second.

Implementation

My approach encodes the entire board (16 entries) as a single 64-bit integer (where tiles are the nybbles, i.e. 4-bit chunks). On a 64-bit machine, this enables the entire board to be passed around in a single machine register.

Bit shift operations are used to extract individual rows and columns. A single row or column is a 16-bit quantity, so a table of size 65536 can encode transformations which operate on a single row or column. For example, moves are implemented as 4 lookups
into a precomputed "move effect table" which describes how each move affects a single row or column (for example, the "move right" table contains the entry "1122 -> 0023" describing how the row [2,2,4,4] becomes the row [0,0,4,8] when moved to the right).

Scoring is also done using table lookup. The tables contain heuristic scores computed on all possible rows/columns, and the resultant score for a board is simply the sum of the table values across each row and column.

This board representation, along with the table lookup approach for movement and scoring, allows the AI to search a huge number of game states in a short period of time (over 10,000,000 game states per second on my 3-year-old laptop (single-core)).
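To illustrate the lookup-table idea, here is a hedged JavaScript sketch (the answer's actual implementation is optimized C; this version only handles a single row of four nybbles, each holding log2 of the tile value):

// Slide-and-merge one row to the left, nybble-encoded (e.g. the row
// [2,2,4,4] is 0x1122 and becomes [4,8,0,0] = 0x2300).
function moveRowLeft(row) {
  var cells = [(row >> 12) & 0xF, (row >> 8) & 0xF, (row >> 4) & 0xF, row & 0xF];
  var out = cells.filter(function (c) { return c !== 0; }); // drop empties
  for (var i = 0; i + 1 < out.length; i++) {
    if (out[i] === out[i + 1]) { out[i]++; out.splice(i + 1, 1); } // merge once
  }
  while (out.length < 4) out.push(0);
  return (out[0] << 12) | (out[1] << 8) | (out[2] << 4) | out[3];
}

// Precompute all 65536 row transforms once; a move is then 4 table lookups.
var moveLeftTable = new Uint16Array(65536);
for (var r = 0; r < 65536; r++) moveLeftTable[r] = moveRowLeft(r);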

The expectimax search itself is coded as a recursive search which alternates between "expectation" steps (testing all possible tile spawn locations and values, and weighting their optimized scores by the probability of each possibility), and "maximization" steps (testing all possible moves and selecting the one with the best score). The tree search terminates when it sees a previously-seen position (using a transposition table), when it reaches a predefined depth limit, or when it reaches a board state that is highly unlikely (e.g. it was reached by getting 6 "4" tiles in a row from the starting position). The typical search depth is 4-8 moves.
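A minimal sketch of that alternation, in JavaScript for consistency with the earlier snippets (the state API and score() are assumptions; the real implementation is heavily optimized C using the bitboard representation described above plus a transposition table):

// Expectimax: max over our moves, expectation over tile spawns (90% 2, 10% 4).
function expectimax(state, depth, isMaxNode) {
  if (depth === 0) return score(state);
  if (isMaxNode) {
    var moves = state.moves();
    if (moves.length === 0) return score(state); // no legal move: game over
    var best = -Infinity;
    moves.forEach(function (m) {
      best = Math.max(best, expectimax(state.apply(m), depth - 1, false));
    });
    return best;
  }
  var cells = state.emptyCells();
  if (cells.length === 0) return score(state);
  var total = 0;
  cells.forEach(function (cell) { // average over all spawn possibilities
    total += 0.9 * expectimax(state.place(cell, 2), depth, true);
    total += 0.1 * expectimax(state.place(cell, 4), depth, true);
  });
  return total / cells.length;
}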

The heuristic scoring algorithm is very straightforward: open squares count for +20000, and for each row and column, +20000 if its largest value is on the edge (as opposed to being in the two middle squares). The former heuristic performs extremely well by itself (usually getting to 4096), but frequently dies after the maximum piece ends up in the middle 4 tiles. Thus, the latter heuristic guides the largest pieces towards the edges, where they are easier to combine.


That the AI achieves the 16384 tile is a huge milestone; I will be surprised to hear if any human players have achieved 16384 on the official game (i.e. without using tools like savestates or undo). Being able to achieve this result fully 13% of the time makes me optimistic that the 32768 tile is achievable with some work!

You can try the AI for yourself. The code is available at https://github.com/nneonneo/2048-ai.

Comments:

  • I achieved 16,384 with your AI machine - 289,848 points 🙂 – Asgrim, Mar 24 at 16:21
  • +1 because the better AI machine is second and not the accepted answer! – dave, Mar 29 at 13:41
  • @nneonneo: Whoops, totally missed it! I found it to be the same in my experience. I also did an expectimax AI for the iphone game "threes" (the one that spawned all these) and what worked best is an EV function along the lines of (something for open square + sum of squares + 10*bottom-left + 9*bottom-second-left + 8*bottom-third-left). Too fun to watch these AIs play the game. – Claudiu, Apr 2 at 17:22
  • @RobL: 2's appear 90% of the time; 4's appear 10% of the time. It's in the source code: var value = Math.random() < 0.9 ? 2 : 4;. – nneonneo, Apr 4 at 5:22
  • Currently porting to Cuda so the GPU does the work for even better speeds! – nimsson, Apr 11 at 21:54

I have refined the algorithm and beaten the game! It may fail due to simple bad luck close to the end (you are forced to move down, which you should never do, and a tile appears where your highest should be; just try to keep the top row filled, so moving left does not break the pattern), but basically you end up having a fixed part and a mobile part to play with. This is your objective:

Ready to finish

This is the model I chose by default.

1024 512 256 128
  8   16  32  64
  4   2   x   x
  x   x   x   x

The chosen corner is arbitrary; you basically never press one key (the forbidden move), and if you do, you press the contrary again and try to fix it. For future tiles the model always expects the next random tile to be a 2 and to appear on the opposite side of the current model (while the first row is incomplete, in the bottom-right corner; once the first row is completed, in the bottom-left corner).

Here goes the algorithm. It wins around 80% of games (it seems it is always possible to win with more "professional" AI techniques, though I am not sure about this).

initiateModel();

while(!game_over)
{    
    checkCornerChosen(); // Unimplemented, but it might be an improvement to change the reference point

    for each 3 possible move:
        evaluateResult()
    execute move with best score
    if no move is available, execute forbidden move and undo, recalculateModel()
 }

 evaluateResult() {
     calculateBestCurrentModel()
     calculates distance to chosen model
     stores result
 }

 calculateBestCurrentModel() {
      (according to the current highest tile achieved and its distribution)
  }

A few pointers on the missing steps. Here: model change

The model has changed due to the luck of being closer to the expected model. The model the AI is trying to achieve is

 512 256 128  x
  X   X   x   x
  X   X   x   x
  x   x   x   x

And the chain to get there has become:

 512 256  64  O
  8   16  32  O
  4   x   x   x
  x   x   x   x

The O's represent forbidden spaces...

So it will press right, then right again, then (right or up, depending on where the 4 was created) it will proceed to complete the chain until it gets:

Chain completed

So now the model and chain are back to:

 512 256 128  64
  4   8  16   32
  X   X   x   x
  x   x   x   x

Second pointer: it has had bad luck and its main spot has been taken. It is likely that it will fail, but it can still achieve it:

[screenshot]

Here the model and chain are:

  O 1024 512 256
  O   O   O  128
  8  16   32  64
  4   x   x   x

When it manages to reach the 128, a whole row is gained again:

  O 1024 512 256
  x   x  128 128
  x   x   x   x
  x   x   x   x
 
Comments:

  • "execute move with best score": how can you evaluate the best score out of the possible next states? – Khaled A Khunaifer, Mar 12 at 17:05
  • The heuristic is defined in evaluateResult: you basically try to get closest to the best possible scenario. – Daren, Mar 12 at 17:12
  • @Daren I'm waiting for your detailed specifics – ashu, Mar 12 at 22:22
  • @ashu I'm working on it; unexpected circumstances have left me without time to finish it. Meanwhile I have improved the algorithm and it now solves it 75% of the time. – Daren, Mar 13 at 9:51
  • What I really like about this strategy is that I am able to use it when playing the game manually; it got me up to 37k points. – Cephalopod, Apr 1 at 18:43

I copy here the content of a post on my blog.

The solution I propose is very simple and easy to implement. Even so, it has reached a score of 131040. Several benchmarks of the algorithm's performance are presented below.

Score

Algorithm

Heuristic scoring algorithm

The assumption on which my algorithm is based is rather simple: if you want to achieve a higher score, the board must be kept as tidy as possible. In particular, the optimal setup is given by a linear and monotonically decreasing order of the tile values. This intuition also gives you the upper bound for a tile value: s_max = 2^(n+1), where n is the number of tiles on the board.

(There's a possibility to reach the 131072 tile if the 4-tile is randomly generated instead of the 2-tile when needed.)

Two possible ways of organizing the board are shown in the following images:

[board organization diagrams]

To enforce the ordination of the tiles in a monotonically decreasing order, the score s is computed as the sum of the linearized values on the board multiplied by the values of a geometric sequence with common ratio r < 1:

s = v_0 + v_1*r + v_2*r^2 + ... + v_(n-1)*r^(n-1), where v_i are the tile values read along the chosen linear path.

Several linear paths can be evaluated at once; the final score will be the maximum score of any path.
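As a hedged illustration of this scoring (in JavaScript for consistency with the earlier snippets; the answer's own code is Python), one path's score and the best score over several candidate paths could be computed like this:

// values: tile values read along one linear path through the board
function pathScore(values, r) {
  var s = 0, w = 1;
  for (var i = 0; i < values.length; i++) { s += values[i] * w; w *= r; }
  return s;
}

// final score: the maximum over all candidate paths
function boardScore(paths, r) {
  return Math.max.apply(null, paths.map(function (p) { return pathScore(p, r); }));
}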

Decision rule

The decision rule implemented is not quite smart; the code in Python is presented here:

import copy  # needed for copy.deepcopy below

@staticmethod
def nextMove(board,recursion_depth=3):
    m,s = AI.nextMoveRecur(board,recursion_depth,recursion_depth)
    return m

@staticmethod
def nextMoveRecur(board,depth,maxDepth,base=0.9):
    bestScore = -1.
    bestMove = 0
    for m in range(1,5):
        if(board.validMove(m)):
            newBoard = copy.deepcopy(board)
            newBoard.move(m,add_tile=True)

            score = AI.evaluate(newBoard)
            if depth != 0:
                my_m,my_s = AI.nextMoveRecur(newBoard,depth-1,maxDepth)
                score += my_s*pow(base,maxDepth-depth+1)

            if(score > bestScore):
                bestMove = m
                bestScore = score
    return (bestMove,bestScore);

An implementation of minimax or expectiminimax will surely improve the algorithm. Obviously a more sophisticated decision rule will slow down the algorithm and will require some time to be implemented. I will try a minimax implementation in the near future. (Stay tuned.)

Benchmark

  • T1 - 121 tests - 8 different paths - r=0.125
  • T2 - 122 tests - 8-different paths - r=0.25
  • T3 - 132 tests - 8-different paths - r=0.5
  • T4 - 211 tests - 2-different paths - r=0.125
  • T5 - 274 tests - 2-different paths - r=0.25
  • T6 - 211 tests - 2-different paths - r=0.5

[benchmark plots]

In the case of T2, four tests in ten generate the 4096 tile, with an average score of about 42000.

Code

The code can be found on GitHub at the following link: https://github.com/Nicola17/term2048-AI. It is based on term2048 and is written in Python. I will implement a more efficient version in C++ as soon as possible.

 
Comments:

  • Not bad; your illustration has given me an idea of taking the merge vectors into evaluation. – Khaled A Khunaifer, Apr 8 at 8:45
  • @Nicola Can I contact you please? I need your help with something if you don't mind =) – Khaled Hassan, May 30 at 4:28
  • No problem, I've seen that you have added me on G+. You can send me a message over there. – Nicola Pezzotti, May 30 at 6:45


I became interested in the idea of an AI for this game containing no hard-coded intelligence (i.e. no heuristics, scoring functions etc). The AI should "know" only the game rules, and "figure out" the game play. This is in contrast to most AIs (like the ones in this thread) where the game play is essentially brute force steered by a scoring function representing human understanding of the game.

AI Algorithm

I found a simple yet surprisingly good playing algorithm: to determine the move for a given board, the AI plays the game to the end using random moves. This is done many times while keeping track of the end score. Then the average end score per starting move is calculated, and the move with the highest average is chosen. (A short code sketch of the procedure follows the numbers below.)

Using just 100 runs per move the AI achieves the 2048 tile 80% of the time and the 4096 tile 50% of the time. Using 10000 runs gets the 2048 tile 100% of the time, 70% for the 4096 tile, and about 1% for the 8192 tile.
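A minimal sketch of this procedure, in JavaScript for consistency with the earlier snippets (the state API is an assumption; apply() is taken to include the random tile spawn, as in the real game):

// Pure Monte Carlo move choice: average random-rollout end scores per move.
function chooseMove(state, runsPerMove) {
  var bestMove = null, bestAvg = -Infinity;
  state.moves().forEach(function (m) {
    var sum = 0;
    for (var i = 0; i < runsPerMove; i++) {
      var s = state.apply(m);          // the first move is fixed
      while (!s.isGameOver()) {        // then play randomly to the end
        var moves = s.moves();
        s = s.apply(moves[Math.floor(Math.random() * moves.length)]);
      }
      sum += s.score();
    }
    var avg = sum / runsPerMove;
    if (avg > bestAvg) { bestAvg = avg; bestMove = m; }
  });
  return bestMove;
}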

See it in action

The best achieved score is shown here:

best score

An interesting fact about this AI is that the random-play games are (unsurprisingly) quite bad, yet choosing the best (or least bad) move leads to very good game play: a typical AI game can reach 70000 points and last 3000 moves, yet the random-play runs yield an average of 340 extra points and only 40 moves before dying. (You can see this for yourself by running the AI and opening the debug console.)

This graph illustrates this point: the blue line shows the board score after each move. The red line shows the AI's best end-game score at that point. In essence, the red values are "pulling" the blue values upwards towards them, as they are the AI's best guess. Note that the red line is just a tiny bit over the blue line at each point, yet the blue line continues to increase more and more.

scoring graph

I find it quite surprising that the AI doesn't need to actually foresee good game play in order to choose the moves that produce it.

Searching later, I found this algorithm might be classified as a Pure Monte Carlo Tree Search algorithm.

Implementation and Links

First I created a JavaScript version which can be seen in action here. This version can run 100's of runs in decent time. Open the console for extra info. (source)

Later, in order to play around some more, I used @nneonneo's highly optimized infrastructure and implemented my version in C++. This version allows for up to 100000 runs per move and even 1000000 if you have the patience. Building instructions are provided. It runs in the console and also has a remote control to play the web version. (source)

Results

Surprisingly, increasing the number of runs does not drastically improve the game play. There seems to be a limit to this strategy at around 80000 points with the 4096 tile and all the smaller ones, very close to achieving the 8192 tile. Increasing the number of runs from 100 to 100000 increases the odds of getting to this score limit (from 5% to 40%) but not of breaking through it.

Running 10000 runs with a temporary increase to 1000000 near critical positions managed to break this barrier less than 1% of the time, achieving a max score of 129892 and an 8192 tile.

Improvements

After implementing this algorithm I tried many improvements, including using the min or max scores, or a combination of min, max, and avg. I also tried using depth: instead of trying K runs per move, I tried K runs per move list of a given length ("up, up, left" for example), selecting the first move of the best scoring move list.

Later I implemented a scoring tree that took into account the conditional probability of being able to play a move after a given move list.

However, none of these ideas showed any real advantage over the simple first idea. I left the code for these ideas commented out in the C++ code.

I did add a "Deep Search" mechanism that increased the run number temporarily to 1000000 when any of the runs managed to accidentally reach the next highest tile. This offered a time improvement.

I'd be interested to hear if anyone has other improvement ideas that maintain the domain-independence of the AI.

2048 Variants and Clones

Just for fun, I've also implemented the AI as a bookmarklet, hooking into the game's controls. This allows the AI to work with the original game and many of its variants.

This is possible due to the domain-independent nature of the AI. Some of the variants are quite distinct, such as the Hexagonal clone.

Comments:

  • +1. As an AI student I found this really interesting. Will take a better look at this in the free time. – Isaac, May 25 at 22:18
  • This is amazing! I just spent hours optimizing weights for a good heuristic function for expectimax, and I implemented this in 3 minutes and it completely smashes it. – Brendan Annable, May 29 at 17:09
  • Nice use of Monte Carlo simulation. – nneonneo, Jun 10 at 4:22
  • Bookmarklet ftw! Great for testing features. Thanks for the work! 🙂 – ikaruss, Jun 22 at 23:48


I think I found an algorithm which works quite well, as I often reach scores over 10000, my personal best being around 16000. My solution does not aim at keeping the biggest numbers in a corner, but at keeping them in the top row.

Please see the code below:

while( !game_over ) {
    move_direction=up;
    if( !move_is_possible(up) ) {
        if( move_is_possible(right) && move_is_possible(left) ){
            if( number_of_empty_cells_after_moves(left,up) > number_of_empty_cells_after_moves(right,up) ) 
                move_direction = left;
            else
                move_direction = right;
        } else if ( move_is_possible(left) ){
            move_direction = left;
        } else if ( move_is_possible(right) ){
            move_direction = right;
        } else {
            move_direction = down;
        }
    }
    do_move(move_direction);
}
Comments:

  • I ran 100,000 games testing this versus the trivial cyclic strategy "up, right, up, left, ..." (and down if it must). The cyclic strategy finished with an "average tile score" of 770.6, while this one got just 396.7. Do you have a guess why that might be? I'm thinking it does too many ups, even when left or right would merge a lot more. – Thomas Ahle, Apr 6 at 13:49


Algorithm

while(!game_over)
{
    for each possible move:
        evaluate next state

    choose the maximum evaluation
}

Evaluation

Evaluation =
    128 (Constant)
    + (Number of Spaces x 128)
    + Sum of faces adjacent to a space { (1/face) x 4096 }
    + Sum of other faces { log(face) x 4 }
    + (Number of possible next moves x 256)
    + (Number of aligned values x 2)

Evaluation Details

128 (Constant)

This is a constant, used as a base-line and for other uses like testing.

+ (Number of Spaces x 128)

More spaces make the state more flexible. We multiply by 128 (which is the median) since a grid filled entirely with 128 faces is an optimal, impossible state.

+ Sum of faces adjacent to a space { (1/face) x 4096 }

Here we evaluate faces that have the possibility of merging, by evaluating them backwards: a 2 tile is valued at 2048, while a 2048 tile is valued at 2.

+ Sum of other faces { log(face) x 4 }

Here we still need to check for stacked values, but in a lesser way that doesn't interrupt the flexibility parameters, so we have the sum of { log(face) x 4 }, where each term lies in the range [4, 44] (using log base 2, from a 2 face up to a 2048 face).

+ (Number of possible next moves x 256)

A state is more flexible if it has more freedom of possible transitions.

+ (Number of aligned values x 2)

This is a simplified check of the possibility of having merges within that state, without making a look-ahead.

Note: The constants can be tweaked.
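For illustration, the formula above transcribes directly into code; every helper below is a hypothetical stand-in for the quantity of the same name, not an existing API:

// Hypothetical transcription of the evaluation formula.
function evaluate(grid) {
  return 128                                        // constant base-line
    + numberOfSpaces(grid) * 128
    + facesAdjacentToSpace(grid).reduce(function (s, f) {
        return s + (1 / f) * 4096;                  // backward-valued faces
      }, 0)
    + otherFaces(grid).reduce(function (s, f) {
        return s + Math.log2(f) * 4;                // stacked-value term
      }, 0)
    + numberOfPossibleMoves(grid) * 256
    + numberOfAlignedValues(grid) * 2;
}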

Comments:

  • I will edit this later, to add a live code @nitish712 – Khaled A Khunaifer, Mar 12 at 20:16
  • What is the win% of this algorithm? – cegprakash, Mar 15 at 6:17


There is already an AI implementation for this game: here. Excerpt from the README:

The algorithm is iterative deepening depth first alpha-beta search. The evaluation function tries to keep the rows and columns monotonic (either all decreasing or increasing) while minimizing the number of tiles on the grid.

There is also a discussion on ycombinator about this algorithm that you may find useful.

Comments:

  • This should be the top answer, but it would be nice to add more details about the implementation: e.g. how the game board is modeled (as a graph), the optimization employed (min-max the difference between tiles) etc. – Alceu Costa, Mar 13 at 19:44


I wrote a 2048 solver in Haskell, mainly because I'm learning this language right now.

My implementation of the game slightly differs from the actual game, in that a new tile is always a '2' (rather than 90% 2 and 10% 4). Also, the new tile is not random, but always the first available one from the top left.

As a consequence, this solver is deterministic.

I used an exhaustive algorithm that favours empty tiles. It performs pretty quickly for depths 1-4, but at depth 5 it gets rather slow, at around 1 second per move.

Below is the code implementing the solving algorithm. The grid is represented as a 16-length array of Integers. And scoring is done simply by counting the number of empty squares.

bestMove :: Int -> [Int] -> Int
bestMove depth grid = maxTuple [ (gridValue depth (takeTurn x grid), x) | x <- [0..3], takeTurn x grid /= [] ]

gridValue :: Int -> [Int] -> Int
gridValue _ [] = -1
gridValue 0 grid = length $ filter (==0) grid  -- <= SCORING
gridValue depth grid = maxInList [ gridValue (depth-1) (takeTurn x grid) | x <- [0..3] ]

I think it's quite successful for its simplicity. The result it reaches when starting with an empty grid and solving at depth 5 is:

Move 4006
[2,64,16,4]
[16,4096,128,512]
[2048,64,1024,16]
[2,4,16,2]

Game Over

Source code can be found here:
https://github.com/popovitsj/2048-haskell

 
Comments:

  • Try to extend it with the actual rules. It's a good challenge in learning about Haskell's random generator! – Thomas Ahle, Apr 9 at 7:08
  • I got very frustrated with Haskell trying to do that, but I'm probably gonna give it a second try! I did find that the game gets considerably easier without the randomization. – popovitsj, Apr 9 at 8:19
  • Without randomization I'm pretty sure you could find a way to always get 16k or 32k. However randomization in Haskell is not that bad, you just need a way to pass around the 'seed'. Either do it explicitly, or with the Random monad. – Thomas Ahle, Apr 9 at 9:06
  • Refining the algorithm so that it always reaches 16k/32k for a non-random game might be another interesting challenge... – popovitsj, Apr 9 at 10:17
  • You are right, it's harder than I thought. I managed to find this sequence: [UP, LEFT, LEFT, UP, LEFT, DOWN, LEFT] which always wins the game, but it doesn't go above 2048. (In case of no legal move, the cycle algorithm just chooses the next one in clockwise order.) – Thomas Ahle, Apr 9 at 12:39


My strategy was simple:

  • Choose to make the highest tile in the bottom-right corner
  • Never swipe the tiles up
  • Always have 4 tiles on the bottom of the screen

My current game here:

Comments:

  • Undo doesn't count. It makes it possible to negate the randomness of the game. – nneonneo, Jun 10 at 4:19


This algorithm is not optimal for winning the game, but it is fairly optimal in terms of performance and amount of code needed:

  if(can move neither right, up or down)
    direction = left
  else
  {
    do
    {
      direction = random from (right, down, up)
    }
    while(can not move in "direction")
  }
Comments:

  • Why does this (in your opinion) work? – aoeu, Mar 18 at 19:04
  • It's a joke.... – Felipe Tonello, Mar 19 at 18:39
  • It works better if you say random from (right, right, right, down, down, up) so not all moves are of equal probability. 🙂 – Daren, Mar 20 at 15:48
  • Yes, it is based on my own observation with the game. Until you have to use the 4th direction the game will practically solve itself without any kind of observation. This "AI" should be able to get to 512/1024 without checking the exact value of any block. – API-Beast, Mar 20 at 18:44
  • A proper AI would try to avoid getting to a state where it can only move into one direction at all cost. – API-Beast, Mar 20 at 18:47

