This isn't necessarily an issue, but rather an observation: the example sentences seem to contain quite a few misspelt words.
Here is the output of a script that takes all examples, extracts their sentences, splits them on spaces, lowercases every word, and deduplicates the result to form a simple vocabulary:
['1', 'more', '', 'isa', 'yelow', '.', 'has', 'numbers', 'block', 'having', 'have', 'no', 'base', 'leats', 'item,', 'back', 'other.', 'all', 'triangle,', 'odd', 'circles', 'circle', 'od', 'but', 'two', 'adge', 'black.', 'another', 'at', 'every', 'nealy', 'and', 'bases', 'touching', 'underneath', 't', 'sthe', 'box', 'same', 'one.', 'count', 'one', 'other', 'among', 'directly', 'objects', 'color.', 'do', 'wirh', 'bow', 'to', 'only.', '3', 'left', 'third', 'middle', 'than', 'that', 'less', 'closely', 'line', 'items,', 'shape.', 'out', 'boxes.', 'line.', 'bellow', 'ones.', 'triangle.', 'wwith', 'item', 'first', 'lot', 'nearly', 'ablue', 'triangle', 'coloured', 'not', 'both', 'box,', 'lease', 'box.', 'square,', 's', 'set', 'different.', 'over', 'colors.', 'alternately', 'them', 'just', 'in', 'between', 'smaller', '2', 'yelloe', 'objects,', 'squere', 'which', 'small', 'bottom-right', 'blue.', 'where', 'most', 'traingles.', 'ble', 'either', 'blicks', 'near', 'each', 'side', 'grey', ',', 'traingle', 'block.', 'bottom', 'beneath', '5', 'an', 'touhing', 'blue,', 'items.', 'bottom.', 'containing', 'positioned.', 'tocuhing', 'they', 'corner', 'height', 'base.', 'lest', 'square', 'three.', 'blocks,', 'consecutive', 'are', 'one,', 'total', 'yellow', 'different', 'towers', 'top.', 'item.', 'yelllow', 'block/', 'none', 'size.', 'bule', 'shapes', 'objects.', 'triangles.', 'only', 'number', 'some', 'size', 'least', 'ans', 'their', 'colour', 'objetcs', 'yellow.', 'ciircles', 'black,', 'any', 'second', 'corners.', 'middle.', 'contain', 'six', 'wih', 'medium', 'including', 'black', 'color', 'yellow,', 'circles.', 'attach', 'under', 'shape', 'all.', 'wth', 'height.', 'blacks', 'attached.', 'blocks', 'right', 'atleast', 'four', 'exacty', 'tow', 'theer', 'block,', 'each.', 'ones', 'bloxk', 'ia', 'bases.', 'big', 'exactly', 'items', 'then,idle', 'blccks', 'roof', 'squares.', 'eactly', 'this', 'attached', 'wall.', 'that.', 'colors', 'exacrly', 'is', 'corner.', 'ha', 'triangles', 'below', 
'bo', 'opis', '6', 'edge', 'squares', 'thee', 'or', 'with', 'a', 'there', 'colored', 'exacts', 'towers.', 'boxes', 'hte', 'lower', 'positions', 'made', 'three', 'kinds', 'egde', 'top', 'almost', 'it.', 'four.', 'it', 'single', 'type', 'most.', 'cirlce', 'i', 'abox.', 'blocks.', 'same.', 'square.', 'blue', 'after', 'colours', 'bkack', 'together', 'colour.', 'ad', 'its', 'even', 'close', 'tleast', '4', 'the', 'five', 'seven', 'trianlge', 'circle,', 'position.', 'tower.', 'as', 'above', 'stacked', 'without', 'al', 'many', 'circ;e', 'tower', 'blocks..', 'side.', 'from', 'multiple', 'object', 'level.', 'stack', 'rectangle', 'b;ue', 'of', 'tower,', 'being', 'object.', 'circle.', 'on', 'sqaures', 'contains', 'wall', 'll']
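For reference, the extraction described above can be sketched roughly as follows. This is a minimal illustration rather than the exact script: the `build_vocabulary` helper, the `"sentence"` field name, and the two sample examples are assumptions made for the sketch.

```python
def build_vocabulary(examples):
    """Return the set of unique, lowercased, space-separated tokens."""
    vocab = set()
    for example in examples:
        # "sentence" is an assumed field name; adjust to the real data format.
        # Splitting on a single space keeps punctuation attached to words and
        # yields an empty string wherever two spaces appear in a row.
        for token in example["sentence"].split(" "):
            vocab.add(token.lower())
    return vocab


# Hypothetical examples, only to show the behaviour.
examples = [
    {"sentence": "There is a yelow block."},
    {"sentence": "There is  a blue block."},
]

print(sorted(build_vocabulary(examples)))
```

Because the split is on single spaces only, tokens such as 'box.' and 'box,' count as distinct entries and the empty string '' shows up in the vocabulary, so part of the list above is punctuation and whitespace noise rather than genuine misspellings.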
I was just wondering whether there's any agreed-upon convention for preprocessing the text?