Home » Personal collection » Acorn ADFS disks » Electron_User_Group » EUG_53.ADF » V/+COMPR2
V/+COMPR2
This website contains an archive of files for the Acorn Electron, BBC Micro, Acorn Archimedes, Commodore 16 and Commodore 64 computers, which Dominic Ford has rescued from his private collection of floppy disks and cassettes.
Some of these files were originally commercial releases in the 1980s and 1990s, but they are now widely available online. I assume that copyright over them is no longer being asserted. If you own the copyright and would like files to be removed, please contact me.
Tape/disk: | Home » Personal collection » Acorn ADFS disks » Electron_User_Group » EUG_53.ADF |
Filename: | V/+COMPR2 |
Read OK: | ✔ |
File size: | 1022 bytes |
Load address: | 2B204556 |
Exec address: | 504D4F43 |
Duplicates
There are 2 duplicate copies of this file in the archive:
- AEW website » eug » eug_3_5_discs_Eug-53_A-EUG53.adf » V/+COMPR2
- AEW website » eug » eug_5_25_discs_Eug-53_D-EUG53.dsd » V.+COMPR2
- Personal collection » Acorn ADFS disks » Electron_User_Group » EUG_53.ADF » V/+COMPR2
File contents
THE BIG SQUEEZE (2) In the last article, we discussed how text could be compressed by con- verting it into four-bit nibbles of commonly-used characters, or codes giving information about the following nibble: Nibble value Meaning ------------ ------- 0 Next nibble is a less commonly used character. 1 Next nibble is a token. 2 End of text. 3-15 The 13 most commonly used characters. Here we'll be discussing 'tokens'. A 'token' is a representation of a commonly used word, a bit like BASIC keywords, which are stored in memory as one-byte tokens. In our case, tokens occupy one nibble (four bits), meaning we can include 16 commonly-used words like "this", "and", "that", and so on. If "this" was token number zero, "and" token number one, and "that" token two, then a sentence like "this and that" would be compacted as so: byte 1 2 3 value &10 11 13 Program COMPRE2 is identical to COMPRES, but with the addition of the machine code to 'tokenise' any commonly-used words found in the input text, and expand them when the text is decompressed. The tokens are contained in the table 'toktab'. Each token starts with a number specifying the length of the token (e.g. 3 for "and") followed by the token word itself. The words are READ in from line 1990, and their length is worked out automatically by EQUB (LEN word$) in line 1210. Matching words in the input string to the token words is handled by 'chktok'. It starts at the beginning of 'toktab' and compares the current input character with the first character of the current word in 'toktab'. If it is the same, the next character from the input string is checked with the next character in the table and so on. The token length is put into the X register, and decremented each time a match is found. So if X=0, all characters have been matched, and the word in the input string has been successfully matched with the word in 'toktab'. If, on the other hand, the first letter was matched but not the next, the routine branches to 'notfnd'. The position of the next word in 'toktab' is calculated by adding the length of the current token, 'toklen', to the address in 'from' (which began as pointing to the start of the token table). The token number, 'tokno', is incremented by one so if a match is ever found, 'tokno' is written out to the the output string, preceeded by a one nibble which signifies that the next nibble is a token. &FF marks the end of the token table, so if the end of 'toktab' is reached with no match having been made, 'chktok' exits with the negative flag set. The 'compress' routine then gets on with individually matching common or less-common characters as before. If you want, you can devise your own set of token words and put them in line 1990. Note that there's no point in including two-letter words comprising common letters - "of", "is", and so forth - because a token flag followed by a token number takes up the same amount of memory as two commonly-used letters, i.e. two nibbles. The token expansion routine, 'outtok', works out the correct position in 'toktab' from which to begin printing. The lengths of any tokens preceeding the token in question are added to the address in 'from'. Again, the token length is loading into X and used as a counter for how many characters from the table should be sent to OSWRCH. To see what effect this has on the compression factor, type in "she sells sea shells on the sea shore", and compare the result with last time: 54% is quite impressive. Having said that, the entire compression program suffers from a limited token table, and only lower-case letters are accepted. You could improve on this by assuming that the character after a full stop and space is a capital letter. A few more spaces could be squeezed out by assuming that all punctuation marks are followed by a space. You also could assign a nibble value of 3 to mean that the following two nibbles index into a 256-word token table. Christopher Dewhurst, EUG #53
00000000 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 | | 00000010 20 20 20 20 20 20 20 20 20 20 54 48 45 20 42 49 | THE BI| 00000020 47 20 53 51 55 45 45 5a 45 20 28 32 29 0d 0d 49 |G SQUEEZE (2)..I| 00000030 6e 20 74 68 65 20 6c 61 73 74 20 61 72 74 69 63 |n the last artic| 00000040 6c 65 2c 20 77 65 20 64 69 73 63 75 73 73 65 64 |le, we discussed| 00000050 20 68 6f 77 20 74 65 78 74 20 63 6f 75 6c 64 20 | how text could | 00000060 62 65 20 63 6f 6d 70 72 65 73 73 65 64 20 62 79 |be compressed by| 00000070 20 63 6f 6e 2d 0d 76 65 72 74 69 6e 67 20 69 74 | con-.verting it| 00000080 20 69 6e 74 6f 20 66 6f 75 72 2d 62 69 74 20 6e | into four-bit n| 00000090 69 62 62 6c 65 73 20 6f 66 20 63 6f 6d 6d 6f 6e |ibbles of common| 000000a0 6c 79 2d 75 73 65 64 20 63 68 61 72 61 63 74 65 |ly-used characte| 000000b0 72 73 2c 20 6f 72 20 63 6f 64 65 73 0d 67 69 76 |rs, or codes.giv| 000000c0 69 6e 67 20 69 6e 66 6f 72 6d 61 74 69 6f 6e 20 |ing information | 000000d0 61 62 6f 75 74 20 74 68 65 20 66 6f 6c 6c 6f 77 |about the follow| 000000e0 69 6e 67 20 6e 69 62 62 6c 65 3a 0d 0d 4e 69 62 |ing nibble:..Nib| 000000f0 62 6c 65 20 76 61 6c 75 65 20 20 20 4d 65 61 6e |ble value Mean| 00000100 69 6e 67 0d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d |ing.------------| 00000110 20 20 20 2d 2d 2d 2d 2d 2d 2d 0d 30 20 20 20 20 | -------.0 | 00000120 20 20 20 20 20 20 20 20 20 20 4e 65 78 74 20 6e | Next n| 00000130 69 62 62 6c 65 20 69 73 20 61 20 6c 65 73 73 20 |ibble is a less | 00000140 63 6f 6d 6d 6f 6e 6c 79 20 75 73 65 64 20 63 68 |commonly used ch| 00000150 61 72 61 63 74 65 72 2e 0d 31 20 20 20 20 20 20 |aracter..1 | 00000160 20 20 20 20 20 20 20 20 4e 65 78 74 20 6e 69 62 | Next nib| 00000170 62 6c 65 20 69 73 20 61 20 74 6f 6b 65 6e 2e 0d |ble is a token..| 00000180 32 20 20 20 20 20 20 20 20 20 20 20 20 20 20 45 |2 E| 00000190 6e 64 20 6f 66 20 74 65 78 74 2e 0d 33 2d 31 35 |nd of text..3-15| 000001a0 20 20 20 20 20 20 20 20 20 20 20 54 68 65 20 31 | The 1| 000001b0 33 20 6d 6f 73 74 20 63 6f 6d 6d 6f 6e 6c 79 20 |3 most commonly | 000001c0 75 73 65 64 20 63 68 61 72 61 63 74 65 72 73 2e |used characters.| 000001d0 0d 0d 48 65 72 65 20 77 65 27 6c 6c 20 62 65 20 |..Here we'll be | 000001e0 64 69 73 63 75 73 73 69 6e 67 20 27 74 6f 6b 65 |discussing 'toke| 000001f0 6e 73 27 2e 20 41 20 27 74 6f 6b 65 6e 27 20 69 |ns'. A 'token' i| 00000200 73 20 61 20 72 65 70 72 65 73 65 6e 74 61 74 69 |s a representati| 00000210 6f 6e 20 6f 66 20 61 0d 63 6f 6d 6d 6f 6e 6c 79 |on of a.commonly| 00000220 20 75 73 65 64 20 77 6f 72 64 2c 20 61 20 62 69 | used word, a bi| 00000230 74 20 6c 69 6b 65 20 42 41 53 49 43 20 6b 65 79 |t like BASIC key| 00000240 77 6f 72 64 73 2c 20 77 68 69 63 68 20 61 72 65 |words, which are| 00000250 20 73 74 6f 72 65 64 20 69 6e 0d 6d 65 6d 6f 72 | stored in.memor| 00000260 79 20 61 73 20 6f 6e 65 2d 62 79 74 65 20 74 6f |y as one-byte to| 00000270 6b 65 6e 73 2e 0d 20 20 20 49 6e 20 6f 75 72 20 |kens.. In our | 00000280 63 61 73 65 2c 20 74 6f 6b 65 6e 73 20 6f 63 63 |case, tokens occ| 00000290 75 70 79 20 6f 6e 65 20 6e 69 62 62 6c 65 20 28 |upy one nibble (| 000002a0 66 6f 75 72 20 62 69 74 73 29 2c 20 6d 65 61 6e |four bits), mean| 000002b0 69 6e 67 20 77 65 20 63 61 6e 0d 69 6e 63 6c 75 |ing we can.inclu| 000002c0 64 65 20 31 36 20 63 6f 6d 6d 6f 6e 6c 79 2d 75 |de 16 commonly-u| 000002d0 73 65 64 20 77 6f 72 64 73 20 6c 69 6b 65 20 22 |sed words like "| 000002e0 74 68 69 73 22 2c 20 22 61 6e 64 22 2c 20 22 74 |this", "and", "t| 000002f0 68 61 74 22 2c 20 61 6e 64 20 73 6f 20 6f 6e 2e |hat", and so on.| 00000300 20 49 66 0d 22 74 68 69 73 22 20 77 61 73 20 74 | If."this" was t| 00000310 6f 6b 65 6e 20 6e 75 6d 62 65 72 20 7a 65 72 6f |oken number zero| 00000320 2c 20 22 61 6e 64 22 20 74 6f 6b 65 6e 20 6e 75 |, "and" token nu| 00000330 6d 62 65 72 20 6f 6e 65 2c 20 61 6e 64 20 22 74 |mber one, and "t| 00000340 68 61 74 22 20 74 6f 6b 65 6e 0d 74 77 6f 2c 20 |hat" token.two, | 00000350 74 68 65 6e 20 61 20 73 65 6e 74 65 6e 63 65 20 |then a sentence | 00000360 6c 69 6b 65 20 22 74 68 69 73 20 61 6e 64 20 74 |like "this and t| 00000370 68 61 74 22 20 77 6f 75 6c 64 20 62 65 20 63 6f |hat" would be co| 00000380 6d 70 61 63 74 65 64 20 61 73 20 73 6f 3a 0d 20 |mpacted as so:. | 00000390 20 20 20 20 20 62 79 74 65 20 20 20 20 31 20 20 | byte 1 | 000003a0 32 20 20 33 0d 20 20 20 20 20 20 76 61 6c 75 65 |2 3. value| 000003b0 20 26 31 30 20 31 31 20 31 33 0d 0d 50 72 6f 67 | &10 11 13..Prog| 000003c0 72 61 6d 20 43 4f 4d 50 52 45 32 20 69 73 20 69 |ram COMPRE2 is i| 000003d0 64 65 6e 74 69 63 61 6c 20 74 6f 20 43 4f 4d 50 |dentical to COMP| 000003e0 52 45 53 2c 20 62 75 74 20 77 69 74 68 20 74 68 |RES, but with th| 000003f0 65 20 61 64 64 69 74 69 6f 6e 20 6f 66 20 74 68 |e addition of th| 00000400 65 0d 6d 61 63 68 69 6e 65 20 63 6f 64 65 20 74 |e.machine code t| 00000410 6f 20 27 74 6f 6b 65 6e 69 73 65 27 20 61 6e 79 |o 'tokenise' any| 00000420 20 63 6f 6d 6d 6f 6e 6c 79 2d 75 73 65 64 20 77 | commonly-used w| 00000430 6f 72 64 73 20 66 6f 75 6e 64 20 69 6e 20 74 68 |ords found in th| 00000440 65 20 69 6e 70 75 74 0d 74 65 78 74 2c 20 61 6e |e input.text, an| 00000450 64 20 65 78 70 61 6e 64 20 74 68 65 6d 20 77 68 |d expand them wh| 00000460 65 6e 20 74 68 65 20 74 65 78 74 20 69 73 20 64 |en the text is d| 00000470 65 63 6f 6d 70 72 65 73 73 65 64 2e 0d 20 20 20 |ecompressed.. | 00000480 54 68 65 20 74 6f 6b 65 6e 73 20 61 72 65 20 63 |The tokens are c| 00000490 6f 6e 74 61 69 6e 65 64 20 69 6e 20 74 68 65 20 |ontained in the | 000004a0 74 61 62 6c 65 20 27 74 6f 6b 74 61 62 27 2e 20 |table 'toktab'. | 000004b0 45 61 63 68 20 74 6f 6b 65 6e 20 73 74 61 72 74 |Each token start| 000004c0 73 0d 77 69 74 68 20 61 20 6e 75 6d 62 65 72 20 |s.with a number | 000004d0 73 70 65 63 69 66 79 69 6e 67 20 74 68 65 20 6c |specifying the l| 000004e0 65 6e 67 74 68 20 6f 66 20 74 68 65 20 74 6f 6b |ength of the tok| 000004f0 65 6e 20 28 65 2e 67 2e 20 33 20 66 6f 72 20 22 |en (e.g. 3 for "| 00000500 61 6e 64 22 29 0d 66 6f 6c 6c 6f 77 65 64 20 62 |and").followed b| 00000510 79 20 74 68 65 20 74 6f 6b 65 6e 20 77 6f 72 64 |y the token word| 00000520 20 69 74 73 65 6c 66 2e 20 54 68 65 20 77 6f 72 | itself. The wor| 00000530 64 73 20 61 72 65 20 52 45 41 44 20 69 6e 20 66 |ds are READ in f| 00000540 72 6f 6d 20 6c 69 6e 65 0d 31 39 39 30 2c 20 61 |rom line.1990, a| 00000550 6e 64 20 74 68 65 69 72 20 6c 65 6e 67 74 68 20 |nd their length | 00000560 69 73 20 77 6f 72 6b 65 64 20 6f 75 74 20 61 75 |is worked out au| 00000570 74 6f 6d 61 74 69 63 61 6c 6c 79 20 62 79 20 45 |tomatically by E| 00000580 51 55 42 20 28 4c 45 4e 20 77 6f 72 64 24 29 0d |QUB (LEN word$).| 00000590 69 6e 20 6c 69 6e 65 20 31 32 31 30 2e 0d 20 20 |in line 1210.. | 000005a0 20 4d 61 74 63 68 69 6e 67 20 77 6f 72 64 73 20 | Matching words | 000005b0 69 6e 20 74 68 65 20 69 6e 70 75 74 20 73 74 72 |in the input str| 000005c0 69 6e 67 20 74 6f 20 74 68 65 20 74 6f 6b 65 6e |ing to the token| 000005d0 20 77 6f 72 64 73 20 69 73 20 68 61 6e 64 6c 65 | words is handle| 000005e0 64 20 62 79 0d 27 63 68 6b 74 6f 6b 27 2e 20 49 |d by.'chktok'. I| 000005f0 74 20 73 74 61 72 74 73 20 61 74 20 74 68 65 20 |t starts at the | 00000600 62 65 67 69 6e 6e 69 6e 67 20 6f 66 20 27 74 6f |beginning of 'to| 00000610 6b 74 61 62 27 20 61 6e 64 20 63 6f 6d 70 61 72 |ktab' and compar| 00000620 65 73 20 74 68 65 0d 63 75 72 72 65 6e 74 20 69 |es the.current i| 00000630 6e 70 75 74 20 63 68 61 72 61 63 74 65 72 20 77 |nput character w| 00000640 69 74 68 20 74 68 65 20 66 69 72 73 74 20 63 68 |ith the first ch| 00000650 61 72 61 63 74 65 72 20 6f 66 20 74 68 65 20 63 |aracter of the c| 00000660 75 72 72 65 6e 74 20 77 6f 72 64 20 69 6e 0d 27 |urrent word in.'| 00000670 74 6f 6b 74 61 62 27 2e 20 49 66 20 69 74 20 69 |toktab'. If it i| 00000680 73 20 74 68 65 20 73 61 6d 65 2c 20 74 68 65 20 |s the same, the | 00000690 6e 65 78 74 20 63 68 61 72 61 63 74 65 72 20 66 |next character f| 000006a0 72 6f 6d 20 74 68 65 20 69 6e 70 75 74 20 73 74 |rom the input st| 000006b0 72 69 6e 67 0d 69 73 20 63 68 65 63 6b 65 64 20 |ring.is checked | 000006c0 77 69 74 68 20 74 68 65 20 6e 65 78 74 20 63 68 |with the next ch| 000006d0 61 72 61 63 74 65 72 20 69 6e 20 74 68 65 20 74 |aracter in the t| 000006e0 61 62 6c 65 20 61 6e 64 20 73 6f 20 6f 6e 2e 20 |able and so on. | 000006f0 54 68 65 20 74 6f 6b 65 6e 0d 6c 65 6e 67 74 68 |The token.length| 00000700 20 69 73 20 70 75 74 20 69 6e 74 6f 20 74 68 65 | is put into the| 00000710 20 58 20 72 65 67 69 73 74 65 72 2c 20 61 6e 64 | X register, and| 00000720 20 64 65 63 72 65 6d 65 6e 74 65 64 20 65 61 63 | decremented eac| 00000730 68 20 74 69 6d 65 20 61 20 6d 61 74 63 68 20 69 |h time a match i| 00000740 73 0d 66 6f 75 6e 64 2e 20 53 6f 20 69 66 20 58 |s.found. So if X| 00000750 3d 30 2c 20 61 6c 6c 20 63 68 61 72 61 63 74 65 |=0, all characte| 00000760 72 73 20 68 61 76 65 20 62 65 65 6e 20 6d 61 74 |rs have been mat| 00000770 63 68 65 64 2c 20 61 6e 64 20 74 68 65 20 77 6f |ched, and the wo| 00000780 72 64 20 69 6e 20 74 68 65 0d 69 6e 70 75 74 20 |rd in the.input | 00000790 73 74 72 69 6e 67 20 68 61 73 20 62 65 65 6e 20 |string has been | 000007a0 73 75 63 63 65 73 73 66 75 6c 6c 79 20 6d 61 74 |successfully mat| 000007b0 63 68 65 64 20 77 69 74 68 20 74 68 65 20 77 6f |ched with the wo| 000007c0 72 64 20 69 6e 20 27 74 6f 6b 74 61 62 27 2e 0d |rd in 'toktab'..| 000007d0 20 20 20 49 66 2c 20 6f 6e 20 74 68 65 20 6f 74 | If, on the ot| 000007e0 68 65 72 20 68 61 6e 64 2c 20 74 68 65 20 66 69 |her hand, the fi| 000007f0 72 73 74 20 6c 65 74 74 65 72 20 77 61 73 20 6d |rst letter was m| 00000800 61 74 63 68 65 64 20 62 75 74 20 6e 6f 74 20 74 |atched but not t| 00000810 68 65 0d 6e 65 78 74 2c 20 74 68 65 20 72 6f 75 |he.next, the rou| 00000820 74 69 6e 65 20 62 72 61 6e 63 68 65 73 20 74 6f |tine branches to| 00000830 20 27 6e 6f 74 66 6e 64 27 2e 20 54 68 65 20 70 | 'notfnd'. The p| 00000840 6f 73 69 74 69 6f 6e 20 6f 66 20 74 68 65 20 6e |osition of the n| 00000850 65 78 74 20 77 6f 72 64 20 69 6e 0d 27 74 6f 6b |ext word in.'tok| 00000860 74 61 62 27 20 69 73 20 63 61 6c 63 75 6c 61 74 |tab' is calculat| 00000870 65 64 20 62 79 20 61 64 64 69 6e 67 20 74 68 65 |ed by adding the| 00000880 20 6c 65 6e 67 74 68 20 6f 66 20 74 68 65 20 63 | length of the c| 00000890 75 72 72 65 6e 74 20 74 6f 6b 65 6e 2c 0d 27 74 |urrent token,.'t| 000008a0 6f 6b 6c 65 6e 27 2c 20 74 6f 20 74 68 65 20 61 |oklen', to the a| 000008b0 64 64 72 65 73 73 20 69 6e 20 27 66 72 6f 6d 27 |ddress in 'from'| 000008c0 20 28 77 68 69 63 68 20 62 65 67 61 6e 20 61 73 | (which began as| 000008d0 20 70 6f 69 6e 74 69 6e 67 20 74 6f 20 74 68 65 | pointing to the| 000008e0 20 73 74 61 72 74 0d 6f 66 20 74 68 65 20 74 6f | start.of the to| 000008f0 6b 65 6e 20 74 61 62 6c 65 29 2e 20 54 68 65 20 |ken table). The | 00000900 74 6f 6b 65 6e 20 6e 75 6d 62 65 72 2c 20 27 74 |token number, 't| 00000910 6f 6b 6e 6f 27 2c 20 69 73 20 69 6e 63 72 65 6d |okno', is increm| 00000920 65 6e 74 65 64 20 62 79 20 6f 6e 65 20 73 6f 0d |ented by one so.| 00000930 69 66 20 61 20 6d 61 74 63 68 20 69 73 20 65 76 |if a match is ev| 00000940 65 72 20 66 6f 75 6e 64 2c 20 27 74 6f 6b 6e 6f |er found, 'tokno| 00000950 27 20 69 73 20 77 72 69 74 74 65 6e 20 6f 75 74 |' is written out| 00000960 20 74 6f 20 74 68 65 20 74 68 65 20 6f 75 74 70 | to the the outp| 00000970 75 74 0d 73 74 72 69 6e 67 2c 20 70 72 65 63 65 |ut.string, prece| 00000980 65 64 65 64 20 62 79 20 61 20 6f 6e 65 20 6e 69 |eded by a one ni| 00000990 62 62 6c 65 20 77 68 69 63 68 20 73 69 67 6e 69 |bble which signi| 000009a0 66 69 65 73 20 74 68 61 74 20 74 68 65 20 6e 65 |fies that the ne| 000009b0 78 74 20 6e 69 62 62 6c 65 0d 69 73 20 61 20 74 |xt nibble.is a t| 000009c0 6f 6b 65 6e 2e 0d 20 20 20 26 46 46 20 6d 61 72 |oken.. &FF mar| 000009d0 6b 73 20 74 68 65 20 65 6e 64 20 6f 66 20 74 68 |ks the end of th| 000009e0 65 20 74 6f 6b 65 6e 20 74 61 62 6c 65 2c 20 73 |e token table, s| 000009f0 6f 20 69 66 20 74 68 65 20 65 6e 64 20 6f 66 20 |o if the end of | 00000a00 27 74 6f 6b 74 61 62 27 20 69 73 0d 72 65 61 63 |'toktab' is.reac| 00000a10 68 65 64 20 77 69 74 68 20 6e 6f 20 6d 61 74 63 |hed with no matc| 00000a20 68 20 68 61 76 69 6e 67 20 62 65 65 6e 20 6d 61 |h having been ma| 00000a30 64 65 2c 20 27 63 68 6b 74 6f 6b 27 20 65 78 69 |de, 'chktok' exi| 00000a40 74 73 20 77 69 74 68 20 74 68 65 20 6e 65 67 61 |ts with the nega| 00000a50 74 69 76 65 0d 66 6c 61 67 20 73 65 74 2e 20 54 |tive.flag set. T| 00000a60 68 65 20 27 63 6f 6d 70 72 65 73 73 27 20 72 6f |he 'compress' ro| 00000a70 75 74 69 6e 65 20 74 68 65 6e 20 67 65 74 73 20 |utine then gets | 00000a80 6f 6e 20 77 69 74 68 20 69 6e 64 69 76 69 64 75 |on with individu| 00000a90 61 6c 6c 79 20 6d 61 74 63 68 69 6e 67 0d 63 6f |ally matching.co| 00000aa0 6d 6d 6f 6e 20 6f 72 20 6c 65 73 73 2d 63 6f 6d |mmon or less-com| 00000ab0 6d 6f 6e 20 63 68 61 72 61 63 74 65 72 73 20 61 |mon characters a| 00000ac0 73 20 62 65 66 6f 72 65 2e 0d 20 20 20 49 66 20 |s before.. If | 00000ad0 79 6f 75 20 77 61 6e 74 2c 20 79 6f 75 20 63 61 |you want, you ca| 00000ae0 6e 20 64 65 76 69 73 65 20 79 6f 75 72 20 6f 77 |n devise your ow| 00000af0 6e 20 73 65 74 20 6f 66 20 74 6f 6b 65 6e 20 77 |n set of token w| 00000b00 6f 72 64 73 20 61 6e 64 20 70 75 74 20 74 68 65 |ords and put the| 00000b10 6d 0d 69 6e 20 6c 69 6e 65 20 31 39 39 30 2e 20 |m.in line 1990. | 00000b20 4e 6f 74 65 20 74 68 61 74 20 74 68 65 72 65 27 |Note that there'| 00000b30 73 20 6e 6f 20 70 6f 69 6e 74 20 69 6e 20 69 6e |s no point in in| 00000b40 63 6c 75 64 69 6e 67 20 74 77 6f 2d 6c 65 74 74 |cluding two-lett| 00000b50 65 72 20 77 6f 72 64 73 0d 63 6f 6d 70 72 69 73 |er words.compris| 00000b60 69 6e 67 20 63 6f 6d 6d 6f 6e 20 6c 65 74 74 65 |ing common lette| 00000b70 72 73 20 2d 20 22 6f 66 22 2c 20 22 69 73 22 2c |rs - "of", "is",| 00000b80 20 61 6e 64 20 73 6f 20 66 6f 72 74 68 20 2d 20 | and so forth - | 00000b90 62 65 63 61 75 73 65 20 61 20 74 6f 6b 65 6e 20 |because a token | 00000ba0 0d 66 6c 61 67 20 66 6f 6c 6c 6f 77 65 64 20 62 |.flag followed b| 00000bb0 79 20 61 20 74 6f 6b 65 6e 20 6e 75 6d 62 65 72 |y a token number| 00000bc0 20 74 61 6b 65 73 20 75 70 20 74 68 65 20 73 61 | takes up the sa| 00000bd0 6d 65 20 61 6d 6f 75 6e 74 20 6f 66 20 6d 65 6d |me amount of mem| 00000be0 6f 72 79 20 61 73 0d 74 77 6f 20 63 6f 6d 6d 6f |ory as.two commo| 00000bf0 6e 6c 79 2d 75 73 65 64 20 6c 65 74 74 65 72 73 |nly-used letters| 00000c00 2c 20 69 2e 65 2e 20 74 77 6f 20 6e 69 62 62 6c |, i.e. two nibbl| 00000c10 65 73 2e 0d 20 20 20 54 68 65 20 74 6f 6b 65 6e |es.. The token| 00000c20 20 65 78 70 61 6e 73 69 6f 6e 20 72 6f 75 74 69 | expansion routi| 00000c30 6e 65 2c 20 27 6f 75 74 74 6f 6b 27 2c 20 77 6f |ne, 'outtok', wo| 00000c40 72 6b 73 20 6f 75 74 20 74 68 65 20 63 6f 72 72 |rks out the corr| 00000c50 65 63 74 20 70 6f 73 69 74 69 6f 6e 0d 69 6e 20 |ect position.in | 00000c60 27 74 6f 6b 74 61 62 27 20 66 72 6f 6d 20 77 68 |'toktab' from wh| 00000c70 69 63 68 20 74 6f 20 62 65 67 69 6e 20 70 72 69 |ich to begin pri| 00000c80 6e 74 69 6e 67 2e 20 54 68 65 20 6c 65 6e 67 74 |nting. The lengt| 00000c90 68 73 20 6f 66 20 61 6e 79 20 74 6f 6b 65 6e 73 |hs of any tokens| 00000ca0 20 0d 70 72 65 63 65 65 64 69 6e 67 20 74 68 65 | .preceeding the| 00000cb0 20 74 6f 6b 65 6e 20 69 6e 20 71 75 65 73 74 69 | token in questi| 00000cc0 6f 6e 20 61 72 65 20 61 64 64 65 64 20 74 6f 20 |on are added to | 00000cd0 74 68 65 20 61 64 64 72 65 73 73 20 69 6e 20 27 |the address in '| 00000ce0 66 72 6f 6d 27 2e 20 0d 41 67 61 69 6e 2c 20 74 |from'. .Again, t| 00000cf0 68 65 20 74 6f 6b 65 6e 20 6c 65 6e 67 74 68 20 |he token length | 00000d00 69 73 20 6c 6f 61 64 69 6e 67 20 69 6e 74 6f 20 |is loading into | 00000d10 58 20 61 6e 64 20 75 73 65 64 20 61 73 20 61 20 |X and used as a | 00000d20 63 6f 75 6e 74 65 72 20 66 6f 72 20 68 6f 77 0d |counter for how.| 00000d30 6d 61 6e 79 20 63 68 61 72 61 63 74 65 72 73 20 |many characters | 00000d40 66 72 6f 6d 20 74 68 65 20 74 61 62 6c 65 20 73 |from the table s| 00000d50 68 6f 75 6c 64 20 62 65 20 73 65 6e 74 20 74 6f |hould be sent to| 00000d60 20 4f 53 57 52 43 48 2e 0d 20 20 20 54 6f 20 73 | OSWRCH.. To s| 00000d70 65 65 20 77 68 61 74 20 65 66 66 65 63 74 20 74 |ee what effect t| 00000d80 68 69 73 20 68 61 73 20 6f 6e 20 74 68 65 20 63 |his has on the c| 00000d90 6f 6d 70 72 65 73 73 69 6f 6e 20 66 61 63 74 6f |ompression facto| 00000da0 72 2c 20 74 79 70 65 20 69 6e 20 22 73 68 65 0d |r, type in "she.| 00000db0 73 65 6c 6c 73 20 73 65 61 20 73 68 65 6c 6c 73 |sells sea shells| 00000dc0 20 6f 6e 20 74 68 65 20 73 65 61 20 73 68 6f 72 | on the sea shor| 00000dd0 65 22 2c 20 61 6e 64 20 63 6f 6d 70 61 72 65 20 |e", and compare | 00000de0 74 68 65 20 72 65 73 75 6c 74 20 77 69 74 68 20 |the result with | 00000df0 6c 61 73 74 0d 74 69 6d 65 3a 20 35 34 25 20 69 |last.time: 54% i| 00000e00 73 20 71 75 69 74 65 20 69 6d 70 72 65 73 73 69 |s quite impressi| 00000e10 76 65 2e 20 48 61 76 69 6e 67 20 73 61 69 64 20 |ve. Having said | 00000e20 74 68 61 74 2c 20 74 68 65 20 65 6e 74 69 72 65 |that, the entire| 00000e30 20 63 6f 6d 70 72 65 73 73 69 6f 6e 0d 70 72 6f | compression.pro| 00000e40 67 72 61 6d 20 73 75 66 66 65 72 73 20 66 72 6f |gram suffers fro| 00000e50 6d 20 61 20 6c 69 6d 69 74 65 64 20 74 6f 6b 65 |m a limited toke| 00000e60 6e 20 74 61 62 6c 65 2c 20 61 6e 64 20 6f 6e 6c |n table, and onl| 00000e70 79 20 6c 6f 77 65 72 2d 63 61 73 65 20 6c 65 74 |y lower-case let| 00000e80 74 65 72 73 0d 61 72 65 20 61 63 63 65 70 74 65 |ters.are accepte| 00000e90 64 2e 20 59 6f 75 20 63 6f 75 6c 64 20 69 6d 70 |d. You could imp| 00000ea0 72 6f 76 65 20 6f 6e 20 74 68 69 73 20 62 79 20 |rove on this by | 00000eb0 61 73 73 75 6d 69 6e 67 20 74 68 61 74 20 74 68 |assuming that th| 00000ec0 65 20 63 68 61 72 61 63 74 65 72 0d 61 66 74 65 |e character.afte| 00000ed0 72 20 61 20 66 75 6c 6c 20 73 74 6f 70 20 61 6e |r a full stop an| 00000ee0 64 20 73 70 61 63 65 20 69 73 20 61 20 63 61 70 |d space is a cap| 00000ef0 69 74 61 6c 20 6c 65 74 74 65 72 2e 20 41 20 66 |ital letter. A f| 00000f00 65 77 20 6d 6f 72 65 20 73 70 61 63 65 73 20 63 |ew more spaces c| 00000f10 6f 75 6c 64 0d 62 65 20 73 71 75 65 65 7a 65 64 |ould.be squeezed| 00000f20 20 6f 75 74 20 62 79 20 61 73 73 75 6d 69 6e 67 | out by assuming| 00000f30 20 74 68 61 74 20 61 6c 6c 20 70 75 6e 63 74 75 | that all punctu| 00000f40 61 74 69 6f 6e 20 6d 61 72 6b 73 20 61 72 65 20 |ation marks are | 00000f50 66 6f 6c 6c 6f 77 65 64 20 62 79 20 61 0d 73 70 |followed by a.sp| 00000f60 61 63 65 2e 20 59 6f 75 20 61 6c 73 6f 20 63 6f |ace. You also co| 00000f70 75 6c 64 20 61 73 73 69 67 6e 20 61 20 6e 69 62 |uld assign a nib| 00000f80 62 6c 65 20 76 61 6c 75 65 20 6f 66 20 33 20 74 |ble value of 3 t| 00000f90 6f 20 6d 65 61 6e 20 74 68 61 74 20 74 68 65 0d |o mean that the.| 00000fa0 66 6f 6c 6c 6f 77 69 6e 67 20 74 77 6f 20 6e 69 |following two ni| 00000fb0 62 62 6c 65 73 20 69 6e 64 65 78 20 69 6e 74 6f |bbles index into| 00000fc0 20 61 20 32 35 36 2d 77 6f 72 64 20 74 6f 6b 65 | a 256-word toke| 00000fd0 6e 20 74 61 62 6c 65 2e 0d 20 20 20 20 20 20 20 |n table.. | 00000fe0 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 | | * 00001000 20 20 20 20 43 68 72 69 73 74 6f 70 68 65 72 20 | Christopher | 00001010 44 65 77 68 75 72 73 74 2c 20 45 55 47 20 23 35 |Dewhurst, EUG #5| 00001020 33 0d |3.| 00001022