Home » Personal collection » Acorn ADFS disks » Electron_User_Group » EUG_53.ADF » V/+COMPR2

V/+COMPR2

This website contains an archive of files for the Acorn Electron, BBC Micro, Acorn Archimedes, Commodore 16 and Commodore 64 computers, which Dominic Ford has rescued from his private collection of floppy disks and cassettes.

Some of these files were originally commercial releases in the 1980s and 1990s, but they are now widely available online. I assume that copyright over them is no longer being asserted. If you own the copyright and would like files to be removed, please contact me.

Tape/disk: Home » Personal collection » Acorn ADFS disks » Electron_User_Group » EUG_53.ADF
Filename: V/+COMPR2
Read OK:
File size: 1022 bytes
Load address: 2B204556
Exec address: 504D4F43
Duplicates

There are 2 duplicate copies of this file in the archive:

File contents
                          THE BIG SQUEEZE (2)

In the last article, we discussed how text could be compressed by con-
verting it into four-bit nibbles of commonly-used characters, or codes
giving information about the following nibble:

Nibble value   Meaning
------------   -------
0              Next nibble is a less commonly used character.
1              Next nibble is a token.
2              End of text.
3-15           The 13 most commonly used characters.

Here we'll be discussing 'tokens'. A 'token' is a representation of a
commonly used word, a bit like BASIC keywords, which are stored in
memory as one-byte tokens.
   In our case, tokens occupy one nibble (four bits), meaning we can
include 16 commonly-used words like "this", "and", "that", and so on. If
"this" was token number zero, "and" token number one, and "that" token
two, then a sentence like "this and that" would be compacted as so:
      byte    1  2  3
      value &10 11 13

Program COMPRE2 is identical to COMPRES, but with the addition of the
machine code to 'tokenise' any commonly-used words found in the input
text, and expand them when the text is decompressed.
   The tokens are contained in the table 'toktab'. Each token starts
with a number specifying the length of the token (e.g. 3 for "and")
followed by the token word itself. The words are READ in from line
1990, and their length is worked out automatically by EQUB (LEN word$)
in line 1210.
   Matching words in the input string to the token words is handled by
'chktok'. It starts at the beginning of 'toktab' and compares the
current input character with the first character of the current word in
'toktab'. If it is the same, the next character from the input string
is checked with the next character in the table and so on. The token
length is put into the X register, and decremented each time a match is
found. So if X=0, all characters have been matched, and the word in the
input string has been successfully matched with the word in 'toktab'.
   If, on the other hand, the first letter was matched but not the
next, the routine branches to 'notfnd'. The position of the next word in
'toktab' is calculated by adding the length of the current token,
'toklen', to the address in 'from' (which began as pointing to the start
of the token table). The token number, 'tokno', is incremented by one so
if a match is ever found, 'tokno' is written out to the the output
string, preceeded by a one nibble which signifies that the next nibble
is a token.
   &FF marks the end of the token table, so if the end of 'toktab' is
reached with no match having been made, 'chktok' exits with the negative
flag set. The 'compress' routine then gets on with individually matching
common or less-common characters as before.
   If you want, you can devise your own set of token words and put them
in line 1990. Note that there's no point in including two-letter words
comprising common letters - "of", "is", and so forth - because a token 
flag followed by a token number takes up the same amount of memory as
two commonly-used letters, i.e. two nibbles.
   The token expansion routine, 'outtok', works out the correct position
in 'toktab' from which to begin printing. The lengths of any tokens 
preceeding the token in question are added to the address in 'from'. 
Again, the token length is loading into X and used as a counter for how
many characters from the table should be sent to OSWRCH.
   To see what effect this has on the compression factor, type in "she
sells sea shells on the sea shore", and compare the result with last
time: 54% is quite impressive. Having said that, the entire compression
program suffers from a limited token table, and only lower-case letters
are accepted. You could improve on this by assuming that the character
after a full stop and space is a capital letter. A few more spaces could
be squeezed out by assuming that all punctuation marks are followed by a
space. You also could assign a nibble value of 3 to mean that the
following two nibbles index into a 256-word token table.
                                           Christopher Dewhurst, EUG #53
00000000  20 20 20 20 20 20 20 20  20 20 20 20 20 20 20 20  |                |
00000010  20 20 20 20 20 20 20 20  20 20 54 48 45 20 42 49  |          THE BI|
00000020  47 20 53 51 55 45 45 5a  45 20 28 32 29 0d 0d 49  |G SQUEEZE (2)..I|
00000030  6e 20 74 68 65 20 6c 61  73 74 20 61 72 74 69 63  |n the last artic|
00000040  6c 65 2c 20 77 65 20 64  69 73 63 75 73 73 65 64  |le, we discussed|
00000050  20 68 6f 77 20 74 65 78  74 20 63 6f 75 6c 64 20  | how text could |
00000060  62 65 20 63 6f 6d 70 72  65 73 73 65 64 20 62 79  |be compressed by|
00000070  20 63 6f 6e 2d 0d 76 65  72 74 69 6e 67 20 69 74  | con-.verting it|
00000080  20 69 6e 74 6f 20 66 6f  75 72 2d 62 69 74 20 6e  | into four-bit n|
00000090  69 62 62 6c 65 73 20 6f  66 20 63 6f 6d 6d 6f 6e  |ibbles of common|
000000a0  6c 79 2d 75 73 65 64 20  63 68 61 72 61 63 74 65  |ly-used characte|
000000b0  72 73 2c 20 6f 72 20 63  6f 64 65 73 0d 67 69 76  |rs, or codes.giv|
000000c0  69 6e 67 20 69 6e 66 6f  72 6d 61 74 69 6f 6e 20  |ing information |
000000d0  61 62 6f 75 74 20 74 68  65 20 66 6f 6c 6c 6f 77  |about the follow|
000000e0  69 6e 67 20 6e 69 62 62  6c 65 3a 0d 0d 4e 69 62  |ing nibble:..Nib|
000000f0  62 6c 65 20 76 61 6c 75  65 20 20 20 4d 65 61 6e  |ble value   Mean|
00000100  69 6e 67 0d 2d 2d 2d 2d  2d 2d 2d 2d 2d 2d 2d 2d  |ing.------------|
00000110  20 20 20 2d 2d 2d 2d 2d  2d 2d 0d 30 20 20 20 20  |   -------.0    |
00000120  20 20 20 20 20 20 20 20  20 20 4e 65 78 74 20 6e  |          Next n|
00000130  69 62 62 6c 65 20 69 73  20 61 20 6c 65 73 73 20  |ibble is a less |
00000140  63 6f 6d 6d 6f 6e 6c 79  20 75 73 65 64 20 63 68  |commonly used ch|
00000150  61 72 61 63 74 65 72 2e  0d 31 20 20 20 20 20 20  |aracter..1      |
00000160  20 20 20 20 20 20 20 20  4e 65 78 74 20 6e 69 62  |        Next nib|
00000170  62 6c 65 20 69 73 20 61  20 74 6f 6b 65 6e 2e 0d  |ble is a token..|
00000180  32 20 20 20 20 20 20 20  20 20 20 20 20 20 20 45  |2              E|
00000190  6e 64 20 6f 66 20 74 65  78 74 2e 0d 33 2d 31 35  |nd of text..3-15|
000001a0  20 20 20 20 20 20 20 20  20 20 20 54 68 65 20 31  |           The 1|
000001b0  33 20 6d 6f 73 74 20 63  6f 6d 6d 6f 6e 6c 79 20  |3 most commonly |
000001c0  75 73 65 64 20 63 68 61  72 61 63 74 65 72 73 2e  |used characters.|
000001d0  0d 0d 48 65 72 65 20 77  65 27 6c 6c 20 62 65 20  |..Here we'll be |
000001e0  64 69 73 63 75 73 73 69  6e 67 20 27 74 6f 6b 65  |discussing 'toke|
000001f0  6e 73 27 2e 20 41 20 27  74 6f 6b 65 6e 27 20 69  |ns'. A 'token' i|
00000200  73 20 61 20 72 65 70 72  65 73 65 6e 74 61 74 69  |s a representati|
00000210  6f 6e 20 6f 66 20 61 0d  63 6f 6d 6d 6f 6e 6c 79  |on of a.commonly|
00000220  20 75 73 65 64 20 77 6f  72 64 2c 20 61 20 62 69  | used word, a bi|
00000230  74 20 6c 69 6b 65 20 42  41 53 49 43 20 6b 65 79  |t like BASIC key|
00000240  77 6f 72 64 73 2c 20 77  68 69 63 68 20 61 72 65  |words, which are|
00000250  20 73 74 6f 72 65 64 20  69 6e 0d 6d 65 6d 6f 72  | stored in.memor|
00000260  79 20 61 73 20 6f 6e 65  2d 62 79 74 65 20 74 6f  |y as one-byte to|
00000270  6b 65 6e 73 2e 0d 20 20  20 49 6e 20 6f 75 72 20  |kens..   In our |
00000280  63 61 73 65 2c 20 74 6f  6b 65 6e 73 20 6f 63 63  |case, tokens occ|
00000290  75 70 79 20 6f 6e 65 20  6e 69 62 62 6c 65 20 28  |upy one nibble (|
000002a0  66 6f 75 72 20 62 69 74  73 29 2c 20 6d 65 61 6e  |four bits), mean|
000002b0  69 6e 67 20 77 65 20 63  61 6e 0d 69 6e 63 6c 75  |ing we can.inclu|
000002c0  64 65 20 31 36 20 63 6f  6d 6d 6f 6e 6c 79 2d 75  |de 16 commonly-u|
000002d0  73 65 64 20 77 6f 72 64  73 20 6c 69 6b 65 20 22  |sed words like "|
000002e0  74 68 69 73 22 2c 20 22  61 6e 64 22 2c 20 22 74  |this", "and", "t|
000002f0  68 61 74 22 2c 20 61 6e  64 20 73 6f 20 6f 6e 2e  |hat", and so on.|
00000300  20 49 66 0d 22 74 68 69  73 22 20 77 61 73 20 74  | If."this" was t|
00000310  6f 6b 65 6e 20 6e 75 6d  62 65 72 20 7a 65 72 6f  |oken number zero|
00000320  2c 20 22 61 6e 64 22 20  74 6f 6b 65 6e 20 6e 75  |, "and" token nu|
00000330  6d 62 65 72 20 6f 6e 65  2c 20 61 6e 64 20 22 74  |mber one, and "t|
00000340  68 61 74 22 20 74 6f 6b  65 6e 0d 74 77 6f 2c 20  |hat" token.two, |
00000350  74 68 65 6e 20 61 20 73  65 6e 74 65 6e 63 65 20  |then a sentence |
00000360  6c 69 6b 65 20 22 74 68  69 73 20 61 6e 64 20 74  |like "this and t|
00000370  68 61 74 22 20 77 6f 75  6c 64 20 62 65 20 63 6f  |hat" would be co|
00000380  6d 70 61 63 74 65 64 20  61 73 20 73 6f 3a 0d 20  |mpacted as so:. |
00000390  20 20 20 20 20 62 79 74  65 20 20 20 20 31 20 20  |     byte    1  |
000003a0  32 20 20 33 0d 20 20 20  20 20 20 76 61 6c 75 65  |2  3.      value|
000003b0  20 26 31 30 20 31 31 20  31 33 0d 0d 50 72 6f 67  | &10 11 13..Prog|
000003c0  72 61 6d 20 43 4f 4d 50  52 45 32 20 69 73 20 69  |ram COMPRE2 is i|
000003d0  64 65 6e 74 69 63 61 6c  20 74 6f 20 43 4f 4d 50  |dentical to COMP|
000003e0  52 45 53 2c 20 62 75 74  20 77 69 74 68 20 74 68  |RES, but with th|
000003f0  65 20 61 64 64 69 74 69  6f 6e 20 6f 66 20 74 68  |e addition of th|
00000400  65 0d 6d 61 63 68 69 6e  65 20 63 6f 64 65 20 74  |e.machine code t|
00000410  6f 20 27 74 6f 6b 65 6e  69 73 65 27 20 61 6e 79  |o 'tokenise' any|
00000420  20 63 6f 6d 6d 6f 6e 6c  79 2d 75 73 65 64 20 77  | commonly-used w|
00000430  6f 72 64 73 20 66 6f 75  6e 64 20 69 6e 20 74 68  |ords found in th|
00000440  65 20 69 6e 70 75 74 0d  74 65 78 74 2c 20 61 6e  |e input.text, an|
00000450  64 20 65 78 70 61 6e 64  20 74 68 65 6d 20 77 68  |d expand them wh|
00000460  65 6e 20 74 68 65 20 74  65 78 74 20 69 73 20 64  |en the text is d|
00000470  65 63 6f 6d 70 72 65 73  73 65 64 2e 0d 20 20 20  |ecompressed..   |
00000480  54 68 65 20 74 6f 6b 65  6e 73 20 61 72 65 20 63  |The tokens are c|
00000490  6f 6e 74 61 69 6e 65 64  20 69 6e 20 74 68 65 20  |ontained in the |
000004a0  74 61 62 6c 65 20 27 74  6f 6b 74 61 62 27 2e 20  |table 'toktab'. |
000004b0  45 61 63 68 20 74 6f 6b  65 6e 20 73 74 61 72 74  |Each token start|
000004c0  73 0d 77 69 74 68 20 61  20 6e 75 6d 62 65 72 20  |s.with a number |
000004d0  73 70 65 63 69 66 79 69  6e 67 20 74 68 65 20 6c  |specifying the l|
000004e0  65 6e 67 74 68 20 6f 66  20 74 68 65 20 74 6f 6b  |ength of the tok|
000004f0  65 6e 20 28 65 2e 67 2e  20 33 20 66 6f 72 20 22  |en (e.g. 3 for "|
00000500  61 6e 64 22 29 0d 66 6f  6c 6c 6f 77 65 64 20 62  |and").followed b|
00000510  79 20 74 68 65 20 74 6f  6b 65 6e 20 77 6f 72 64  |y the token word|
00000520  20 69 74 73 65 6c 66 2e  20 54 68 65 20 77 6f 72  | itself. The wor|
00000530  64 73 20 61 72 65 20 52  45 41 44 20 69 6e 20 66  |ds are READ in f|
00000540  72 6f 6d 20 6c 69 6e 65  0d 31 39 39 30 2c 20 61  |rom line.1990, a|
00000550  6e 64 20 74 68 65 69 72  20 6c 65 6e 67 74 68 20  |nd their length |
00000560  69 73 20 77 6f 72 6b 65  64 20 6f 75 74 20 61 75  |is worked out au|
00000570  74 6f 6d 61 74 69 63 61  6c 6c 79 20 62 79 20 45  |tomatically by E|
00000580  51 55 42 20 28 4c 45 4e  20 77 6f 72 64 24 29 0d  |QUB (LEN word$).|
00000590  69 6e 20 6c 69 6e 65 20  31 32 31 30 2e 0d 20 20  |in line 1210..  |
000005a0  20 4d 61 74 63 68 69 6e  67 20 77 6f 72 64 73 20  | Matching words |
000005b0  69 6e 20 74 68 65 20 69  6e 70 75 74 20 73 74 72  |in the input str|
000005c0  69 6e 67 20 74 6f 20 74  68 65 20 74 6f 6b 65 6e  |ing to the token|
000005d0  20 77 6f 72 64 73 20 69  73 20 68 61 6e 64 6c 65  | words is handle|
000005e0  64 20 62 79 0d 27 63 68  6b 74 6f 6b 27 2e 20 49  |d by.'chktok'. I|
000005f0  74 20 73 74 61 72 74 73  20 61 74 20 74 68 65 20  |t starts at the |
00000600  62 65 67 69 6e 6e 69 6e  67 20 6f 66 20 27 74 6f  |beginning of 'to|
00000610  6b 74 61 62 27 20 61 6e  64 20 63 6f 6d 70 61 72  |ktab' and compar|
00000620  65 73 20 74 68 65 0d 63  75 72 72 65 6e 74 20 69  |es the.current i|
00000630  6e 70 75 74 20 63 68 61  72 61 63 74 65 72 20 77  |nput character w|
00000640  69 74 68 20 74 68 65 20  66 69 72 73 74 20 63 68  |ith the first ch|
00000650  61 72 61 63 74 65 72 20  6f 66 20 74 68 65 20 63  |aracter of the c|
00000660  75 72 72 65 6e 74 20 77  6f 72 64 20 69 6e 0d 27  |urrent word in.'|
00000670  74 6f 6b 74 61 62 27 2e  20 49 66 20 69 74 20 69  |toktab'. If it i|
00000680  73 20 74 68 65 20 73 61  6d 65 2c 20 74 68 65 20  |s the same, the |
00000690  6e 65 78 74 20 63 68 61  72 61 63 74 65 72 20 66  |next character f|
000006a0  72 6f 6d 20 74 68 65 20  69 6e 70 75 74 20 73 74  |rom the input st|
000006b0  72 69 6e 67 0d 69 73 20  63 68 65 63 6b 65 64 20  |ring.is checked |
000006c0  77 69 74 68 20 74 68 65  20 6e 65 78 74 20 63 68  |with the next ch|
000006d0  61 72 61 63 74 65 72 20  69 6e 20 74 68 65 20 74  |aracter in the t|
000006e0  61 62 6c 65 20 61 6e 64  20 73 6f 20 6f 6e 2e 20  |able and so on. |
000006f0  54 68 65 20 74 6f 6b 65  6e 0d 6c 65 6e 67 74 68  |The token.length|
00000700  20 69 73 20 70 75 74 20  69 6e 74 6f 20 74 68 65  | is put into the|
00000710  20 58 20 72 65 67 69 73  74 65 72 2c 20 61 6e 64  | X register, and|
00000720  20 64 65 63 72 65 6d 65  6e 74 65 64 20 65 61 63  | decremented eac|
00000730  68 20 74 69 6d 65 20 61  20 6d 61 74 63 68 20 69  |h time a match i|
00000740  73 0d 66 6f 75 6e 64 2e  20 53 6f 20 69 66 20 58  |s.found. So if X|
00000750  3d 30 2c 20 61 6c 6c 20  63 68 61 72 61 63 74 65  |=0, all characte|
00000760  72 73 20 68 61 76 65 20  62 65 65 6e 20 6d 61 74  |rs have been mat|
00000770  63 68 65 64 2c 20 61 6e  64 20 74 68 65 20 77 6f  |ched, and the wo|
00000780  72 64 20 69 6e 20 74 68  65 0d 69 6e 70 75 74 20  |rd in the.input |
00000790  73 74 72 69 6e 67 20 68  61 73 20 62 65 65 6e 20  |string has been |
000007a0  73 75 63 63 65 73 73 66  75 6c 6c 79 20 6d 61 74  |successfully mat|
000007b0  63 68 65 64 20 77 69 74  68 20 74 68 65 20 77 6f  |ched with the wo|
000007c0  72 64 20 69 6e 20 27 74  6f 6b 74 61 62 27 2e 0d  |rd in 'toktab'..|
000007d0  20 20 20 49 66 2c 20 6f  6e 20 74 68 65 20 6f 74  |   If, on the ot|
000007e0  68 65 72 20 68 61 6e 64  2c 20 74 68 65 20 66 69  |her hand, the fi|
000007f0  72 73 74 20 6c 65 74 74  65 72 20 77 61 73 20 6d  |rst letter was m|
00000800  61 74 63 68 65 64 20 62  75 74 20 6e 6f 74 20 74  |atched but not t|
00000810  68 65 0d 6e 65 78 74 2c  20 74 68 65 20 72 6f 75  |he.next, the rou|
00000820  74 69 6e 65 20 62 72 61  6e 63 68 65 73 20 74 6f  |tine branches to|
00000830  20 27 6e 6f 74 66 6e 64  27 2e 20 54 68 65 20 70  | 'notfnd'. The p|
00000840  6f 73 69 74 69 6f 6e 20  6f 66 20 74 68 65 20 6e  |osition of the n|
00000850  65 78 74 20 77 6f 72 64  20 69 6e 0d 27 74 6f 6b  |ext word in.'tok|
00000860  74 61 62 27 20 69 73 20  63 61 6c 63 75 6c 61 74  |tab' is calculat|
00000870  65 64 20 62 79 20 61 64  64 69 6e 67 20 74 68 65  |ed by adding the|
00000880  20 6c 65 6e 67 74 68 20  6f 66 20 74 68 65 20 63  | length of the c|
00000890  75 72 72 65 6e 74 20 74  6f 6b 65 6e 2c 0d 27 74  |urrent token,.'t|
000008a0  6f 6b 6c 65 6e 27 2c 20  74 6f 20 74 68 65 20 61  |oklen', to the a|
000008b0  64 64 72 65 73 73 20 69  6e 20 27 66 72 6f 6d 27  |ddress in 'from'|
000008c0  20 28 77 68 69 63 68 20  62 65 67 61 6e 20 61 73  | (which began as|
000008d0  20 70 6f 69 6e 74 69 6e  67 20 74 6f 20 74 68 65  | pointing to the|
000008e0  20 73 74 61 72 74 0d 6f  66 20 74 68 65 20 74 6f  | start.of the to|
000008f0  6b 65 6e 20 74 61 62 6c  65 29 2e 20 54 68 65 20  |ken table). The |
00000900  74 6f 6b 65 6e 20 6e 75  6d 62 65 72 2c 20 27 74  |token number, 't|
00000910  6f 6b 6e 6f 27 2c 20 69  73 20 69 6e 63 72 65 6d  |okno', is increm|
00000920  65 6e 74 65 64 20 62 79  20 6f 6e 65 20 73 6f 0d  |ented by one so.|
00000930  69 66 20 61 20 6d 61 74  63 68 20 69 73 20 65 76  |if a match is ev|
00000940  65 72 20 66 6f 75 6e 64  2c 20 27 74 6f 6b 6e 6f  |er found, 'tokno|
00000950  27 20 69 73 20 77 72 69  74 74 65 6e 20 6f 75 74  |' is written out|
00000960  20 74 6f 20 74 68 65 20  74 68 65 20 6f 75 74 70  | to the the outp|
00000970  75 74 0d 73 74 72 69 6e  67 2c 20 70 72 65 63 65  |ut.string, prece|
00000980  65 64 65 64 20 62 79 20  61 20 6f 6e 65 20 6e 69  |eded by a one ni|
00000990  62 62 6c 65 20 77 68 69  63 68 20 73 69 67 6e 69  |bble which signi|
000009a0  66 69 65 73 20 74 68 61  74 20 74 68 65 20 6e 65  |fies that the ne|
000009b0  78 74 20 6e 69 62 62 6c  65 0d 69 73 20 61 20 74  |xt nibble.is a t|
000009c0  6f 6b 65 6e 2e 0d 20 20  20 26 46 46 20 6d 61 72  |oken..   &FF mar|
000009d0  6b 73 20 74 68 65 20 65  6e 64 20 6f 66 20 74 68  |ks the end of th|
000009e0  65 20 74 6f 6b 65 6e 20  74 61 62 6c 65 2c 20 73  |e token table, s|
000009f0  6f 20 69 66 20 74 68 65  20 65 6e 64 20 6f 66 20  |o if the end of |
00000a00  27 74 6f 6b 74 61 62 27  20 69 73 0d 72 65 61 63  |'toktab' is.reac|
00000a10  68 65 64 20 77 69 74 68  20 6e 6f 20 6d 61 74 63  |hed with no matc|
00000a20  68 20 68 61 76 69 6e 67  20 62 65 65 6e 20 6d 61  |h having been ma|
00000a30  64 65 2c 20 27 63 68 6b  74 6f 6b 27 20 65 78 69  |de, 'chktok' exi|
00000a40  74 73 20 77 69 74 68 20  74 68 65 20 6e 65 67 61  |ts with the nega|
00000a50  74 69 76 65 0d 66 6c 61  67 20 73 65 74 2e 20 54  |tive.flag set. T|
00000a60  68 65 20 27 63 6f 6d 70  72 65 73 73 27 20 72 6f  |he 'compress' ro|
00000a70  75 74 69 6e 65 20 74 68  65 6e 20 67 65 74 73 20  |utine then gets |
00000a80  6f 6e 20 77 69 74 68 20  69 6e 64 69 76 69 64 75  |on with individu|
00000a90  61 6c 6c 79 20 6d 61 74  63 68 69 6e 67 0d 63 6f  |ally matching.co|
00000aa0  6d 6d 6f 6e 20 6f 72 20  6c 65 73 73 2d 63 6f 6d  |mmon or less-com|
00000ab0  6d 6f 6e 20 63 68 61 72  61 63 74 65 72 73 20 61  |mon characters a|
00000ac0  73 20 62 65 66 6f 72 65  2e 0d 20 20 20 49 66 20  |s before..   If |
00000ad0  79 6f 75 20 77 61 6e 74  2c 20 79 6f 75 20 63 61  |you want, you ca|
00000ae0  6e 20 64 65 76 69 73 65  20 79 6f 75 72 20 6f 77  |n devise your ow|
00000af0  6e 20 73 65 74 20 6f 66  20 74 6f 6b 65 6e 20 77  |n set of token w|
00000b00  6f 72 64 73 20 61 6e 64  20 70 75 74 20 74 68 65  |ords and put the|
00000b10  6d 0d 69 6e 20 6c 69 6e  65 20 31 39 39 30 2e 20  |m.in line 1990. |
00000b20  4e 6f 74 65 20 74 68 61  74 20 74 68 65 72 65 27  |Note that there'|
00000b30  73 20 6e 6f 20 70 6f 69  6e 74 20 69 6e 20 69 6e  |s no point in in|
00000b40  63 6c 75 64 69 6e 67 20  74 77 6f 2d 6c 65 74 74  |cluding two-lett|
00000b50  65 72 20 77 6f 72 64 73  0d 63 6f 6d 70 72 69 73  |er words.compris|
00000b60  69 6e 67 20 63 6f 6d 6d  6f 6e 20 6c 65 74 74 65  |ing common lette|
00000b70  72 73 20 2d 20 22 6f 66  22 2c 20 22 69 73 22 2c  |rs - "of", "is",|
00000b80  20 61 6e 64 20 73 6f 20  66 6f 72 74 68 20 2d 20  | and so forth - |
00000b90  62 65 63 61 75 73 65 20  61 20 74 6f 6b 65 6e 20  |because a token |
00000ba0  0d 66 6c 61 67 20 66 6f  6c 6c 6f 77 65 64 20 62  |.flag followed b|
00000bb0  79 20 61 20 74 6f 6b 65  6e 20 6e 75 6d 62 65 72  |y a token number|
00000bc0  20 74 61 6b 65 73 20 75  70 20 74 68 65 20 73 61  | takes up the sa|
00000bd0  6d 65 20 61 6d 6f 75 6e  74 20 6f 66 20 6d 65 6d  |me amount of mem|
00000be0  6f 72 79 20 61 73 0d 74  77 6f 20 63 6f 6d 6d 6f  |ory as.two commo|
00000bf0  6e 6c 79 2d 75 73 65 64  20 6c 65 74 74 65 72 73  |nly-used letters|
00000c00  2c 20 69 2e 65 2e 20 74  77 6f 20 6e 69 62 62 6c  |, i.e. two nibbl|
00000c10  65 73 2e 0d 20 20 20 54  68 65 20 74 6f 6b 65 6e  |es..   The token|
00000c20  20 65 78 70 61 6e 73 69  6f 6e 20 72 6f 75 74 69  | expansion routi|
00000c30  6e 65 2c 20 27 6f 75 74  74 6f 6b 27 2c 20 77 6f  |ne, 'outtok', wo|
00000c40  72 6b 73 20 6f 75 74 20  74 68 65 20 63 6f 72 72  |rks out the corr|
00000c50  65 63 74 20 70 6f 73 69  74 69 6f 6e 0d 69 6e 20  |ect position.in |
00000c60  27 74 6f 6b 74 61 62 27  20 66 72 6f 6d 20 77 68  |'toktab' from wh|
00000c70  69 63 68 20 74 6f 20 62  65 67 69 6e 20 70 72 69  |ich to begin pri|
00000c80  6e 74 69 6e 67 2e 20 54  68 65 20 6c 65 6e 67 74  |nting. The lengt|
00000c90  68 73 20 6f 66 20 61 6e  79 20 74 6f 6b 65 6e 73  |hs of any tokens|
00000ca0  20 0d 70 72 65 63 65 65  64 69 6e 67 20 74 68 65  | .preceeding the|
00000cb0  20 74 6f 6b 65 6e 20 69  6e 20 71 75 65 73 74 69  | token in questi|
00000cc0  6f 6e 20 61 72 65 20 61  64 64 65 64 20 74 6f 20  |on are added to |
00000cd0  74 68 65 20 61 64 64 72  65 73 73 20 69 6e 20 27  |the address in '|
00000ce0  66 72 6f 6d 27 2e 20 0d  41 67 61 69 6e 2c 20 74  |from'. .Again, t|
00000cf0  68 65 20 74 6f 6b 65 6e  20 6c 65 6e 67 74 68 20  |he token length |
00000d00  69 73 20 6c 6f 61 64 69  6e 67 20 69 6e 74 6f 20  |is loading into |
00000d10  58 20 61 6e 64 20 75 73  65 64 20 61 73 20 61 20  |X and used as a |
00000d20  63 6f 75 6e 74 65 72 20  66 6f 72 20 68 6f 77 0d  |counter for how.|
00000d30  6d 61 6e 79 20 63 68 61  72 61 63 74 65 72 73 20  |many characters |
00000d40  66 72 6f 6d 20 74 68 65  20 74 61 62 6c 65 20 73  |from the table s|
00000d50  68 6f 75 6c 64 20 62 65  20 73 65 6e 74 20 74 6f  |hould be sent to|
00000d60  20 4f 53 57 52 43 48 2e  0d 20 20 20 54 6f 20 73  | OSWRCH..   To s|
00000d70  65 65 20 77 68 61 74 20  65 66 66 65 63 74 20 74  |ee what effect t|
00000d80  68 69 73 20 68 61 73 20  6f 6e 20 74 68 65 20 63  |his has on the c|
00000d90  6f 6d 70 72 65 73 73 69  6f 6e 20 66 61 63 74 6f  |ompression facto|
00000da0  72 2c 20 74 79 70 65 20  69 6e 20 22 73 68 65 0d  |r, type in "she.|
00000db0  73 65 6c 6c 73 20 73 65  61 20 73 68 65 6c 6c 73  |sells sea shells|
00000dc0  20 6f 6e 20 74 68 65 20  73 65 61 20 73 68 6f 72  | on the sea shor|
00000dd0  65 22 2c 20 61 6e 64 20  63 6f 6d 70 61 72 65 20  |e", and compare |
00000de0  74 68 65 20 72 65 73 75  6c 74 20 77 69 74 68 20  |the result with |
00000df0  6c 61 73 74 0d 74 69 6d  65 3a 20 35 34 25 20 69  |last.time: 54% i|
00000e00  73 20 71 75 69 74 65 20  69 6d 70 72 65 73 73 69  |s quite impressi|
00000e10  76 65 2e 20 48 61 76 69  6e 67 20 73 61 69 64 20  |ve. Having said |
00000e20  74 68 61 74 2c 20 74 68  65 20 65 6e 74 69 72 65  |that, the entire|
00000e30  20 63 6f 6d 70 72 65 73  73 69 6f 6e 0d 70 72 6f  | compression.pro|
00000e40  67 72 61 6d 20 73 75 66  66 65 72 73 20 66 72 6f  |gram suffers fro|
00000e50  6d 20 61 20 6c 69 6d 69  74 65 64 20 74 6f 6b 65  |m a limited toke|
00000e60  6e 20 74 61 62 6c 65 2c  20 61 6e 64 20 6f 6e 6c  |n table, and onl|
00000e70  79 20 6c 6f 77 65 72 2d  63 61 73 65 20 6c 65 74  |y lower-case let|
00000e80  74 65 72 73 0d 61 72 65  20 61 63 63 65 70 74 65  |ters.are accepte|
00000e90  64 2e 20 59 6f 75 20 63  6f 75 6c 64 20 69 6d 70  |d. You could imp|
00000ea0  72 6f 76 65 20 6f 6e 20  74 68 69 73 20 62 79 20  |rove on this by |
00000eb0  61 73 73 75 6d 69 6e 67  20 74 68 61 74 20 74 68  |assuming that th|
00000ec0  65 20 63 68 61 72 61 63  74 65 72 0d 61 66 74 65  |e character.afte|
00000ed0  72 20 61 20 66 75 6c 6c  20 73 74 6f 70 20 61 6e  |r a full stop an|
00000ee0  64 20 73 70 61 63 65 20  69 73 20 61 20 63 61 70  |d space is a cap|
00000ef0  69 74 61 6c 20 6c 65 74  74 65 72 2e 20 41 20 66  |ital letter. A f|
00000f00  65 77 20 6d 6f 72 65 20  73 70 61 63 65 73 20 63  |ew more spaces c|
00000f10  6f 75 6c 64 0d 62 65 20  73 71 75 65 65 7a 65 64  |ould.be squeezed|
00000f20  20 6f 75 74 20 62 79 20  61 73 73 75 6d 69 6e 67  | out by assuming|
00000f30  20 74 68 61 74 20 61 6c  6c 20 70 75 6e 63 74 75  | that all punctu|
00000f40  61 74 69 6f 6e 20 6d 61  72 6b 73 20 61 72 65 20  |ation marks are |
00000f50  66 6f 6c 6c 6f 77 65 64  20 62 79 20 61 0d 73 70  |followed by a.sp|
00000f60  61 63 65 2e 20 59 6f 75  20 61 6c 73 6f 20 63 6f  |ace. You also co|
00000f70  75 6c 64 20 61 73 73 69  67 6e 20 61 20 6e 69 62  |uld assign a nib|
00000f80  62 6c 65 20 76 61 6c 75  65 20 6f 66 20 33 20 74  |ble value of 3 t|
00000f90  6f 20 6d 65 61 6e 20 74  68 61 74 20 74 68 65 0d  |o mean that the.|
00000fa0  66 6f 6c 6c 6f 77 69 6e  67 20 74 77 6f 20 6e 69  |following two ni|
00000fb0  62 62 6c 65 73 20 69 6e  64 65 78 20 69 6e 74 6f  |bbles index into|
00000fc0  20 61 20 32 35 36 2d 77  6f 72 64 20 74 6f 6b 65  | a 256-word toke|
00000fd0  6e 20 74 61 62 6c 65 2e  0d 20 20 20 20 20 20 20  |n table..       |
00000fe0  20 20 20 20 20 20 20 20  20 20 20 20 20 20 20 20  |                |
*
00001000  20 20 20 20 43 68 72 69  73 74 6f 70 68 65 72 20  |    Christopher |
00001010  44 65 77 68 75 72 73 74  2c 20 45 55 47 20 23 35  |Dewhurst, EUG #5|
00001020  33 0d                                             |3.|
00001022
V/+COMPR2.m0
V/+COMPR2.m1
V/+COMPR2.m2
V/+COMPR2.m4
V/+COMPR2.m5