Decoding Atom Streams
An atom stream is a group of one or more atoms (and their associated details). Atom streams are often transmitted across the network, though they also appear in places such as the main.idx file.
Overview of Atom Streams
The actual atom stream is just a sequence of atoms in a binary format. Thus, from a conceptual perspective, an atom stream looks like:
An atom in an atom stream has three primary fields:
`protocol_num`
- The protocol number used to uniquely identify the atom.
Note
The protocol number field is also called the protocol id. However, the
documentation on this site uses protocol number (or `protocol_num`) to
remain consistent with `atom_num`.
`atom_num`
- The numeric value that, when combined with `protocol_num`, uniquely identifies an atom.
`args`
- Data used when performing the operation a given atom represents.
What makes parsing atom streams somewhat tricky are the different formats
(called styles) that are used to represent the various fields. The style
field precedes each atom and describes how the atom information is represented.
Note
Across the various pieces of documentation describing atoms, the word
atom is used to mean both the numeric identifier for an atom and the
various fields related to a single atom in an atom stream (sometimes
including the `args` field, sometimes not). To avoid confusion, this page
and related ones use `atom_num` to mean the numeric identifier, the word
atom (unstyled) to refer to the various fields that relate to a single
atom excluding the `args` field, and `args` (styled) to refer to the data (if
any) used when processing an atom.
Describing Byte:Bit Offsets
In the sections that follow, the offsets of fields are described using a
`byte:bit` notation that is based on how bytes are processed when
parsing atom streams. Specifically, byte offsets increase from left to
right, but bit offsets increase from right to left. Here is an
example of two bytes with the bits individually numbered:

         ┌───────────────┬───────────────┐
    Byte │       0       │       1       │
         └───────────────┴───────────────┘
    Bit   7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0

To demonstrate how field offsets are described, assume there are four fields
(called `field_a`, `field_b`, `field_c`, and `field_d` respectively) encoded
across two bytes. If they are represented graphically as:

         ┌───────────────┬───────────────┐
    Byte │       0       │       1       │
         └───────────────┴───────────────┘
    Bit   7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0
         └─────┴─────────┴───────┴───────┘
            ▲       ▲        ▲       ▲
            │       │        │       │
         field_a    │     field_c    │
                    │                │
                 field_b          field_d

Then the `byte:bit` offsets would be described as:
Byte:Bit Offset | Size | Field |
---|---|---|
00:05 | 3 bits | field_a |
00:00 | 5 bits | field_b |
01:04 | 4 bits | field_c |
01:00 | 4 bits | field_d |
Notice how `field_a` starts at bit offset 5 in byte offset 0 (the first byte)
and is 3 bits long. This means `field_a` occupies bits 5, 6, and 7.
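The notation maps directly onto a shift followed by a mask. The following is a minimal sketch, assuming the field fits inside a single byte; the helper name extract_field and the two sample bytes are hypothetical and chosen only for illustration.

```python
def extract_field(data: bytes, byte_offset: int, bit_offset: int, size: int) -> int:
    """Return the field of `size` bits whose lowest bit is at byte_offset:bit_offset.

    Illustrative helper (not part of any documented API). It only handles
    fields that fit inside a single byte.
    """
    if bit_offset + size > 8:
        raise ValueError("field does not fit inside one byte")
    mask = (1 << size) - 1  # e.g. size 3 -> 0b111
    return (data[byte_offset] >> bit_offset) & mask


# Two arbitrary sample bytes: 0b101_00011 and 0b1100_0101.
sample = bytes([0b10100011, 0b11000101])
field_a = extract_field(sample, 0, 5, 3)  # bits 7..5 of byte 0
field_b = extract_field(sample, 0, 0, 5)  # bits 4..0 of byte 0
field_c = extract_field(sample, 1, 4, 4)  # bits 7..4 of byte 1
field_d = extract_field(sample, 1, 0, 4)  # bits 3..0 of byte 1
print(field_a, field_b, field_c, field_d)  # 5 3 12 5
```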
Encoding Styles
The `style` field is the first field in every encoded atom, and is always
the first three bits (the three high bits) of the first byte. The different
values are:
Value | Encoding Style |
---|---|
0 | Full |
1 | Length |
2 | Data |
3 | Atom |
4 | Current |
5 | Zero |
6 | One |
7 | Prefix |
Note
The atom, current, zero, and one encoding styles do not include a
`protocol_num` field. Instead, the `protocol_num` field of the previously
processed atom is used. This is sometimes called the stream protocol
(`stream_protocol_num`).
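As a rough illustration, the style can be read by shifting the first byte right by five bits. In this sketch the names Style, USES_STREAM_PROTOCOL, and style_of are hypothetical; only the numeric values come from the table above.

```python
from enum import IntEnum


class Style(IntEnum):
    """Encoding styles, numbered as in the table above."""
    FULL = 0
    LENGTH = 1
    DATA = 2
    ATOM = 3
    CURRENT = 4
    ZERO = 5
    ONE = 6
    PREFIX = 7


# Styles that carry no protocol_num and reuse stream_protocol_num instead.
USES_STREAM_PROTOCOL = frozenset({Style.ATOM, Style.CURRENT, Style.ZERO, Style.ONE})


def style_of(first_byte: int) -> Style:
    """Return the style stored in the three high bits of the first byte."""
    return Style((first_byte >> 5) & 0x07)


print(style_of(0xEE).name)                       # PREFIX
print(style_of(0x85) in USES_STREAM_PROTOCOL)    # True (current style)
```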
Full Style (0)
The full encoding style contains descriptions of all the atom information in
the stream itself (no fields are reused from previous atoms; all fields are
present). It is also the most flexible in terms of the size of the `args` field.
In addition to the `style`, `protocol_num`, `atom_num`, and `args` fields,
atoms encoded in the full style contain two additional fields.
`args_len`
- This field describes the length of the `args` field. The atom has no arguments if this field is 0.
`sizeof_args_len`
- A single bit which determines the length of the `args_len` field. If this bit is 0, then `args_len` is 7 bits. If this bit is 1, then `args_len` is 15 bits.
The first four fields of an atom in full style are:
Byte:Bit Offset | Size | Field |
---|---|---|
00:05 | 3 bits | style |
00:00 | 5 bits | protocol_num |
01:00 | 8 bits | atom_num |
02:07 | 1 bit | sizeof_args_len |
If the `sizeof_args_len` field is 0, then the remaining field is:
Byte:Bit Offset | Size | Field |
---|---|---|
02:00 | 7 bits | args_len |
If the `sizeof_args_len` field is 1, then the `args_len` field extends
to include the fourth byte (bits 6 through 0 of byte 2, followed by all of byte 3):
Byte:Bit Offset | Size | Field |
---|---|---|
03:00 | 15 bits | args_len |
Graphically:

    If sizeof_args_len == 0

         ┌───────────────┬───────────────┬───────────────┐
    Byte │       0       │       1       │       2       │
         └───────────────┴───────────────┴───────────────┘
    Bit   7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0
         └─────┴─────────┴───────────────┴─┴─────────────┘
            ▲       ▲            ▲        ▲       ▲
            │       │            │        │       │
          style     │        atom_num     │   args_len
                    │                     │
              protocol_num         sizeof_args_len

    If sizeof_args_len == 1

         ┌───────────────┬───────────────┬───────────────┬───────────────┐
    Byte │       0       │       1       │       2       │       3       │
         └───────────────┴───────────────┴───────────────┴───────────────┘
    Bit   7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0
         └─────┴─────────┴───────────────┴─┴─────────────────────────────┘
            ▲       ▲            ▲        ▲               ▲
            │       │            │        │               │
          style     │        atom_num     │           args_len
                    │                     │
              protocol_num         sizeof_args_len

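The full style layout can be sketched as a small decoder. This is an illustration under stated assumptions, not a reference implementation: the names FullAtom and decode_full are hypothetical, and the 15-bit `args_len` is assumed to be stored most significant byte first, which this page does not state explicitly.

```python
from typing import NamedTuple


class FullAtom(NamedTuple):
    protocol_num: int
    atom_num: int
    args: bytes
    size: int  # total number of bytes consumed from the stream


def decode_full(buf: bytes, pos: int = 0) -> FullAtom:
    """Decode one full style atom starting at buf[pos] (illustrative sketch)."""
    first = buf[pos]
    if first >> 5 != 0:
        raise ValueError("not a full style atom")
    protocol_num = first & 0x1F
    atom_num = buf[pos + 1]
    length_byte = buf[pos + 2]
    if length_byte & 0x80:  # sizeof_args_len == 1 -> 15-bit args_len
        # ASSUMPTION: byte 2 holds the high 7 bits, byte 3 the low 8 bits.
        args_len = ((length_byte & 0x7F) << 8) | buf[pos + 3]
        header = 4
    else:                   # sizeof_args_len == 0 -> 7-bit args_len
        args_len = length_byte & 0x7F
        header = 3
    start = pos + header
    args = bytes(buf[start:start + args_len])
    if len(args) != args_len:
        raise ValueError("stream ends before the end of args")
    return FullAtom(protocol_num, atom_num, args, header + args_len)


atom = decode_full(bytes([0x01, 0x0A, 0x02, 0x41, 0x42]))
print(atom.protocol_num, atom.atom_num, atom.args, atom.size)  # 1 10 b'AB' 5
```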
Length Style (1)
Atoms encoded using length style have an `args_len` field similar to atoms
encoded using full style. The difference is that in length style the
`args_len` field is always 3 bits long. The fields are laid out as follows:
Byte:Bit Offset | Size | Field |
---|---|---|
00:05 | 3 bits | style |
00:00 | 5 bits | protocol_num |
01:05 | 3 bits | args_len |
01:00 | 5 bits | atom_num |
Graphically:

         ┌───────────────┬───────────────┐
    Byte │       0       │       1       │
         └───────────────┴───────────────┘
    Bit   7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0
         └─────┴─────────┴─────┴─────────┘
            ▲       ▲       ▲       ▲
            │       │       │       │
          style     │   args_len    │
                    │               │
              protocol_num      atom_num

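A length style atom can be unpacked with two shifts and two masks. The sketch below uses a hypothetical function name (decode_length) and made-up input bytes; it reads the arguments as the `args_len` bytes that follow the two header bytes.

```python
def decode_length(buf: bytes, pos: int = 0) -> tuple[int, int, bytes]:
    """Decode one length style atom; returns (protocol_num, atom_num, args)."""
    first, second = buf[pos], buf[pos + 1]
    if first >> 5 != 1:
        raise ValueError("not a length style atom")
    protocol_num = first & 0x1F   # low 5 bits of byte 0
    args_len = second >> 5        # high 3 bits of byte 1 (0..7)
    atom_num = second & 0x1F      # low 5 bits of byte 1
    args = bytes(buf[pos + 2:pos + 2 + args_len])
    if len(args) != args_len:
        raise ValueError("stream ends before the end of args")
    return protocol_num, atom_num, args


print(decode_length(bytes([0x23, 0x45, 0xAA, 0xBB])))  # (3, 5, b'\xaa\xbb')
```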
Data Style (2)
With an atom encoded in data style, the `args` and `atom_num` fields are
stored together in a single byte, which is the second byte of the atom.
The fields are laid out as:
Byte:Bit Offset | Size | Field |
---|---|---|
00:05 | 3 bits | style |
00:00 | 5 bits | protocol_num |
01:05 | 3 bits | args |
01:00 | 5 bits | atom_num |
Graphically it looks like:

         ┌───────────────┬───────────────┐
    Byte │       0       │       1       │
         └───────────────┴───────────────┘
    Bit   7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0
         └─────┴─────────┴─────┴─────────┘
            ▲       ▲       ▲       ▲
            │       │       │       │
          style     │     args      │
                    │               │
              protocol_num      atom_num

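Because the argument lives in the high three bits of the second byte, a data style atom is always exactly two bytes. A minimal sketch, with a hypothetical function name (decode_data) and the argument returned as a small integer:

```python
def decode_data(buf: bytes, pos: int = 0) -> tuple[int, int, int]:
    """Decode one data style atom; returns (protocol_num, atom_num, args)."""
    first, second = buf[pos], buf[pos + 1]
    if first >> 5 != 2:
        raise ValueError("not a data style atom")
    protocol_num = first & 0x1F  # low 5 bits of byte 0
    args = second >> 5           # high 3 bits of byte 1, a value from 0 to 7
    atom_num = second & 0x1F     # low 5 bits of byte 1
    return protocol_num, atom_num, args


print(decode_data(bytes([0x41, 0x29])))  # (1, 9, 1)
```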
Atom Style (3)
Atoms encoded using the atom style have only two fields, `style` and
`atom_num`, encoded in a single byte. The `protocol_num` used is the
`stream_protocol_num` (the same as the previously processed atom). Atoms
encoded in this style have no arguments.
The two fields are laid out as follows:
Byte:Bit Offset | Size | Field |
---|---|---|
00:05 | 3 bits | style |
00:00 | 5 bits | atom_num |
Graphically:

         ┌───────────────┐
    Byte │       0       │
         └───────────────┘
    Bit   7 6 5 4 3 2 1 0
         └─────┴─────────┘
            ▲       ▲
            │       │
          style     │
                    │
                atom_num

Current Style (4)
Atoms encoded using the current style have three fields encoded across two
bytes: `style`, `atom_num`, and `args_len`. Similar to atom style encoding, the
`stream_protocol_num` value is used in place of `protocol_num`.
The fields are laid out as follows:
Byte:Bit Offset | Size | Field |
---|---|---|
00:05 | 3 bits | style |
00:00 | 5 bits | atom_num |
01:00 | 8 bits | args_len |
Graphically:

         ┌───────────────┬───────────────┐
    Byte │       0       │       1       │
         └───────────────┴───────────────┘
    Bit   7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0
         └─────┴─────────┴───────────────┘
            ▲       ▲            ▲
            │       │            │
          style     │        args_len
                    │
                atom_num

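Decoding a current style atom requires state carried over from the previous atom. In this sketch the caller passes that state in explicitly; the function name decode_current and its parameter order are hypothetical.

```python
def decode_current(buf: bytes, stream_protocol_num: int, pos: int = 0) -> tuple[int, int, bytes]:
    """Decode one current style atom; returns (protocol_num, atom_num, args).

    stream_protocol_num is the protocol_num of the previously processed atom.
    """
    first = buf[pos]
    if first >> 5 != 4:
        raise ValueError("not a current style atom")
    atom_num = first & 0x1F   # low 5 bits of byte 0
    args_len = buf[pos + 1]   # all 8 bits of byte 1
    args = bytes(buf[pos + 2:pos + 2 + args_len])
    if len(args) != args_len:
        raise ValueError("stream ends before the end of args")
    return stream_protocol_num, atom_num, args


print(decode_current(bytes([0x85, 0x02, 0x10, 0x20]), stream_protocol_num=7))
# (7, 5, b'\x10 ')
```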
Zero Style (5)
Atoms encoded using the zero style have two fields encoded in a single byte:
`style` and `atom_num`. The `stream_protocol_num` value is used for
`protocol_num`. When the atom is processed, the `args` field is the value `0x00`.
The fields are laid out as follows:
Byte:Bit Offset | Size | Field |
---|---|---|
00:05 | 3 bits | style |
00:00 | 5 bits | atom_num |
Graphically:

         ┌───────────────┐
    Byte │       0       │
         └───────────────┘
    Bit   7 6 5 4 3 2 1 0
         └─────┴─────────┘
            ▲       ▲
            │       │
          style     │
                    │
                atom_num

One Style (6)
Atoms encoded using the one style are similar to atoms encoded using the zero
style. Specifically, two fields (`style` and `atom_num`) are encoded in a single
byte, and the `stream_protocol_num` value is used for `protocol_num`. The
difference from zero style is that when an atom encoded with one style is
processed, the value for `args` is `0x01`.
The fields are laid out as follows:
Byte:Bit Offset | Size | Field |
---|---|---|
00:05 | 3 bits | style |
00:00 | 5 bits | atom_num |
Graphically:

         ┌───────────────┐
    Byte │       0       │
         └───────────────┘
    Bit   7 6 5 4 3 2 1 0
         └─────┴─────────┘
            ▲       ▲
            │       │
          style     │
                    │
                atom_num

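The atom, zero, and one styles share the same single-byte layout and differ only in the argument they imply, so one routine can plausibly cover all three. The names below (IMPLIED_ARGS, decode_single_byte) are hypothetical, and representing "no arguments" as None is a choice made for this sketch.

```python
# Implied args per single-byte style: atom (3) has none, zero (5) -> 0x00, one (6) -> 0x01.
IMPLIED_ARGS = {3: None, 5: 0x00, 6: 0x01}


def decode_single_byte(byte: int, stream_protocol_num: int):
    """Decode an atom, zero, or one style byte.

    Returns (protocol_num, atom_num, args); args is None for atom style.
    """
    style = byte >> 5
    if style not in IMPLIED_ARGS:
        raise ValueError("not an atom, zero, or one style byte")
    atom_num = byte & 0x1F  # low 5 bits
    return stream_protocol_num, atom_num, IMPLIED_ARGS[style]


print(decode_single_byte(0x61, 2))  # (2, 1, None)  atom style
print(decode_single_byte(0xA4, 2))  # (2, 4, 0)     zero style
print(decode_single_byte(0xC1, 2))  # (2, 1, 1)     one style
```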
Prefix Style (7)
Prefix style is different from the other styles in that it does not encode
an atom. Instead, it encodes fields that are used to modify the `protocol_num`
and `atom_num` fields when decoding the atoms that follow a prefix style byte.
Prefix style encodes the following fields:
`protocol_offset`
- This is a value that is bitwise-OR'd with the `protocol_num` field to compute the `protocol_num` used for processing.
`atom_offset`
- This is a value that is bitwise-OR'd with the `atom_num` field to compute the `atom_num` used for processing.
`keep_prefix`
- This determines whether the `protocol_offset` and `atom_offset` fields should still be used for computing the `protocol_num` and `atom_num` fields after the next atom is processed.
Thus the `protocol_offset` and `atom_offset` fields are always applied to the atom that immediately follows a prefix style byte. The `keep_prefix` field determines whether they are also applied to atoms after that one. When `keep_prefix` is set, the offsets remain in effect until another prefix style byte is encountered whose `keep_prefix` value is 0.
The fields in prefix style are encoded in a single byte, and are laid out as follows:
Byte:Bit Offset | Size | Field |
---|---|---|
00:05 | 3 bits | style |
00:03 | 2 bits | protocol_offset |
00:01 | 2 bits | atom_offset |
00:00 | 1 bit | keep_prefix |
Graphically:

         ┌───────────────┐
    Byte │       0       │
         └───────────────┘
    Bit   7 6 5 4 3 2 1 0
         └─────┴───┴───┴─┘
            ▲    ▲   ▲  ▲
            │    │   │  │
          style  │   │  └─ keep_prefix
                 │   │
                 │   └─ atom_offset
                 │
          protocol_offset

Unlike other fields (e.g. `style`), the offset fields are extracted using
bitmasks, which determine which bits to keep and which bits to clear. The
bitmask to extract the `protocol_offset` field is `0x18` (`00011000` in binary).
The bitmask for the `atom_offset` field is `0x06` (`00000110` in binary).
Once the appropriate bits have been isolated via masking, the offset values are
left-shifted. The `protocol_offset` value is shifted to the left by 2 bits, and
the `atom_offset` value is shifted to the left by 4 bits.
For example, if the byte `0xEE` is encoded using prefix style, the
final `protocol_offset` value is `0x20` and the final `atom_offset`
value is `0x60`.
To see this graphically, consider how the bits are laid out:

         ┌───────────────┐
    Byte │     0xEE      │
         └───────────────┘
    Bit   1 1 1 0 1 1 1 0
         └─────┴───┴───┴─┘
            ▲    ▲   ▲  ▲
            │    │   │  │
          style  │   │  └─ keep_prefix
                 │   │
                 │   └─ atom_offset
                 │
          protocol_offset

To compute the `protocol_offset` value, apply the bitmask to isolate the
relevant bits. This yields the value `0x08`. Graphically:

                  ┌───────────────┐
             Byte │     0xEE      │
                  └───────────────┘
              Bit  1 1 1 0 1 1 1 0
                   ───────────────
      Mask (0x18)  0 0 0 1 1 0 0 0
                   ───────────────
    Result (0x08)  0 0 0 0 1 0 0 0

Then left-shift by two bits to get `0x20`:

          Result (0x08)  0 0 0 0 1 0 0 0
                         ───────────────
    Left-shift 2 (0x20)  0 0 1 0 0 0 0 0

The same general procedure applies to the `atom_offset` field. First apply the
bitmask to isolate the relevant bits, which yields `0x06`:

                  ┌───────────────┐
             Byte │     0xEE      │
                  └───────────────┘
              Bit  1 1 1 0 1 1 1 0
                   ───────────────
      Mask (0x06)  0 0 0 0 0 1 1 0
                   ───────────────
    Result (0x06)  0 0 0 0 0 1 1 0

Next, left-shift by four bits to get `0x60`:

          Result (0x06)  0 0 0 0 0 1 1 0
                         ───────────────
    Left-shift 4 (0x60)  0 1 1 0 0 0 0 0

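The mask-then-shift procedure can be checked with a few lines of code. This sketch reproduces the `0xEE` walk-through; the function names decode_prefix and apply_prefix are hypothetical, and the atom values passed to apply_prefix are made up for illustration.

```python
def decode_prefix(byte: int) -> tuple[int, int, bool]:
    """Return (protocol_offset, atom_offset, keep_prefix) for a prefix style byte."""
    if byte >> 5 != 7:
        raise ValueError("not a prefix style byte")
    protocol_offset = (byte & 0x18) << 2  # isolate bits 4..3, then shift left by 2
    atom_offset = (byte & 0x06) << 4      # isolate bits 2..1, then shift left by 4
    keep_prefix = bool(byte & 0x01)       # bit 0
    return protocol_offset, atom_offset, keep_prefix


def apply_prefix(protocol_num: int, atom_num: int, protocol_offset: int, atom_offset: int) -> tuple[int, int]:
    """Bitwise-OR the offsets into the fields decoded from the following atom."""
    return protocol_num | protocol_offset, atom_num | atom_offset


protocol_offset, atom_offset, keep_prefix = decode_prefix(0xEE)
print(hex(protocol_offset), hex(atom_offset), keep_prefix)  # 0x20 0x60 False
print(apply_prefix(0x03, 0x05, protocol_offset, atom_offset))  # (35, 101)
```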
Atom Arguments
Many atoms can have arguments that are used when performing their associated
action. This section summarizes if, and where, the `args` field resides for each
encoding style.
With the `full`, `length`, and `current` encoding styles, if the `args_len`
field is not 0, then the atom has arguments. The arguments are encoded
as a series of bytes (of length `args_len`) and immediately follow the last
field of the style.
The `data` encoding style always has a single argument, since the argument is
encoded in the same byte that contains `atom_num`.
The `zero` and `one` styles do not have any arguments in the stream. Instead,
they always use the values 0 and 1 respectively when processing the associated atom.
Neither the `atom` nor `prefix` styles have any arguments associated with them.
This table summarizes the `args_len` and `args` fields for the different
encoding styles:
Style | Has args_len | args In Stream | Has args |
---|---|---|---|
full | Yes | Maybe | If args_len != 0 |
length | Yes | Maybe | If args_len != 0 |
data | No | Yes | Yes |
atom | No | No | No |
current | Yes | Maybe | If args_len != 0 |
zero | No | No | Yes, implied 0 |
one | No | No | Yes, implied 1 |
prefix | N/A | N/A | Not Applicable |
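The table can also be expressed as a lookup, which may be convenient when writing a decoder. This is only a sketch: the names HAS_ARGS_LEN and has_args are hypothetical, and styles are identified by the lowercase names used in the table.

```python
# Whether each atom-encoding style carries an args_len field (from the table above).
HAS_ARGS_LEN = {
    "full": True, "length": True, "current": True,
    "data": False, "atom": False, "zero": False, "one": False,
}

# Styles without args_len that nevertheless supply an argument.
ALWAYS_HAS_ARGS = {"data", "zero", "one"}


def has_args(style: str, args_len: int = 0) -> bool:
    """Return True if an atom encoded in `style` has arguments."""
    if style == "prefix":
        raise ValueError("prefix style does not encode an atom")
    if HAS_ARGS_LEN[style]:
        return args_len != 0
    return style in ALWAYS_HAS_ARGS


print(has_args("full", 0), has_args("length", 3), has_args("atom"), has_args("zero"))
# False True False True
```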
Atom Streams in P3 Packets
P3 packets can contain atom streams. Specifically, a P3 packet has an atom stream if the following are true:
- The packet is a DATA (type: `0x20`) packet.
- The first character of the token is not `x`, `T`, or `F`.
- The token is not one of the following: `DD`, `D3`, `D6`, `dp`, `Dp`, `XS`, `eI`, `eJ`, `eX`, `fD`, `OT`, `AA`, `AB`, `AC`, `AD`, `CA`, `CB`.
- The packet contains enough data for an atom (i.e. it contains more than just a token and stream identifier).
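These conditions translate into a straightforward predicate. The sketch below makes several assumptions that this page does not confirm: the packet type is already known to the caller, `payload` begins at the token with any packet framing already removed, and the stream identifier length follows the token rules given in the Stream Identifiers section. All names are hypothetical.

```python
DATA_PACKET_TYPE = 0x20
EXCLUDED_FIRST_CHARS = frozenset(b"xTF")  # byte values of 'x', 'T', 'F'
EXCLUDED_TOKENS = frozenset({
    b"DD", b"D3", b"D6", b"dp", b"Dp", b"XS", b"eI", b"eJ", b"eX",
    b"fD", b"OT", b"AA", b"AB", b"AC", b"AD", b"CA", b"CB",
})


def s_id_length(token: bytes) -> int:
    """Length of the stream identifier for a given two-byte token."""
    return {b"at": 4, b"At": 3}.get(token, 2)


def has_atom_stream(packet_type: int, payload: bytes) -> bool:
    """Apply the four conditions above to a payload that starts at the token."""
    if packet_type != DATA_PACKET_TYPE or len(payload) < 2:
        return False
    token = bytes(payload[:2])
    if token[0] in EXCLUDED_FIRST_CHARS or token in EXCLUDED_TOKENS:
        return False
    # Must contain more than just the token and the stream identifier.
    return len(payload) > 2 + s_id_length(token)


print(has_atom_stream(0x20, b"At\x04\x0f\x23\x20\x01"))  # True
print(has_atom_stream(0x20, b"DD\x00\x01\x20\x01"))      # False
```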
A P3 packet with an atom stream is composed of a token (`token`), followed by a
stream identifier (`s_id`), and then one or more atoms.
The `token` field is always two bytes long. The size of the stream identifier
varies depending on the `token`.
Stream Identifiers
The stream identifier field (`s_id`) is used to differentiate between multiple
atom streams. For example, different forms running at the same time may have
different stream identifiers.
As mentioned, the `s_id` field immediately follows the `token` field.
The length of `s_id` depends on the value of the `token` field:
Token | Length of Stream Id |
---|---|
at | 4 bytes |
At | 3 bytes |
Neither at nor At | 2 bytes |
Examples
Here are some examples of packets with stream identifiers of differing sizes.
First, an `AT` token with a 2-byte `s_id`.

    token (AT) ─┐     ┌─ s_id (2 bytes)
                ▼     ▼
             ┌─────┬─────┐
        0000: 41 54 00 C1 00 06 0E 31 3A 31 37 30 36 33 3A 36
        0010: 37 35 39 37 38 01 5A 17 61 6F 6C 3A 2F 2F 34 34
        0020: 30 31 3A 31 37 30 36 33 3A 36 37 35 39 37 38 72
        0030: 0D

This is an example of an `At` packet, which has a 3-byte `s_id`.

    token (At) ─┐      ┌─── s_id (3 bytes)
                ▼      ▼
             ┌─────┬────────┐
        0000: 41 74 04 0F 23 20 01 41 29 2C A1 00 00 00 00 00
        0010: 80 05 00 00 00 00 C1 21 1D 20 02 0D

Finally, an example of an `at` packet, which has a 4-byte `s_id`.

    token (at) ─┐        ┌─ s_id (4 bytes)
                ▼        ▼
             ┌─────┬───────────┐
        0000: 61 74 01 10 0F A0 20 01 25 85 14 FF 00 19 2F 54
        0010: 01 02 24 20 9F 86 02 53 4E 68 40 27 E2 22 95 14
        0020: FF 00 19 20 0C 2F 54 03 04 24 20 9F 86 02 53 4E
        0030: 68 40 67 24 20 9F 86 02 53 6E 20 15 20 4B 04 02
        0040: 24 08 40 87 40 47 20 02 20 02 0D

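Splitting such a dump into its parts only requires the token-to-length rule from the table above. The sketch below runs on the `At` example bytes; the function name split_stream is hypothetical. Everything after the `s_id` is returned as-is, including the final `0D` byte, since this page does not describe what trails the atoms.

```python
def split_stream(payload: bytes) -> tuple[bytes, bytes, bytes]:
    """Split a payload into (token, s_id, remaining bytes)."""
    token = bytes(payload[:2])
    id_len = {b"at": 4, b"At": 3}.get(token, 2)  # 2 bytes for any other token
    s_id = bytes(payload[2:2 + id_len])
    rest = bytes(payload[2 + id_len:])
    return token, s_id, rest


# Bytes copied from the At example above.
example = bytes.fromhex(
    "41 74 04 0F 23 20 01 41 29 2C A1 00 00 00 00 00"
    "80 05 00 00 00 00 C1 21 1D 20 02 0D"
)
token, s_id, rest = split_stream(example)
print(token, s_id.hex(), len(rest))  # b'At' 040f23 23
```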