
Decoding Atom Streams

An atom stream is a group of one or more atoms (and their associated details). Atom streams are often transmitted across the network, though you will also find them in places such as the main.idx file.

Overview of Atom Streams

The actual atom stream is just a sequence of atoms in a binary format. Thus, from a conceptual perspective, an atom stream looks like:

<atom><atom><atom> ...

An atom in an atom stream has three primary fields:

protocol_num
The protocol number used to uniquely identify the atom.

Note

The protocol number field is also called the protocol id. However the documentation on this site uses protocol number (or protocol_num) to remain consistent with atom_num.

atom_num
The numeric value that, when combined with protocol_num, uniquely identifies an atom.
args
Data used when performing the operation a given atom represents.

What makes parsing atom streams somewhat tricky is the variety of formats (called styles) used to represent the various fields. The style field is the first field of each encoded atom and describes how the rest of that atom's information is represented.

Note

Across the various pieces of documentation describing atoms, the word atom is used to mean both the numeric identifier for an atom and the various fields related to a single atom in an atom stream (sometimes including the args field, sometimes not). To avoid confusion, this page and related ones use atom_num to mean the numeric identifier, the word atom (unstyled) to refer to the fields related to a single atom excluding the args field, and args (styled) to refer to the data (if any) used when processing an atom.

Describing Byte:Bit Offsets

In the sections that follow, the offsets of fields are described using a byte:bit notation that is based on how bytes are processed when parsing atom streams. Specifically, byte offsets increase from left to right, but bit offsets increase from right to left (bit 0 is the least significant bit). Here is an example of two bytes with the bits individually numbered:

     ┌───────────────┬───────────────┐
Byte │       0       │       1       │
     └───────────────┴───────────────┘
Bit   7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0 

To demonstrate how field offsets are described, assume there are four fields (called field_a, field_b, field_c, and field_d respectively) encoded across two bytes. If they are represented graphically as:

     ┌───────────────┬───────────────┐
Byte │       0       │       1       │
     └───────────────┴───────────────┘
Bit   7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0 
     └─────┴─────────┴───────┴───────┘
        ▲       ▲        ▲       ▲    
        │       │        │       │    
     field_a    │     field_c    │    
                │                │    
            field_b           field_d

Then the byte:bit offsets would be described as:

Byte:Bit Offset Size Field
00:05 3 bits field_a
00:00 5 bits field_b
01:04 4 bits field_c
01:00 4 bits field_d

Notice how field_a starts at bit offset 5 in byte offset 0 (the first byte) and is 3 bits long. This means field_a occupies bits 5, 6, and 7.
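As a quick check of the notation, here is a minimal Python sketch that reads a field given as a byte:bit offset plus a size. The helper name extract_bits and the two sample byte values are made up for illustration, and the helper only handles fields that fit inside a single byte.

```python
def extract_bits(data: bytes, byte_offset: int, bit_offset: int, size: int) -> int:
    """Return the `size`-bit field that starts at byte_offset:bit_offset.

    Bit 0 is the least significant bit, so the field covers bits
    bit_offset .. bit_offset + size - 1 of data[byte_offset].
    """
    if bit_offset + size > 8:
        raise ValueError("this sketch only handles fields inside one byte")
    mask = (1 << size) - 1
    return (data[byte_offset] >> bit_offset) & mask


# Hypothetical values: field_a=0b101, field_b=0b10011, field_c=0b1100, field_d=0b0110
data = bytes([0b101_10011, 0b1100_0110])
print(extract_bits(data, 0, 5, 3))  # field_a -> 5
print(extract_bits(data, 0, 0, 5))  # field_b -> 19
print(extract_bits(data, 1, 4, 4))  # field_c -> 12
print(extract_bits(data, 1, 0, 4))  # field_d -> 6
```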

Encoding Styles

The style field is the first field in every encoded atom, and is always the first three bits (the three high bits) of the first byte. The different values are:

Value Encoding Style
0 Full
1 Length
2 Data
3 Atom
4 Current
5 Zero
6 One
7 Prefix
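Because the style occupies the three high bits of the first byte, it can be read by shifting that byte right by five. A minimal Python sketch follows; the Style enum and the style_of function are illustrative names, not part of any existing library.

```python
from enum import IntEnum


class Style(IntEnum):
    FULL = 0
    LENGTH = 1
    DATA = 2
    ATOM = 3
    CURRENT = 4
    ZERO = 5
    ONE = 6
    PREFIX = 7


def style_of(first_byte: int) -> Style:
    # The style field is at 00:05 and is 3 bits long (bits 5, 6, and 7).
    return Style((first_byte >> 5) & 0x07)


print(style_of(0x00).name)  # FULL
print(style_of(0x72).name)  # ATOM
print(style_of(0xEE).name)  # PREFIX
```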

Note

The atom, current, zero, and one encoding styles do not include a protocol_num field. Instead, the protocol_num of the previously processed atom is used. This is sometimes called the stream protocol (stream_protocol_num).

Full Style (0)

The full encoding style describes all of the atom information in the stream itself: no fields are reused from previous atoms, and all fields are present. It is also the most flexible style in terms of the size of the args field.

In addition to the style, protocol_num, atom_num, and args fields, atoms encoded in the full style contain two additional fields.

args_len
This field describes the length of the args field. The atom has no arguments if this field is 0.
sizeof_args_len
A single bit that determines the length of the args_len field. If this bit is 0, then args_len is 7 bits. If this bit is 1, then args_len is 15 bits.

The first four fields of an atom in full style are:

Byte:Bit Offset Size Field
00:05 3 bits style
00:00 5 bits protocol_num
01:00 8 bits atom_num
02:07 1 bit sizeof_args_len

If the sizeof_args_len field is 0, then the rest of the fields are:

Byte:Bit Offset Size Field
02:00 7 bits args_len

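If the sizeof_args_len field is 1, then the args_len field is 15 bits long and extends into the fourth byte: it occupies the low 7 bits of byte 2 and all 8 bits of byte 3.

Byte:Bit Offset Size Field
02:00 7 bits args_len (bits in byte 2)
03:00 8 bits args_len (bits in byte 3)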

Graphically:

If sizeof_args_len == 0

     ┌───────────────┬───────────────┬───────────────┐
Byte │       0       │       1       │       2       │
     └───────────────┴───────────────┴───────────────┘
Bit   7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0 
     └─────┴─────────┴───────────────┴─┴─────────────┘
        ▲       ▲            ▲        ▲       ▲       
        │       │            │        │       │       
      style     │        atom_num     │   args_len    
                │                     │               
          protocol_num         sizeof_args_len             


If sizeof_args_len == 1

     ┌───────────────┬───────────────┬───────────────┬───────────────┐
Byte │       0       │       1       │       2       │       3       │
     └───────────────┴───────────────┴───────────────┴───────────────┘
Bit   7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0 
     └─────┴─────────┴───────────────┴─┴─────────────────────────────┘
        ▲       ▲            ▲        ▲               ▲               
        │       │            │        │               │               
      style     │        atom_num     │           args_len            
                │                     │                               
          protocol_num         sizeof_args_len                        
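A minimal Python sketch of decoding one full style atom. The function name and the returned (protocol_num, atom_num, args, next_pos) tuple are illustrative choices. The text above does not state the byte order of the 15-bit args_len; this sketch assumes the bits in byte 2 are the high-order bits (big-endian), matching the left-to-right order of the diagram.

```python
def decode_full(buf: bytes, pos: int = 0):
    """Decode one full style (0) atom starting at buf[pos].

    Returns (protocol_num, atom_num, args, next_pos).
    """
    b0 = buf[pos]
    if b0 >> 5 != 0:
        raise ValueError("not a full style atom")
    protocol_num = b0 & 0x1F           # 00:00, 5 bits
    atom_num = buf[pos + 1]            # 01:00, 8 bits
    b2 = buf[pos + 2]
    if b2 & 0x80:                      # sizeof_args_len == 1: 15-bit args_len
        args_len = ((b2 & 0x7F) << 8) | buf[pos + 3]  # assumed big-endian
        start = pos + 4
    else:                              # sizeof_args_len == 0: 7-bit args_len
        args_len = b2 & 0x7F
        start = pos + 3
    args = buf[start:start + args_len]
    if len(args) != args_len:
        raise ValueError("truncated args")
    return protocol_num, atom_num, args, start + args_len


# The first atom of the AT example packet at the end of this page
atom = bytes.fromhex("00060E") + b"1:17063:675978"
print(decode_full(atom))  # (0, 6, b'1:17063:675978', 17)
```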

Length Style (1)

Atoms encoded using length style have an args_len field similar to atoms encoded using full style. The difference is that atoms encoded using length style have an args_len field that is always 3 bits long. The fields are laid out as follows:

Byte:Bit Offset Size Field
00:05 3 bits style
00:00 5 bits protocol_num
01:05 3 bits args_len
01:00 5 bits atom_num

Graphically:

     ┌───────────────┬───────────────┐
Byte │       0       │       1       │
     └───────────────┴───────────────┘
Bit   7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0 
     └─────┴─────────┴─────┴─────────┘
        ▲       ▲       ▲       ▲     
        │       │       │       │     
      style     │   args_len    │     
                │               │     
          protocol_num      atom_num
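A matching Python sketch for length style. The function name and return tuple are illustrative; the sample bytes are taken from the at example packet at the end of this page.

```python
def decode_length(buf: bytes, pos: int = 0):
    """Decode one length style (1) atom at buf[pos].

    Returns (protocol_num, atom_num, args, next_pos).
    """
    b0, b1 = buf[pos], buf[pos + 1]
    if b0 >> 5 != 1:
        raise ValueError("not a length style atom")
    protocol_num = b0 & 0x1F   # 00:00, 5 bits
    args_len = b1 >> 5         # 01:05, 3 bits
    atom_num = b1 & 0x1F       # 01:00, 5 bits
    start = pos + 2
    args = buf[start:start + args_len]
    if len(args) != args_len:
        raise ValueError("truncated args")
    return protocol_num, atom_num, args, start + args_len


print(decode_length(bytes.fromhex("258514FF0019")))
# (5, 5, b'\x14\xff\x00\x19', 6)
```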

Data Style (2)

With an atom encoded in data style, the args and atom_num fields are stored together in a single byte, which is the second byte of the atom.

The fields are laid out as:

Byte:Bit Offset Size Field
00:05 3 bits style
00:00 5 bits protocol_num
01:05 3 bits args
01:00 5 bits atom_num

Graphically it looks like:

     ┌───────────────┬───────────────┐
Byte │       0       │       1       │
     └───────────────┴───────────────┘
Bit   7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0 
     └─────┴─────────┴─────┴─────────┘
        ▲       ▲       ▲       ▲     
        │       │       │       │     
      style     │     args      │     
                │               │     
          protocol_num      atom_num
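In Python, decoding a data style atom needs no length handling at all. In this sketch the function name is hypothetical, and returning the 3-bit argument as a one-byte bytes object is an assumption made for consistency with the other styles. The sample bytes 41 29 come from the At example packet at the end of this page.

```python
def decode_data(buf: bytes, pos: int = 0):
    """Decode one data style (2) atom at buf[pos].

    Returns (protocol_num, atom_num, args, next_pos).
    """
    b0, b1 = buf[pos], buf[pos + 1]
    if b0 >> 5 != 2:
        raise ValueError("not a data style atom")
    protocol_num = b0 & 0x1F   # 00:00, 5 bits
    args = bytes([b1 >> 5])    # 01:05, 3 bits, wrapped in one byte (assumption)
    atom_num = b1 & 0x1F       # 01:00, 5 bits
    return protocol_num, atom_num, args, pos + 2


print(decode_data(bytes.fromhex("4129")))  # (1, 9, b'\x01', 2)
```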

Atom Style (3)

Atoms encoded using the atom style only have two fields: style and atom_num, encoded in a single byte. The protocol_num field used is the stream_protocol_num (the same as the previously processed atom). Atoms encoded in this style have no arguments.

The two fields are laid out as follows:

Byte:Bit Offset Size Field
00:05 3 bits style
00:00 5 bits atom_num

Graphically:

     ┌───────────────┐
Byte │       0       │
     └───────────────┘
Bit   7 6 5 4 3 2 1 0 
     └─────┴─────────┘
        ▲       ▲     
        │       │     
      style     │     
                │     
            atom_num  
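Because atom style carries no protocol_num, a decoder has to be handed the stream protocol by its caller. A Python sketch under that assumption (the function name is hypothetical, and an empty bytes object stands for "no arguments"):

```python
def decode_atom_style(buf: bytes, pos: int, stream_protocol_num: int):
    """Decode one atom style (3) byte at buf[pos].

    Returns (protocol_num, atom_num, args, next_pos).
    """
    b0 = buf[pos]
    if b0 >> 5 != 3:
        raise ValueError("not an atom style atom")
    # protocol_num is inherited; atom_num is the low 5 bits; no args.
    return stream_protocol_num, b0 & 0x1F, b"", pos + 1


# The last atom of the AT example is 0x72; the atom before it used protocol 1.
print(decode_atom_style(bytes([0x72]), 0, 1))  # (1, 18, b'', 1)
```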

Current Style (4)

Atoms encoded using the current style have three fields encoded across two bytes: style, atom_num, and args_len. As with atom style encoding, stream_protocol_num is used in place of protocol_num.

The fields are laid out as follows:

Byte:Bit Offset Size Field
00:05 3 bits style
00:00 5 bits atom_num
01:00 8 bits args_len

Graphically:

     ┌───────────────┬───────────────┐
Byte │       0       │       1       │
     └───────────────┴───────────────┘
Bit   7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0 
     └─────┴─────────┴───────────────┘
        ▲       ▲            ▲        
        │       │            │        
      style     │        args_len     
                │                     
            atom_num
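A Python sketch of current style decoding, again assuming the caller tracks the stream protocol. The function name is hypothetical; the sample bytes 86 02 53 4E come from the at example packet at the end of this page, where the preceding atom used protocol 4.

```python
def decode_current(buf: bytes, pos: int, stream_protocol_num: int):
    """Decode one current style (4) atom at buf[pos].

    Returns (protocol_num, atom_num, args, next_pos).
    """
    b0 = buf[pos]
    if b0 >> 5 != 4:
        raise ValueError("not a current style atom")
    atom_num = b0 & 0x1F       # 00:00, 5 bits
    args_len = buf[pos + 1]    # 01:00, 8 bits
    start = pos + 2
    args = buf[start:start + args_len]
    if len(args) != args_len:
        raise ValueError("truncated args")
    return stream_protocol_num, atom_num, args, start + args_len


print(decode_current(bytes.fromhex("8602534E"), 0, 4))  # (4, 6, b'SN', 4)
```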

Zero Style (5)

Atoms encoded using the zero style have two fields encoded in a single byte: style and atom_num. The stream_protocol_num field is used for the value of protocol_num. When the atom is processed, the args field is the value 0x00.

The fields are laid out as follows:

Byte:Bit Offset Size Field
00:05 3 bits style
00:00 5 bits atom_num

Graphically:

     ┌───────────────┐
Byte │       0       │
     └───────────────┘
Bit   7 6 5 4 3 2 1 0 
     └─────┴─────────┘
        ▲       ▲     
        │       │     
      style     │     
                │     
            atom_num  
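A Python sketch of zero style decoding. The function name is hypothetical, the byte 0xA3 is made up for the example, and the caller is assumed to supply the stream protocol.

```python
def decode_zero(buf: bytes, pos: int, stream_protocol_num: int):
    """Decode one zero style (5) byte at buf[pos].

    Returns (protocol_num, atom_num, args, next_pos); args is always 0x00.
    """
    b0 = buf[pos]
    if b0 >> 5 != 5:
        raise ValueError("not a zero style atom")
    return stream_protocol_num, b0 & 0x1F, b"\x00", pos + 1


# 0xA3 is a made-up zero style byte (style 5, atom_num 3)
print(decode_zero(bytes([0xA3]), 0, 2))  # (2, 3, b'\x00', 1)
```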

One Style (6)

Atoms encoded using the one style are similar to atoms encoded using the zero style. Specifically, two fields (style and atom_num) are encoded in a single byte, and the stream_protocol_num value is used for protocol_num. The difference from zero style is that when an atom encoded with one style is processed, the value for args is 0x01.

The fields are laid out as follows:

Byte:Bit Offset Size Field
00:05 3 bits style
00:00 5 bits atom_num

Graphically:

     ┌───────────────┐
Byte │       0       │
     └───────────────┘
Bit   7 6 5 4 3 2 1 0 
     └─────┴─────────┘
        ▲       ▲     
        │       │     
      style     │     
                │     
            atom_num  
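Since the zero and one styles differ only in the implied argument, a Python sketch can handle both with one table-driven function. IMPLIED_ARGS and decode_implied are hypothetical names, the bytes 0xA3 and 0xC3 are made up, and the caller is assumed to track the stream protocol.

```python
# Implied args for the single-byte styles that carry an argument.
IMPLIED_ARGS = {5: b"\x00", 6: b"\x01"}  # zero style, one style


def decode_implied(buf: bytes, pos: int, stream_protocol_num: int):
    """Decode a zero (5) or one (6) style byte at buf[pos].

    Returns (protocol_num, atom_num, args, next_pos).
    """
    b0 = buf[pos]
    style = b0 >> 5
    if style not in IMPLIED_ARGS:
        raise ValueError(f"style {style} has no implied argument")
    return stream_protocol_num, b0 & 0x1F, IMPLIED_ARGS[style], pos + 1


# Made-up bytes: 0xC3 is one style, 0xA3 is zero style, both with atom_num 3
print(decode_implied(bytes([0xC3]), 0, 2))  # (2, 3, b'\x01', 1)
print(decode_implied(bytes([0xA3]), 0, 2))  # (2, 3, b'\x00', 1)
```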

Prefix Style (7)

Prefix style is different from the other styles in that it does not encode an atom. Instead, it encodes fields that are used to modify the protocol_num and atom_num fields when decoding atoms that follow a prefix style encoding.

Prefix style encodes the following fields:

protocol_offset
This is a value that is bitwise-OR'd with the protocol_num field to compute the protocol_num used for processing.
atom_offset
This is a value that is bitwise-OR'd with the atom_num field to compute the atom_num used for processing.
keep_prefix
This determines whether the protocol_offset and atom_offset fields should be used for computing the protocol_num and atom_num fields after the next atom is processed.

Thus, the protocol_offset and atom_offset fields are always applied to the atom that immediately follows a prefix style byte. The keep_prefix field determines whether they are also applied to the atoms after that one. When keep_prefix is 1, the offsets stay in effect until another prefix style byte is encountered with keep_prefix set to 0.
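The keep_prefix rules can be sketched in Python as a small state object. This is an interpretation under stated assumptions: PrefixState is a hypothetical name, it takes offsets that have already been masked and shifted (as described later in this section), and it assumes a new prefix byte simply replaces any prefix still in effect. The values in the usage lines mirror the E2 prefix byte in the at example packet at the end of this page.

```python
class PrefixState:
    """Tracks prefix offsets between atoms (a sketch, not a reference)."""

    def __init__(self) -> None:
        self.protocol_offset = 0
        self.atom_offset = 0
        self.keep_prefix = False

    def set_prefix(self, protocol_offset: int, atom_offset: int, keep_prefix: bool) -> None:
        # Assumption: a new prefix byte replaces whatever prefix was active.
        self.protocol_offset = protocol_offset
        self.atom_offset = atom_offset
        self.keep_prefix = keep_prefix

    def apply(self, protocol_num: int, atom_num: int) -> tuple:
        """Return the (protocol_num, atom_num) to use for the next atom."""
        result = (protocol_num | self.protocol_offset, atom_num | self.atom_offset)
        if not self.keep_prefix:
            # Without keep_prefix, the offsets affect only this one atom.
            self.protocol_offset = self.atom_offset = 0
        return result


state = PrefixState()
state.set_prefix(0x00, 0x20, keep_prefix=False)
print(state.apply(2, 0x15))  # (2, 53): offset applied to the next atom
print(state.apply(0, 0x0C))  # (0, 12): offset no longer applied
```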

The fields in prefix style are encoded in a single byte, and are laid out as follows:

Byte:Bit Offset Size Field
00:05 3 bits style
00:03 2 bits protocol_offset
00:01 2 bits atom_offset
00:00 1 bit keep_prefix

Graphically:

     ┌───────────────┐            
Byte │       0       │            
     └───────────────┘            
Bit   7 6 5 4 3 2 1 0             
     └─────┴───┴───┴─┘            
        ▲    ▲   ▲  ▲             
        │    │   │  │             
      style  │   │  └─ keep_prefix
             │   │                
             │   └─ atom_offset   
             │                    
      protocol_offset

The offset fields can be extracted using bitmasks. Instead of shifting each field down to bit 0 (as is typical for fields such as style), a bitmask is applied to keep the field's bits in place and clear all the others. The bitmask to extract the protocol_offset field is 0x18 (00011000 in binary). The bitmask for the atom_offset field is 0x06 (00000110 in binary).

Once the appropriate bits have been isolated via masking, the offset values are left-shifted. The protocol_offset value is shifted to the left by 2 bits, and the atom_offset value is shifted to the left by 4 bits.

For example, if the prefix style byte is 0xEE, the final protocol_offset value is 0x20 and the final atom_offset value is 0x60.

To see this graphically, consider how the bits are laid out:

     ┌───────────────┐            
Byte │     0xEE      │            
     └───────────────┘            
Bit   1 1 1 0 1 1 1 0             
     └─────┴───┴───┴─┘            
        ▲    ▲   ▲  ▲             
        │    │   │  │             
      style  │   │  └─ keep_prefix
             │   │                
             │   └─ atom_offset   
             │                    
      protocol_offset

To compute the protocol_offset value, apply the bitmask to isolate the relevant bits. This yields the value 0x08. Graphically:

              ┌───────────────┐
         Byte │     0xEE      │
              └───────────────┘
         Bit   1 1 1 0 1 1 1 0 
               ─────────────── 
  Mask (0x18)  0 0 0 1 1 0 0 0 
               ─────────────── 
Result (0x08)  0 0 0 0 1 0 0 0 

Then left-shift by two bits to get 0x20:

      Result (0x08)  0 0 0 0 1 0 0 0 
                     ─────────────── 
Left-shift 2 (0x20)  0 0 1 0 0 0 0 0 

The same general procedure applies to the atom_offset field. First, apply the bitmask to isolate the relevant bits, which yields 0x06:

              ┌───────────────┐
         Byte │     0xEE      │
              └───────────────┘
         Bit   1 1 1 0 1 1 1 0 
               ─────────────── 
  Mask (0x06)  0 0 0 0 0 1 1 0 
               ─────────────── 
Result (0x06)  0 0 0 0 0 1 1 0 

Next, left-shift by four bits to get 0x60:

      Result (0x06)  0 0 0 0 0 1 1 0 
                     ─────────────── 
Left-shift 4 (0x60)  0 1 1 0 0 0 0 0 
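The mask-and-shift procedure walked through above fits in a few lines of Python. The function and constant names are illustrative; keep_prefix is read from bit 0 as given in the field table.

```python
PROTOCOL_OFFSET_MASK = 0x18  # bits 3-4
ATOM_OFFSET_MASK = 0x06      # bits 1-2
KEEP_PREFIX_MASK = 0x01      # bit 0


def decode_prefix(byte: int):
    """Return (protocol_offset, atom_offset, keep_prefix) for a prefix byte."""
    if byte >> 5 != 7:
        raise ValueError("not a prefix style byte")
    protocol_offset = (byte & PROTOCOL_OFFSET_MASK) << 2  # bits 3-4 -> bits 5-6
    atom_offset = (byte & ATOM_OFFSET_MASK) << 4          # bits 1-2 -> bits 5-6
    keep_prefix = bool(byte & KEEP_PREFIX_MASK)
    return protocol_offset, atom_offset, keep_prefix


print(decode_prefix(0xEE))  # (32, 96, False): 0x20, 0x60, keep_prefix off
```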

Atom Arguments

Many atoms take arguments that are used when performing their associated action. This section summarizes whether, and where, the args field resides for each encoding style.

With the full, length, and current encoding styles, the atom has arguments if the args_len field is not 0. The arguments are encoded as a series of bytes (args_len bytes long) that immediately follow the last field of the encoded atom.

The data encoding style always has exactly one argument, because the 3-bit args value is encoded in the same byte that contains atom_num.

The zero and one styles do not have any arguments in the stream. Instead, they always use the values 0 and 1, respectively, when processing the associated atom.

Neither the atom nor prefix styles have any arguments associated with them.

This table summarizes the args_len and args fields for the different encoding styles:

Style     Has args_len   args in Stream   Has args
full      Yes            Maybe            If args_len != 0
length    Yes            Maybe            If args_len != 0
data      No             Yes              Yes
atom      No             No               No
current   Yes            Maybe            If args_len != 0
zero      No             No               Yes, implied 0
one       No             No               Yes, implied 1
prefix    N/A            N/A              Not applicable
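Putting the per-style rules and this table together, a complete decoder might look like the Python sketch below. It rests on several assumptions that the text above does not settle: the stream protocol starts at 0 before any atom has supplied one, the stream protocol is updated with the prefix-adjusted protocol_num, a 15-bit args_len is big-endian, and the data style's 3-bit argument is returned as a one-byte value. Atom and decode_atom_stream are hypothetical names.

```python
from typing import Iterator, NamedTuple


class Atom(NamedTuple):
    protocol_num: int
    atom_num: int
    args: bytes


def decode_atom_stream(buf: bytes) -> Iterator[Atom]:
    """Yield each atom in buf, applying stream protocol and prefix rules."""
    pos = 0
    stream_protocol = 0          # assumption: nothing inherited yet
    proto_off = atom_off = 0     # offsets from the most recent prefix byte
    keep_prefix = False

    def take(n: int) -> bytes:
        # Consume n bytes, failing loudly if the stream is truncated.
        nonlocal pos
        chunk = buf[pos:pos + n]
        if len(chunk) != n:
            raise ValueError(f"truncated atom at offset {pos}")
        pos += n
        return chunk

    while pos < len(buf):
        b0 = take(1)[0]
        style, low5 = b0 >> 5, b0 & 0x1F

        if style == 7:  # prefix: records offsets, encodes no atom
            proto_off = (b0 & 0x18) << 2
            atom_off = (b0 & 0x06) << 4
            keep_prefix = bool(b0 & 0x01)
            continue

        if style == 0:  # full: 7- or 15-bit args_len
            protocol, atom_num = low5, take(1)[0]
            b2 = take(1)[0]
            if b2 & 0x80:
                args_len = ((b2 & 0x7F) << 8) | take(1)[0]
            else:
                args_len = b2
            args = take(args_len)
        elif style in (1, 2):  # length / data: 3 bits + atom_num in byte 1
            protocol = low5
            b1 = take(1)[0]
            atom_num = b1 & 0x1F
            args = take(b1 >> 5) if style == 1 else bytes([b1 >> 5])
        else:  # atom (3), current (4), zero (5), one (6): inherit protocol
            protocol, atom_num = stream_protocol, low5
            if style == 3:
                args = b""
            elif style == 4:
                args = take(take(1)[0])
            else:
                args = b"\x00" if style == 5 else b"\x01"

        protocol |= proto_off
        atom_num |= atom_off
        if not keep_prefix:  # a one-shot prefix only affects this atom
            proto_off = atom_off = 0
        stream_protocol = protocol  # assumption: store the adjusted value
        yield Atom(protocol, atom_num, args)


# A slice of the at example packet: prefix E2, then two length style atoms
for atom in decode_atom_stream(bytes.fromhex("E2229514FF0019200C")):
    print(atom)
# Atom(protocol_num=2, atom_num=53, args=b'\x14\xff\x00\x19')
# Atom(protocol_num=0, atom_num=12, args=b'')
```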

Atom Streams in P3 Packets

P3 packets can contain atom streams. Specifically, a P3 packet has an atom stream if the following are true:

  1. The packet is a DATA (type 0x20) packet.
  2. The first character of the token is not x, T, or F.
  3. The token is not one of the following: DD, D3, D6, dp, Dp, XS, eI, eJ, eX, fD, OT, AA, AB, AC, AD, CA, CB.
  4. The packet contains enough data for an atom (i.e. it contains more than just a token and stream identifier).
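The four conditions translate directly into a predicate. In this Python sketch, has_atom_stream, its packet_type and data parameters, and the constant names are hypothetical; data is assumed to start at the two-byte token and to exclude any trailing packet framing byte. Condition 4 needs the stream identifier length, which comes from the token table in the Stream Identifiers section.

```python
DATA_PACKET_TYPE = 0x20
EXCLUDED_FIRST_CHARS = "xTF"
EXCLUDED_TOKENS = {
    "DD", "D3", "D6", "dp", "Dp", "XS", "eI", "eJ", "eX",
    "fD", "OT", "AA", "AB", "AC", "AD", "CA", "CB",
}
SID_LENGTHS = {"at": 4, "At": 3}  # every other token uses 2 bytes


def has_atom_stream(packet_type: int, data: bytes) -> bool:
    """Apply the four conditions; `data` begins with the two-byte token."""
    if packet_type != DATA_PACKET_TYPE or len(data) < 2:
        return False                                   # condition 1
    token = data[:2].decode("latin-1")
    if token[0] in EXCLUDED_FIRST_CHARS:
        return False                                   # condition 2
    if token in EXCLUDED_TOKENS:
        return False                                   # condition 3
    header_len = 2 + SID_LENGTHS.get(token, 2)
    return len(data) > header_len                      # condition 4


print(has_atom_stream(0x20, bytes.fromhex("415400C172")))  # True
print(has_atom_stream(0x20, bytes.fromhex("415400C1")))    # False
```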

A P3 packet with an atom stream is composed of a token (token), followed by a stream identifier (s_id), and then one or more atoms.

<token><s_id><atom><atom><atom>...

The token field is always two bytes long. The size of the stream identifier varies depending on the token.

Stream Identifiers

The stream identifier field (s_id) is used to differentiate between multiple atom streams. For example, different forms running at the same time may have different stream identifiers.

As mentioned, the s_id field immediately follows the token field. The length of s_id depends on the value of the token field:

Token             Length of Stream Id
at                4 bytes
At                3 bytes
Any other token   2 bytes
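A Python sketch of splitting packet data into its three parts using this table. sid_length and split_atom_packet are hypothetical names, and data is assumed to begin at the token.

```python
def sid_length(token: bytes) -> int:
    """Length of the stream identifier that follows a two-byte token."""
    if token == b"at":
        return 4
    if token == b"At":
        return 3
    return 2  # any other token


def split_atom_packet(data: bytes) -> tuple[bytes, bytes, bytes]:
    """Split data (starting at the token) into (token, s_id, atom bytes)."""
    token = data[:2]
    end = 2 + sid_length(token)
    return token, data[2:end], data[end:]


token, s_id, atoms = split_atom_packet(bytes.fromhex("4174040F23200141"))
print(token, s_id.hex(), atoms.hex())  # b'At' 040f23 200141
```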

Examples

Here are some examples of packets with stream identifiers of differing sizes. First, an AT packet with a 2-byte s_id.

token (AT) ─┐     ┌─ s_id (2 bytes)                      
            ▼     ▼                                      
         ┌─────┬─────┐                                   
    0000: 41 54 00 C1 00 06 0E 31 3A 31 37 30 36 33 3A 36
    0010: 37 35 39 37 38 01 5A 17 61 6F 6C 3A 2F 2F 34 34
    0020: 30 31 3A 31 37 30 36 33 3A 36 37 35 39 37 38 72
    0030: 0D                                             

This is an example of an At packet, which has a 3-byte s_id.

token (At) ─┐      ┌─── s_id (3 bytes)                    
            ▼      ▼                                      
         ┌─────┬────────┐                                 
    0000: 41 74 04 0F 23 20 01 41 29 2C A1 00 00 00 00 00 
    0010: 80 05 00 00 00 00 C1 21 1D 20 02 0D             

Finally, an example of an at packet, which has a 4-byte s_id.

token (at) ─┐        ┌─ s_id (4 bytes)                   
            ▼        ▼                                   
         ┌─────┬───────────┐                             
    0000: 61 74 01 10 0F A0 20 01 25 85 14 FF 00 19 2F 54
    0010: 01 02 24 20 9F 86 02 53 4E 68 40 27 E2 22 95 14
    0020: FF 00 19 20 0C 2F 54 03 04 24 20 9F 86 02 53 4E
    0030: 68 40 67 24 20 9F 86 02 53 6E 20 15 20 4B 04 02
    0040: 24 08 40 87 40 47 20 02 20 02 0D
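To tie the layouts back to a real packet, here is a Python sketch that parses the AT example's hex dump and decodes its atoms. It handles only the full and atom styles (the only ones in that packet), the helper names are hypothetical, and it assumes the final 0x0D byte is packet framing rather than part of the atom stream (each example above ends with 0x0D, and the AT atoms decode cleanly up to that byte).

```python
def dump_to_bytes(dump: str) -> bytes:
    """Convert hex dump lines such as '0000: 41 54 00 C1' into bytes."""
    out = bytearray()
    for line in dump.strip().splitlines():
        _, _, hex_part = line.partition(":")  # drop the offset column
        out += bytes.fromhex(hex_part)
    return bytes(out)


def decode_full_and_atom(atoms: bytes) -> list[tuple[int, int, bytes]]:
    """Decode only full (0) and atom (3) style atoms, enough for this packet."""
    pos, stream_protocol, result = 0, 0, []
    while pos < len(atoms):
        b0 = atoms[pos]
        style = b0 >> 5
        if style == 0:
            protocol, atom_num, b2 = b0 & 0x1F, atoms[pos + 1], atoms[pos + 2]
            if b2 & 0x80:  # 15-bit args_len (assumed big-endian)
                args_len, pos = ((b2 & 0x7F) << 8) | atoms[pos + 3], pos + 4
            else:          # 7-bit args_len
                args_len, pos = b2, pos + 3
            args = atoms[pos:pos + args_len]
            pos += args_len
        elif style == 3:
            protocol, atom_num, args = stream_protocol, b0 & 0x1F, b""
            pos += 1
        else:
            raise NotImplementedError(f"style {style} not handled in this sketch")
        stream_protocol = protocol
        result.append((protocol, atom_num, args))
    return result


AT_DUMP = """
0000: 41 54 00 C1 00 06 0E 31 3A 31 37 30 36 33 3A 36
0010: 37 35 39 37 38 01 5A 17 61 6F 6C 3A 2F 2F 34 34
0020: 30 31 3A 31 37 30 36 33 3A 36 37 35 39 37 38 72
0030: 0D
"""

packet = dump_to_bytes(AT_DUMP)
token, s_id = packet[:2], packet[2:4]  # AT uses a 2-byte s_id
atoms = packet[4:-1]                   # assumption: final 0x0D is not an atom
print(token, s_id.hex())               # b'AT' 00c1
for atom in decode_full_and_atom(atoms):
    print(atom)
# (0, 6, b'1:17063:675978')
# (1, 90, b'aol://4401:17063:675978')
# (1, 18, b'')
```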