Latest (patched) version:
HTML:
http://yaml.org/spec/1.2/spec.html
PDF:
http://yaml.org/spec/1.2/spec.pdf
PS:
http://yaml.org/spec/1.2/spec.ps
Errata:
http://yaml.org/spec/1.2/errata.html
Previous (original) version:
http://yaml.org/spec/1.2/2009-07-21/spec.html
Copyright © 2001-2016 Oren Ben-Kiki, Clark Evans, Ingy döt Net
Status of this Document
This document reflects the third version of the YAML data serialization language. The content of the specification was arrived at by consensus of its authors and through user feedback on the yaml-core mailing list. We encourage implementers to update their software with support for this version.
The primary objective of this revision is to bring YAML into compliance
with JSON as an official subset. YAML 1.2 is compatible with 1.1 for
most practical applications - this is a minor revision. An expected
source of incompatibility with prior versions of YAML, especially the
syck implementation, is the change in implicit typing rules. We have
removed unique implicit typing rules and have updated these rules to
align them with JSON's productions. In this version of YAML, boolean
values may be serialized as “true” or “false”; the empty scalar as “null”.
Unquoted numeric values are a superset of JSON's numeric production.
Other changes in the specification were the removal of the Unicode line
breaks and production bug fixes. We also define 3 built-in implicit
typing rule sets: untyped, strict JSON, and a more flexible YAML rule
set that extends JSON typing.
The difference between late 1.0 drafts which syck 0.55 implements and the 1.1 revision of this specification is much more extensive. We fixed usability issues with the tagging syntax. In particular, the single exclamation was re-defined for private types and a simple prefixing mechanism was introduced. This revision also fixed many production edge cases and introduced a type repository. Therefore, there are several incompatibilities between syck and this revision as well.
The list of known errors in this specification is available at http://yaml.org/spec/1.2/errata.html. Please report errors in this document to the yaml-core mailing list. This revision contains fixes for all errors known as of 2016-12-19.
We wish to thank implementers who have tirelessly tracked earlier versions of this specification, and our fabulous user community whose feedback has both validated and clarified our direction.
Abstract
YAML™ (rhymes with “camel”) is a human-friendly, cross language, Unicode based data serialization language designed around the common native data types of agile programming languages. It is broadly useful for programming needs ranging from configuration files to Internet messaging to object persistence to data auditing. Together with the Unicode standard for characters, this specification provides all the information necessary to understand YAML Version 1.2 and to create programs that process YAML information.
“YAML Ain’t Markup Language” (abbreviated YAML) is a data serialization language designed to be human-friendly and work well with modern programming languages for common everyday tasks. This specification is both an introduction to the YAML language and the concepts supporting it, and also a complete specification of the information needed to develop applications for processing YAML.
Open, interoperable and readily understandable tools have advanced computing immensely. YAML was designed from the start to be useful and friendly to people working with data. It uses Unicode printable characters, some of which provide structural information and the rest containing the data itself. YAML achieves a unique cleanness by minimizing the amount of structural characters and allowing the data to show itself in a natural and meaningful way. For example, indentation may be used for structure, colons separate key: value pairs, and dashes are used to create “bullet” lists.
There are myriad flavors of data structures, but they can all be adequately represented with three basic primitives: mappings (hashes/dictionaries), sequences (arrays/lists) and scalars (strings/numbers). YAML leverages these primitives, and adds a simple typing system and aliasing mechanism to form a complete language for serializing any native data structure. While most programming languages can use YAML for data serialization, YAML excels in working with those languages that are fundamentally built around the three basic primitives. These include the new wave of agile languages such as Perl, Python, PHP, Ruby, and Javascript.
There are hundreds of different languages for programming, but only a handful of languages for storing and transferring data. Even though its potential is virtually boundless, YAML was specifically created to work well for common use cases such as: configuration files, log files, interprocess messaging, cross-language data sharing, object persistence, and debugging of complex data structures. When data is easy to view and understand, programming becomes a simpler task.
The design goals for YAML are, in decreasing priority:
YAML’s initial direction was set by the data serialization and markup language discussions among SML-DEV members. Later on, it directly incorporated experience from Ingy döt Net’s Perl module Data::Denter. Since then, YAML has matured through ideas and support from its user community.
YAML integrates and builds upon concepts described by C, Java, Perl, Python, Ruby, RFC0822 (MAIL), RFC1866 (HTML), RFC2045 (MIME), RFC2396 (URI), XML, SAX, SOAP, and JSON.
The syntax of YAML was motivated by Internet Mail (RFC0822) and remains partially compatible with that standard. Further, borrowing from MIME (RFC2045), YAML’s top-level production is a stream of independent documents, ideal for message-based distributed processing systems.
YAML’s indentation-based scoping is similar to Python’s (without the ambiguities caused by tabs). Indented blocks facilitate easy inspection of the data’s structure. YAML’s literal style leverages this by enabling formatted text to be cleanly mixed within an indented structure without troublesome escaping. YAML also allows the use of traditional indicator-based scoping similar to JSON’s and Perl’s. Such flow content can be freely nested inside indented blocks.
YAML’s double-quoted style uses familiar C-style escape sequences. This enables ASCII encoding of non-printable or 8-bit (ISO 8859-1) characters such as “\x3B”. Non-printable 16-bit Unicode and 32-bit (ISO/IEC 10646) characters are supported with escape sequences such as “\u003B” and “\U0000003B”.
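As a rough illustration of the three numeric escapes just mentioned, the following Python sketch decodes them with a regular expression. The function name decode_numeric_escapes is hypothetical, and the sketch deliberately ignores every other escape of the double-quoted style (including an escaped backslash), so it is not a complete unescaper.

```python
import re

# Matches only the numeric escapes named in the text: \xNN, \uNNNN, \UNNNNNNNN.
_ESCAPE = re.compile(
    r"\\x([0-9A-Fa-f]{2})|\\u([0-9A-Fa-f]{4})|\\U([0-9A-Fa-f]{8})"
)


def decode_numeric_escapes(text: str) -> str:
    """Replace each numeric escape with the character it denotes.

    Hypothetical helper for illustration; other escapes are left untouched.
    """

    def replace(match: re.Match) -> str:
        # Exactly one of the three groups is set for any given match.
        digits = match.group(1) or match.group(2) or match.group(3)
        return chr(int(digits, 16))

    return _ESCAPE.sub(replace, text)
```

All three spellings used as examples above denote the same character, the semicolon, so a decoder of this kind maps each of them to “;”.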
Motivated by HTML’s end-of-line normalization, YAML’s line folding employs an intuitive method of handling line breaks. A single line break is folded into a single space, while empty lines are interpreted as line break characters. This technique allows for paragraphs to be word-wrapped without affecting the canonical form of the scalar content.
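The folding rule can be sketched in a few lines of Python. This is a simplified model, assuming that each line is stripped of surrounding white space and that trailing empty lines are dropped; the function name fold_lines is hypothetical, and the sketch does not cover the more-indented-line rule of the block folded style.

```python
def fold_lines(text: str) -> str:
    """Fold line breaks: one break becomes a space, empty lines become breaks.

    Simplified sketch: lines are stripped, trailing empty lines are discarded.
    """
    lines = [line.strip() for line in text.split("\n")]
    result = lines[0]
    empty_run = 0  # number of consecutive empty lines seen so far
    for line in lines[1:]:
        if line == "":
            empty_run += 1
            continue
        # A lone break folds to a space; n empty lines yield n line breaks.
        result += ("\n" * empty_run if empty_run else " ") + line
        empty_run = 0
    return result
```

Under these assumptions, a paragraph that is re-wrapped at a different width folds to the same content, which is the property the text describes.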
YAML’s core type system is based on the requirements of agile languages such as Perl, Python, and Ruby. YAML directly supports both collections (mappings, sequences) and scalars. Support for these common types enables programmers to use their language’s native data structures for YAML manipulation, instead of requiring a special document object model (DOM).
Like XML’s SOAP, YAML supports serializing a graph of native data structures through an aliasing mechanism. Also like SOAP, YAML provides for application-defined types. This allows YAML to represent rich data structures required for modern distributed computing. YAML provides globally unique type names using a namespace mechanism inspired by Java’s DNS-based package naming convention and XML’s URI-based namespaces. In addition, YAML allows for private types specific to a single application.
YAML was designed to support incremental interfaces that include both input (“getNextEvent()”) and output (“sendNextEvent()”) one-pass interfaces. Together, these enable YAML to support the processing of large documents (e.g. transaction logs) or continuous streams (e.g. feeds from a production machine).
Both JSON and YAML aim to be human readable data interchange formats. However, JSON and YAML have different priorities. JSON’s foremost design goal is simplicity and universality. Thus, JSON is trivial to generate and parse, at the cost of reduced human readability. It also uses a lowest common denominator information model, ensuring any JSON data can be easily processed by every modern programming environment.
In contrast, YAML’s foremost design goals are human readability and support for serializing arbitrary native data structures. Thus, YAML allows for extremely readable files, but is more complex to generate and parse. In addition, YAML ventures beyond the lowest common denominator data types, requiring more complex processing when crossing between different programming environments.
YAML can therefore be viewed as a natural superset of JSON, offering improved human readability and a more complete information model. This is also the case in practice; every JSON file is also a valid YAML file. This makes it easy to migrate from JSON to YAML if/when the additional features are required.
JSON's RFC4627 requires that mappings keys merely “SHOULD” be unique, while YAML insists they “MUST” be. Technically, YAML therefore complies with the JSON spec, choosing to treat duplicates as an error. In practice, since JSON is silent on the semantics of such duplicates, the only portable JSON files are those with unique keys, which are therefore valid YAML files.
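The practical effect of the stricter rule can be seen with Python's standard json module, which by default keeps the last value for a repeated key. The sketch below uses the module's object_pairs_hook parameter to reject duplicates instead, which is the behavior YAML requires; the helper names are hypothetical.

```python
import json


def reject_duplicate_keys(pairs):
    """object_pairs_hook that raises on a repeated key (hypothetical helper)."""
    mapping = {}
    for key, value in pairs:
        if key in mapping:
            raise ValueError(f"duplicate mapping key: {key!r}")
        mapping[key] = value
    return mapping


def load_portable_json(text: str):
    """Parse JSON, treating duplicate keys as an error, as YAML does."""
    return json.loads(text, object_pairs_hook=reject_duplicate_keys)
```

A file accepted by load_portable_json has unique keys in every object, so it belongs to the portable subset of JSON discussed above.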
It may be useful to define an intermediate format between YAML and JSON. Such a format would be trivial to parse (but not very human readable), like JSON. At the same time, it would allow for serializing arbitrary native data structures, like YAML. Such a format might also serve as YAML’s “canonical format”. Defining such a “YSON” format (YSON is a Serialized Object Notation) can be done either by enhancing the JSON specification or by restricting the YAML specification. Such a definition is beyond the scope of this specification.
Newcomers to YAML often search for its correlation to the eXtensible Markup Language (XML). Although the two languages may actually compete in several application domains, there is no direct correlation between them.
YAML is primarily a data serialization language. XML was designed to be backwards compatible with the Standard Generalized Markup Language (SGML), which was designed to support structured documentation. XML therefore had many design constraints placed on it that YAML does not share. XML is a pioneer in many domains; YAML is the result of lessons learned from XML and other technologies.
It should be mentioned that there are ongoing efforts to define standard XML/YAML mappings. This generally requires that a subset of each language be used. For more information on using both XML and YAML, please visit http://yaml.org/xml.
This specification uses key words based on RFC2119 to indicate requirement level. In particular, the following words are used to describe the actions of a YAML processor:
The rest of this document is arranged as follows. Chapter 2 provides a short preview of the main YAML features. Chapter 3 describes the YAML information model, and the processes for converting from and to this model and the YAML text format. The bulk of the document, chapters 4 through 9, formally defines this text format. Finally, chapter 10 recommends basic YAML schemas.
This section provides a quick glimpse into the expressive power of YAML. It is not expected that the first-time reader grok all of the examples. Rather, these selections are used as motivation for the remainder of the specification.
YAML’s block collections use indentation for scope and begin each entry on its own line. Block sequences indicate each entry with a dash and space (“- ”). Mappings use a colon and space (“: ”) to mark each key: value pair. Comments begin with an octothorpe (also called a “hash”, “sharp”, “pound”, or “number sign” - “#”).
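To make the block notation concrete, here is a minimal Python sketch that writes nested lists and dictionaries as block sequences and block mappings. It assumes non-empty collections and scalars that are safe to write in the plain style without quoting; dump_block is a hypothetical name, not part of any YAML library.

```python
def dump_block(node, indent=0):
    """Return the lines of a block-style rendering of nested lists and dicts.

    Sketch only: assumes non-empty collections and plain-safe scalars.
    """
    pad = " " * indent
    lines = []
    if isinstance(node, dict):
        for key, value in node.items():
            if isinstance(value, (dict, list)):
                lines.append(f"{pad}{key}:")  # nested block starts on next line
                lines.extend(dump_block(value, indent + 2))
            else:
                lines.append(f"{pad}{key}: {value}")  # colon and space
    elif isinstance(node, list):
        for item in node:
            if isinstance(item, (dict, list)):
                lines.append(f"{pad}-")
                lines.extend(dump_block(item, indent + 2))
            else:
                lines.append(f"{pad}- {item}")  # dash and space
    else:
        lines.append(f"{pad}{node}")
    return lines
```

Joining the returned lines with line breaks gives a character stream in which indentation alone conveys the nesting.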
YAML also has flow styles, using explicit indicators rather than indentation to denote scope. The flow sequence is written as a comma separated list within square brackets. In a similar manner, the flow mapping uses curly braces.
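Because JSON arrays and objects use the same brackets, braces and commas, one hedged way to produce flow-style text from Python is the standard json module. The sketch relies on the JSON-subset relationship stated earlier in this specification; to_flow is a hypothetical name.

```python
import json


def to_flow(value) -> str:
    """Render nested lists and dicts as JSON text, usable as YAML flow style."""
    return json.dumps(value)
```

The output quotes every string, which YAML permits but does not require; a dedicated YAML emitter could drop the quotes where the plain style is safe.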
YAML uses three dashes (“---”) to separate directives from document content. This also serves to signal the start of a document if no directives are present. Three dots (“...”) indicate the end of a document without starting a new one, for use in communication channels.
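The role of the two markers can be sketched as a line-oriented splitter in Python. This is a deliberately naive model: it recognizes only marker lines consisting of exactly “---” or “...”, skips directive lines beginning with “%” outside a document, and does not handle content on the marker line. The name split_documents is hypothetical.

```python
def split_documents(stream: str):
    """Split a character stream into the text of each document (naive sketch)."""
    documents = []
    current = None  # None means "not currently inside a document"
    for line in stream.splitlines():
        if line == "---":
            if current is not None:
                documents.append("\n".join(current))
            current = []  # the marker starts a new document
        elif line == "...":
            if current is not None:
                documents.append("\n".join(current))
            current = None  # the document ended without starting a new one
        elif current is None and line.startswith("%"):
            continue  # directive line preceding the "---" marker
        else:
            if current is None:
                current = []  # bare document with no leading marker
            current.append(line)
    if current is not None:
        documents.append("\n".join(current))
    return documents
```

A reader of a continuous channel can act on a document as soon as “...” arrives, without waiting for the next “---”.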
Repeated nodes (objects) are first identified by an anchor (marked with the ampersand - “&”), and are then aliased (referenced with an asterisk - “*”) thereafter.
A question mark and space (“? ”) indicate a complex mapping key. Within a block collection, key: value pairs can start immediately following the dash, colon, or question mark.
Scalar content can be written in block notation, using a literal style (indicated by “|”) where all line breaks are significant. Alternatively, it can be written with the folded style (denoted by “>”) where each line break is folded to a space unless it ends an empty or a more-indented line.
YAML’s flow scalars include the plain style (most examples thus far) and two quoted styles. The double-quoted style provides escape sequences. The single-quoted style is useful when escaping is not needed. All flow scalars can span multiple lines; line breaks are always folded.
In YAML, untagged nodes are given a type depending on the application. The examples in this specification generally use the seq, map and str types from the fail safe schema. A few examples also use the int, float, and null types from the JSON schema. The repository includes additional types such as binary, omap, set and others.
Explicit typing is denoted with a tag using the exclamation point (“!”) symbol. Global tags are URIs and may be specified in a tag shorthand notation using a handle. Application-specific local tags may also be used.
Below are two full-length examples of YAML. On the left is a sample invoice; on the right is a sample log file.
YAML is both a text format and a method for presenting any native data structure in this format. Therefore, this specification defines two concepts: a class of data objects called YAML representations, and a syntax for presenting YAML representations as a series of characters, called a YAML stream. A YAML processor is a tool for converting information between these complementary views. It is assumed that a YAML processor does its work on behalf of another module, called an application. This chapter describes the information structures a YAML processor must provide to or obtain from the application.
YAML information is used in two ways: for machine processing, and for human consumption. The challenge of reconciling these two perspectives is best done in three distinct translation stages: representation, serialization, and presentation. Representation addresses how YAML views native data structures to achieve portability between programming environments. Serialization concerns itself with turning a YAML representation into a serial form, that is, a form with sequential access constraints. Presentation deals with the formatting of a YAML serialization as a series of characters in a human-friendly manner.
Translating between native data structures and a character stream is done in several logically distinct stages, each with a well defined input and output data model, as shown in the following diagram:
A YAML processor need not expose the serialization or representation stages. It may translate directly between native data structures and a character stream (dump and load in the diagram above). However, such a direct translation should take place so that the native data structures are constructed only from information available in the representation. In particular, mapping key order, comments, and tag handles should not be referenced during composition.
Dumping native data structures to a character stream is done using the following three stages:
YAML represents any native data structure using three node kinds: sequence - an ordered series of entries; mapping - an unordered association of unique keys to values; and scalar - any datum with opaque structure presentable as a series of Unicode characters. Combined, these primitives generate directed graph structures. These primitives were chosen because they are both powerful and familiar: the sequence corresponds to a Perl array and a Python list, the mapping corresponds to a Perl hash table and a Python dictionary. The scalar represents strings, integers, dates, and other atomic data types.
Each YAML node requires, in addition to its kind and content, a tag specifying its data type. Type specifiers are either global URIs, or are local in scope to a single application. For example, an integer is represented in YAML with a scalar plus the global tag “tag:yaml.org,2002:int”. Similarly, an invoice object, particular to a given organization, could be represented as a mapping together with the local tag “!invoice”. This simple model can represent any data structure independent of programming language.
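One possible in-memory model of such nodes is sketched below in Python. The Node class, the represent function and the choice to store mapping content as a list of key and value node pairs are all assumptions made for illustration; the sketch covers only integers, strings, lists and dictionaries, and it copies shared objects rather than preserving their identity.

```python
from dataclasses import dataclass
from typing import Any

YAML_PREFIX = "tag:yaml.org,2002:"


@dataclass
class Node:
    """A represented node: kind, tag and content (hypothetical model)."""
    kind: str     # "scalar", "sequence" or "mapping"
    tag: str      # global URI tag or local "!..." tag
    content: Any  # str, list of Node, or list of (key Node, value Node)


def represent(value: Any) -> Node:
    """Build a node tree for a small set of native Python types."""
    if type(value) is int:  # excludes bool, which subclasses int
        return Node("scalar", YAML_PREFIX + "int", str(value))
    if isinstance(value, str):
        return Node("scalar", YAML_PREFIX + "str", value)
    if isinstance(value, list):
        items = [represent(item) for item in value]
        return Node("sequence", YAML_PREFIX + "seq", items)
    if isinstance(value, dict):
        pairs = [(represent(k), represent(v)) for k, v in value.items()]
        return Node("mapping", YAML_PREFIX + "map", pairs)
    raise TypeError(f"no representation defined for {type(value).__name__}")
```

An application-specific object such as the invoice above would be represented the same way, with the mapping node's tag set to the local tag “!invoice” instead of the generic mapping tag.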
Loading native data structures from a character stream is done using the following three stages:
This section specifies the formal details of the results of the above processes. To maximize data portability between programming languages and implementations, users of YAML should be mindful of the distinction between serialization or presentation properties and those which are part of the YAML representation. Thus, while imposing an order on mapping keys is necessary for flattening YAML representations to a sequential access medium, this serialization detail must not be used to convey application level information. In a similar manner, while indentation technique and a choice of a node style are needed for human readability, these presentation details are neither part of the YAML serialization nor the YAML representation. By carefully separating properties needed for serialization and presentation, YAML representations of application information will be consistent and portable between various programming environments.
The following diagram summarizes the three information models. Full arrows denote composition, hollow arrows denote inheritance, “1” and “*” denote “one” and “many” relationships. A single “+” denotes serialization details, a double “++” denotes presentation details.
YAML’s representation of native data structure is a rooted, connected, directed graph of tagged nodes. By “directed graph” we mean a set of nodes and directed edges (“arrows”), where each edge connects one node to another (see a formal definition). All the nodes must be reachable from the root node via such edges. Note that the YAML graph may include cycles, and a node may have more than one incoming edge.
Nodes that are defined in terms of other nodes are collections; nodes that are independent of any other nodes are scalars. YAML supports two kinds of collection nodes: sequences and mappings. Mapping nodes are somewhat tricky because their keys are unordered and must be unique.
A YAML node represents a single native data structure. Such nodes have content of one of three kinds: scalar, sequence, or mapping. In addition, each node has a tag which serves to restrict the set of possible values the content can have.
When appropriate, it is convenient to consider sequences and mappings together, as collections. In this view, sequences are treated as mappings with integer keys starting at zero. Having a unified collections view for sequences and mappings is helpful both for theoretical analysis and for creating practical YAML tools and APIs. This strategy is also used by the Javascript programming language.
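The unified view amounts to very little code. In this Python sketch, as_mapping is a hypothetical helper that presents either kind of collection as a mapping, numbering sequence entries from zero.

```python
def as_mapping(collection):
    """View a list or dict uniformly as a mapping (sequence keys start at 0)."""
    if isinstance(collection, dict):
        return dict(collection)
    return dict(enumerate(collection))
```

A generic tool, such as a path-based lookup, can then address an entry by key without caring which kind of collection holds it.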
YAML represents type information of native data structures with a simple identifier, called a tag. Global tags are URIs and hence globally unique across all applications. The “tag:” URI scheme is recommended for all global YAML tags. In contrast, local tags are specific to a single application. Local tags start with “!”, are not URIs and are not expected to be globally unique. YAML provides a “TAG” directive to make tag notation less verbose; it also offers easy migration from local to global tags. To ensure this, local tags are restricted to the URI character set and use URI character escaping.
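How a handle shortens tag notation can be sketched as a prefix substitution in Python. The default table below assumes the usual convention that “!” expands to itself (a local tag) and “!!” expands to “tag:yaml.org,2002:”; the handle “!e!” and the prefix “tag:example.com,2000:app/” in the usage note are made up for illustration, as is the function name expand_tag. Verbatim tags and URI escapes in the suffix are not handled.

```python
import re

# Assumed defaults: "!" keeps the tag local, "!!" maps to the yaml.org prefix.
DEFAULT_HANDLES = {"!": "!", "!!": "tag:yaml.org,2002:"}

# A handle is "!", "!!" or "!name!", followed by a non-empty suffix.
_SHORTHAND = re.compile(r"(!(?:[0-9A-Za-z-]*!)?)(.+)")


def expand_tag(shorthand: str, handles=None) -> str:
    """Expand a tag shorthand using a handle-to-prefix table (sketch)."""
    table = dict(DEFAULT_HANDLES)
    table.update(handles or {})
    match = _SHORTHAND.fullmatch(shorthand)
    if match is None:
        raise ValueError(f"not a tag shorthand: {shorthand!r}")
    handle, suffix = match.groups()
    if handle not in table:
        raise ValueError(f"undeclared tag handle: {handle!r}")
    return table[handle] + suffix
```

Migrating from local to global tags then requires changing only the table: mapping “!” to a URI prefix turns every “!name” shorthand in a document into a global tag, without editing the nodes themselves.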
YAML does not mandate any special relationship between different tags that begin with the same substring. Tags ending with URI fragments (containing “#”) are no exception; tags that share the same base URI but differ in their fragment part are considered to be different, independent tags. By convention, fragments are used to identify different “variants” of a tag, while “/” is used to define nested tag “namespace” hierarchies. However, this is merely a convention, and each tag may employ its own rules. For example, Perl tags may use “::” to express namespace hierarchies, Java tags may use “.”, etc.
YAML tags are used to associate meta information with each node. In particular, each tag must specify the expected node kind (scalar, sequence, or mapping). Scalar tags must also provide a mechanism for converting formatted content to a canonical form for supporting equality testing. Furthermore, a tag may provide additional information such as the set of allowed content values for validation, a mechanism for tag resolution, or any other data that is applicable to all of the tag’s nodes.
Since YAML mappings require key uniqueness, representations must include a mechanism for testing the equality of nodes. In general, it is impossible to ensure uniqueness for presentations, for the following reasons:
YAML allows various ways to format scalar content. For example, the integer eleven can be written as “0o13” (octal) or “0xB” (hexadecimal). If both notations are used as keys in the same mapping, only a YAML processor which recognizes integer formats would correctly flag the duplicate key as an error.
The semantics of the representation may require that values with different tags be considered equal. For example, the integer one and the float one are considered equal. If both are used as keys in the same mapping, only a YAML processor which recognizes integer and float representations would correctly flag the duplicate key as an error.
YAML therefore requires that each tag must specify a mechanism for testing any of its values for equality with any other value (including values of any different tag). This is often implemented directly by the native data structure instead of the YAML processor. That is, duplicate keys are often flagged as an error during the construction processing stage.
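The first reason can be illustrated with a small Python sketch that converts formatted integer content to a canonical value before comparing keys. It assumes only the decimal, “0o” octal and “0x” hexadecimal notations mentioned above; canonical_int and find_duplicate_keys are hypothetical names.

```python
def canonical_int(text: str) -> int:
    """Convert formatted integer content to its canonical value (sketch)."""
    if text.startswith("0o"):
        return int(text[2:], 8)
    if text.startswith("0x"):
        return int(text[2:], 16)
    return int(text, 10)


def find_duplicate_keys(keys):
    """Return (first, repeat) pairs of keys that denote the same integer."""
    seen = {}  # canonical value -> first formatted spelling
    duplicates = []
    for key in keys:
        value = canonical_int(key)
        if value in seen:
            duplicates.append((seen[value], key))
        else:
            seen[value] = key
    return duplicates
```

A processor comparing the raw text would accept “0o13” and “0xB” as distinct keys; only the canonical comparison reveals the clash. The second reason is visible in Python itself, where the integer 1 and the float 1.0 compare equal and therefore occupy a single dictionary key.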
In order to ensure greater compatibility and clarity, YAML allows the processor to flag obvious duplicate keys based on the presentation. Specifically, two scalar keys in the same mapping, with the same tag and the same content, may be flagged as an error as soon as the parsing stage. Note that this test also works for non-specific tags due to the way that tag resolution is defined.
This allows a human reader to reasonably identify { a: 1, a: 2 } as an error. Such constructs are silently accepted by many languages, but have no well defined meaning, and are therefore disallowed in YAML to avoid surprising behavior.
Two nodes are identical only when they represent the same native data structure. Typically, this corresponds to a single memory address. Identity should not be confused with equality; two equal nodes need not have the same identity. A YAML processor may treat equal scalars as if they were identical. In contrast, the separate identity of two distinct but equal collections must be preserved.
A common programming idiom is creating an empty object to obtain a value that is only equal to itself (for example, in order to generate a dynamic “enumerated type”). The proper way to represent this in YAML would be !object {}, where the !object tag defines two objects to be equal only if they are identical. The alternative scalar representation !object '' will not work as expected, as the YAML processor is not required to preserve the identity of such objects.
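The distinction between equality and identity, and the idiom described above, look like this in Python. The variable names are illustrative only.

```python
# Two distinct but equal collections: equality holds, identity does not.
first = {"name": "x"}
second = {"name": "x"}
are_equal = first == second      # same content
are_identical = first is second  # two separate objects

# The "enumerated type" idiom: a bare object is equal only to itself.
RED = object()
GREEN = object()
red_equals_green = RED == GREEN
red_equals_itself = RED == RED
```

A processor that merged first and second into one object would change the program's behavior as soon as one of them is modified, which is why the separate identity of equal collections must be preserved.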
To express a YAML representation using a serial API, it is necessary to impose an order on mapping keys and employ alias nodes to indicate a subsequent occurrence of a previously encountered node. The result of this process is a serialization tree, where each node has an ordered set of children. This tree can be traversed for a serial event-based API. Construction of native data structures from the serial interface should not use key order or anchor names for the preservation of application data.
In the representation model, mapping keys do not have an order. To serialize a mapping, it is necessary to impose an ordering on its keys. This order is a serialization detail and should not be used when composing the representation graph (and hence for the preservation of application data). In every case where node order is significant, a sequence must be used. For example, an ordered mapping can be represented as a sequence of mappings, where each mapping is a single key: value pair. YAML provides convenient compact notation for this case.
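The “sequence of single-pair mappings” encoding can be sketched in Python as follows. Both function names are hypothetical, and the decoder rejects any entry that does not hold exactly one key: value pair.

```python
def to_ordered_mapping(pairs):
    """Encode ordered (key, value) pairs as a sequence of one-pair mappings."""
    return [{key: value} for key, value in pairs]


def from_ordered_mapping(sequence):
    """Decode a sequence of one-pair mappings back into ordered pairs."""
    pairs = []
    for entry in sequence:
        if len(entry) != 1:
            raise ValueError("each entry must hold exactly one key: value pair")
        pairs.extend(entry.items())
    return pairs
```

The order is carried entirely by the sequence, so it survives even if a processor reorders the keys of each mapping, which is trivially true for mappings with a single key.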
In the representation graph, a node may appear in more than one collection. When serializing such data, the first occurrence of the node is identified by an anchor. Each subsequent occurrence is serialized as an alias node which refers back to this anchor. Otherwise, anchor names are a serialization detail and are discarded once composing is completed. When composing a representation graph from serialized events, an alias node refers to the most recent node in the serialization having the specified anchor. Therefore, anchors need not be unique within a serialization. In addition, an anchor need not have an alias node referring to it. It is therefore possible to provide an anchor for all nodes in serialization.
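On the serializing side, deciding which nodes need an anchor is a graph walk that counts incoming edges. The Python sketch below does this for native lists and dictionaries, using object identity to recognize a repeated node; it assumes mapping keys are scalars and anchors only collections. The name assign_anchors and the generated anchor names “a1”, “a2” and so on are arbitrary choices.

```python
def assign_anchors(root):
    """Map id(node) to an anchor name for collections reached more than once."""
    visits = {}  # id(node) -> number of times the walk reached the node

    def visit(node):
        if not isinstance(node, (list, dict)):
            return  # scalars are not anchored in this sketch
        visits[id(node)] = visits.get(id(node), 0) + 1
        if visits[id(node)] > 1:
            return  # already walked; this also terminates cycles
        children = node.values() if isinstance(node, dict) else node
        for child in children:
            visit(child)

    visit(root)
    anchors = {}
    for node_id, count in visits.items():
        if count > 1:
            anchors[node_id] = f"a{len(anchors) + 1}"
    return anchors
```

A serializer would emit the anchor at the first occurrence of each listed node and an alias node at every later occurrence. Since the names are discarded after composing, any naming scheme is acceptable.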
A YAML presentation is a stream of Unicode characters making use of styles, scalar content formats, comments, directives and other presentation details to present a YAML serialization in a human readable way. Although a YAML processor may provide these details when parsing, they should not be reflected in the resulting serialization. YAML allows several serialization trees to be contained in the same YAML character stream, as a series of documents separated by markers. Documents appearing in the same stream are independent; that is, a node must not appear in more than one serialization tree or representation graph.
Each node is presented in some style, depending on its kind. The node style is a presentation detail and is not reflected in the serialization tree or representation graph. There are two groups of styles. Block styles use indentation to denote structure; in contrast, flow styles rely on explicit indicators.
YAML provides a rich set of scalar styles. Block scalar styles include the literal style and the folded style. Flow scalar styles include the plain style and two quoted styles, the single-quoted style and the double-quoted style. These styles offer a range of trade-offs between expressive power and readability.
Normally, block sequences and mappings begin on the next line. In some cases, YAML also allows nested block collections to start in-line for a more compact notation. In addition, YAML provides a compact notation for flow mappings with a single key: value pair, nested inside a flow sequence. These allow for a natural “ordered mapping” notation.
YAML allows scalars to be presented in several formats. For example, the integer “11” might also be written as “0xB”. Tags must specify a mechanism for converting the formatted content to a canonical form for use in equality testing. Like node style, the format is a presentation detail and is not reflected in the serialization tree and representation graph.
Comments are a presentation detail and must not have any effect on the serialization tree or representation graph. In particular, comments are not associated with a particular node. The usual purpose of a comment is to communicate between the human maintainers of a file. A typical example is comments in a configuration file. Comments must not appear inside scalars, but may be interleaved with such scalars inside collections.
Each document may be associated with a set of directives. A directive has a name and an optional sequence of parameters. Directives are instructions to the YAML processor, and like all other presentation details are not reflected in the YAML serialization tree or representation graph. This version of YAML defines two directives, “YAML” and “TAG”. All other directives are reserved for future versions of YAML.
The process of loading native data structures from a YAML stream has several potential failure points. The character stream may be ill-formed, aliases may be unidentified, unspecified tags may be unresolvable, tags may be unrecognized, the content may be invalid, and a native type may be unavailable. Each of these failures results in an incomplete loading.
A partial representation need not resolve the tag of each node, and the canonical form of formatted scalar content need not be available. This weaker representation is useful for cases of incomplete knowledge of the types used in the document. In contrast, a complete representation specifies the tag of each node, and provides the canonical form of formatted scalar content, allowing for equality testing. A complete representation is required in order to construct native data structures.
A well-formed character stream must match the BNF productions specified in the following chapters. Successful loading also requires that each alias shall refer to a previous node identified by the anchor. A YAML processor should reject ill-formed streams and unidentified aliases. A YAML processor may recover from syntax errors, possibly by ignoring certain parts of the input, but it must provide a mechanism for reporting such errors.
Typically, most tags are not explicitly specified in the character stream. During parsing, nodes lacking an explicit tag are given a non-specific tag: “!” for non-plain scalars, and “?” for all other nodes. Composing a complete representation requires each such non-specific tag to be resolved to a specific tag, be it a global tag or a local tag.
Resolving the tag of a node must only depend on the following three parameters: (1) the non-specific tag of the node, (2) the path leading from the root to the node, and (3) the content (and hence the kind) of the node. When a node has more than one occurrence (using aliases), tag resolution must depend only on the path to the first (anchored) occurrence of the node.
Note that resolution must not consider presentation details such as comments, indentation and node style. Also, resolution must not consider the content of any other node, except for the content of the key nodes directly along the path leading from the root to the resolved node. Finally, resolution must not consider the content of a sibling node in a collection, or the content of the value node associated with a key node being resolved.
These rules ensure that tag resolution can be performed as soon as a node is first encountered in the stream, typically before its content is parsed. Also, tag resolution only requires referring to a relatively small number of previously parsed nodes. Thus, in most cases, tag resolution in one-pass processors is both possible and practical.
YAML processors should resolve nodes having the “!” non-specific tag as “tag:yaml.org,2002:seq”, “tag:yaml.org,2002:map” or “tag:yaml.org,2002:str” depending on their kind. This tag resolution convention allows the author of a YAML character stream to effectively “disable” the tag resolution process. By explicitly specifying a “!” non-specific tag property, the node would then be resolved to a “vanilla” sequence, mapping, or string, according to its kind.
Application specific tag resolution rules should be restricted to resolving the “?” non-specific tag, most commonly to resolving plain scalars. These may be matched against a set of regular expressions to provide automatic resolution of integers, floats, timestamps, and similar types. An application may also match the content of mapping nodes against sets of expected keys to automatically resolve points, complex numbers, and similar types. Resolved sequence node types such as the “ordered mapping” are also possible.
That said, tag resolution is specific to the application. YAML processors should therefore provide a mechanism allowing the application to override and expand these default tag resolution rules.
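The resolution convention described above can be sketched in Python as follows. This is a minimal illustration rather than a normative algorithm: the function name `resolve_tag`, the kind names, and the two regular expressions are assumptions made for the example, and the sketch ignores the path to the node, which a real resolver is allowed to consult.

```python
import re

# Hypothetical, simplified patterns; a real application defines its own.
INT_RE = re.compile(r"[-+]?[0-9]+")
FLOAT_RE = re.compile(r"[-+]?(\.[0-9]+|[0-9]+(\.[0-9]*)?)([eE][-+]?[0-9]+)?")

# "Vanilla" tags, selected by node kind only.
VANILLA = {
    "sequence": "tag:yaml.org,2002:seq",
    "mapping": "tag:yaml.org,2002:map",
    "scalar": "tag:yaml.org,2002:str",
}


def resolve_tag(non_specific_tag, kind, content=None):
    """Resolve a non-specific tag ("!" or "?") to a specific tag."""
    if non_specific_tag == "!":
        # "!" disables resolution: the result depends on the kind alone.
        return VANILLA[kind]
    if non_specific_tag != "?":
        raise ValueError("not a non-specific tag: %r" % non_specific_tag)
    if kind != "scalar":
        return VANILLA[kind]
    # "?" on a plain scalar: match the content against regular expressions.
    if content in ("", "null"):
        return "tag:yaml.org,2002:null"
    if content in ("true", "false"):
        return "tag:yaml.org,2002:bool"
    if INT_RE.fullmatch(content):
        return "tag:yaml.org,2002:int"
    if FLOAT_RE.fullmatch(content):
        return "tag:yaml.org,2002:float"
    return "tag:yaml.org,2002:str"
```

Because the function looks only at the non-specific tag, the kind, and the node's own content, it can run as soon as the node is encountered, which is the one-pass property the rules above are designed to guarantee.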
If a document contains unresolved tags, the YAML processor is unable to compose a complete representation graph. In such a case, the YAML processor may compose a partial representation, based on each node’s kind and allowing for non-specific tags.
To be valid, a node must have a tag which is recognized by the YAML processor and its content must satisfy the constraints imposed by this tag. If a document contains a scalar node with an unrecognized tag or invalid content, only a partial representation may be composed. In contrast, a YAML processor can always compose a complete representation for an unrecognized or an invalid collection, since collection equality does not depend upon knowledge of the collection’s data type. However, such a complete representation cannot be used to construct a native data structure.
In a given processing environment, there need not be an available native type corresponding to a given tag. If a node’s tag is unavailable, a YAML processor will not be able to construct a native data structure for it. In this case, a complete representation may still be composed, and an application may wish to use this representation directly.
The following chapters formally define the syntax of YAML character streams, using parameterized BNF productions. Each BNF production is both named and numbered for easy reference. Whenever possible, basic structures are specified before the more complex structures using them in a “bottom up” fashion.
The order of alternatives inside a production is significant. Subsequent alternatives are only considered when previous ones fail. See for example the b-break production.
In addition, production matching is expected to be greedy. Optional (?), zero-or-more (*) and one-or-more (+) patterns are always expected to match as much of the input as possible.
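The two matching rules can be illustrated with a small Python sketch. It assumes that b-break tries a CR LF pair before a lone CR or a lone LF; the function names `match_b_break` and `match_star` are hypothetical helpers, not part of any YAML library.

```python
def match_b_break(text, pos=0):
    """Return the offset just past a line break at pos, or None.

    The alternatives are tried in order, so a CR LF pair is consumed
    as one break rather than as a CR followed by a separate LF.
    """
    for alternative in ("\r\n", "\r", "\n"):
        if text.startswith(alternative, pos):
            return pos + len(alternative)
    return None


def match_star(text, pos, predicate):
    """Greedy zero-or-more: consume every character accepted by predicate."""
    while pos < len(text) and predicate(text[pos]):
        pos += 1
    return pos
```

Reordering the alternatives to try the lone CR first would split every CR LF pair into two breaks, which is why the order is part of the grammar.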
The productions are accompanied by examples, which are given side-by-side next to equivalent YAML text in an explanatory format. This format uses only flow collections, double-quoted scalars, and explicit tags for each node.
A reference implementation using the productions is available as the YamlReference Haskell package. This reference implementation is also available as an interactive web application at http://dev.yaml.org/ypaste.
YAML’s syntax is designed for maximal human readability. This requires parsing to depend on the surrounding text. For notational compactness, this dependency is expressed using parameterized BNF productions.
This context sensitivity is the cause of most of the complexity of the YAML syntax definition. It is further complicated by struggling with the human tendency to look ahead when interpreting text. These complications are of course the source of most of YAML’s power to present data in a very human readable way.
Productions use any of the following parameters:
n or m
c
This parameter allows productions to tweak their behavior according to their surrounding. YAML supports two groups of contexts, distinguishing between block styles and flow styles.
In block styles, indentation is used to delineate structure. To capture human perception of indentation the rules require special treatment of the “-” character, used in block sequences. Hence in some cases productions need to behave differently inside block sequences (block-in context) and outside them (block-out context).
In flow styles, explicit indicators are used to delineate structure. These styles can be viewed as the natural extension of JSON to cover tagged, single-quoted and plain scalars. Since the latter have no delineating indicators, they are subject to some restrictions to avoid ambiguities. These restrictions depend on where they appear: as implicit keys directly inside a block mapping (block-key); as implicit keys inside a flow mapping (flow-key); as values inside a flow collection (flow-in); or as values outside one (flow-out).
t
To make it easier to follow production combinations, production names use a Hungarian-style naming convention. Each production is given a prefix based on the type of characters it begins and ends with.
e-
c-
b-
nb-
s-
ns-
l-
X-Y-
A production starting with an X- character and ending with a Y- character, where X- and Y- are any of the above prefixes.
X+, X-Y+
n
parameter.
To ensure readability, YAML streams use only the printable subset of the Unicode character set. The allowed character range explicitly excludes the C0 control block #x0-#x1F (except for TAB #x9, LF #xA, and CR #xD which are allowed), DEL #x7F, the C1 control block #x80-#x9F (except for NEL #x85 which is allowed), the surrogate block #xD800-#xDFFF, #xFFFE, and #xFFFF.
On input, a YAML processor must accept all Unicode characters except those explicitly excluded above.
On output, a YAML processor must only produce acceptable characters. Any excluded characters must be presented using escape sequences. In addition, any allowed characters known to be non-printable should also be escaped. This isn’t mandatory since a full implementation would require extensive character property tables.
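The excluded ranges listed above translate directly into a membership test. The following Python sketch is one way to write it; the name `is_yaml_printable` is an assumption for illustration, and the function checks only the ranges named in this section, not whether an allowed character is printable in the typographic sense.

```python
def is_yaml_printable(ch):
    """True if ch falls in the character range allowed in a YAML stream."""
    cp = ord(ch)
    return (
        cp in (0x9, 0xA, 0xD, 0x85)   # TAB, LF, CR and NEL are allowed
        or 0x20 <= cp <= 0x7E          # ASCII without C0 controls and DEL
        or 0xA0 <= cp <= 0xD7FF        # above the C1 block, below surrogates
        or 0xE000 <= cp <= 0xFFFD      # above surrogates, minus #xFFFE/#xFFFF
        or 0x10000 <= cp <= 0x10FFFF   # supplementary planes
    )
```

An emitter could use such a test to decide which characters have to be written as escape sequences.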
|
To ensure JSON compatibility, YAML processors must allow all non-control characters inside quoted scalars. To ensure readability, non-printable characters should be escaped on output, even inside such scalars. Note that JSON quoted scalars cannot span multiple lines or contain tabs, but YAML quoted scalars can.
|
All characters mentioned in this specification are Unicode code points. Each such code point is written as one or more bytes depending on the character encoding used. Note that in UTF-16, characters above #xFFFF are written as four bytes, using a surrogate pair.
The character encoding is a presentation detail and must not be used to convey content information.
On input, a YAML processor must support the UTF-8 and UTF-16 character encodings. For JSON compatibility, the UTF-32 encodings must also be supported.
If a character stream begins with a byte order mark, the character encoding will be taken to be as indicated by the byte order mark. Otherwise, the stream must begin with an ASCII character. This allows the encoding to be deduced by the pattern of null (#x00) characters.
To make it easier to concatenate streams, byte order marks may appear at the start of any document. However all documents in the same stream must use the same character encoding.
To allow for JSON compatibility, byte order marks are also allowed inside quoted scalars. For readability, such content byte order marks should be escaped on output.
The encoding can therefore be deduced by matching the first few bytes of the stream with the following table rows (in order):
| | Byte0 | Byte1 | Byte2 | Byte3 | Encoding |
| Explicit BOM | #x00 | #x00 | #xFE | #xFF | UTF-32BE |
| ASCII first character | #x00 | #x00 | #x00 | any | UTF-32BE |
| Explicit BOM | #xFF | #xFE | #x00 | #x00 | UTF-32LE |
| ASCII first character | any | #x00 | #x00 | #x00 | UTF-32LE |
| Explicit BOM | #xFE | #xFF | | | UTF-16BE |
| ASCII first character | #x00 | any | | | UTF-16BE |
| Explicit BOM | #xFF | #xFE | | | UTF-16LE |
| ASCII first character | any | #x00 | | | UTF-16LE |
| Explicit BOM | #xEF | #xBB | #xBF | | UTF-8 |
| Default | | | | | UTF-8 |
The recommended output encoding is UTF-8. If another encoding is used, it is recommended that an explicit byte order mark be used, even if the first stream character is ASCII.
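The table lookup can be sketched in Python as a sequence of checks applied in the table's row order. The function name `detect_encoding` is hypothetical, and the sketch assumes the caller passes at least the first four bytes of the stream when they are available.

```python
def detect_encoding(data: bytes) -> str:
    """Deduce the stream encoding from its first bytes, in table order."""
    b = data[:4]
    if b == b"\x00\x00\xfe\xff":                    # explicit BOM
        return "UTF-32BE"
    if len(b) == 4 and b[:3] == b"\x00\x00\x00":    # ASCII first character
        return "UTF-32BE"
    if b == b"\xff\xfe\x00\x00":                    # explicit BOM
        return "UTF-32LE"
    if len(b) == 4 and b[1:] == b"\x00\x00\x00":    # ASCII first character
        return "UTF-32LE"
    if b[:2] == b"\xfe\xff":                        # explicit BOM
        return "UTF-16BE"
    if len(b) >= 2 and b[0] == 0:                   # ASCII first character
        return "UTF-16BE"
    if b[:2] == b"\xff\xfe":                        # explicit BOM
        return "UTF-16LE"
    if len(b) >= 2 and b[1] == 0:                   # ASCII first character
        return "UTF-16LE"
    if b[:3] == b"\xef\xbb\xbf":                    # explicit BOM
        return "UTF-8"
    return "UTF-8"                                  # default
```

The order matters: the UTF-32LE byte order mark begins with the same two bytes as the UTF-16LE one, so the UTF-32 rows have to be tested first.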
For more information about the byte order mark and the Unicode character encoding schemes see the Unicode FAQ.
|
In the examples, byte order mark characters are displayed as “⇔”.
Example 5.1. Byte Order Mark
Legend:
|
# This stream contains no
|
Example 5.2. Invalid Byte Order Mark
- Invalid use of BOM
|
ERROR:
|
Indicators are characters that have special semantics.
A “-” (#x2D, hyphen) denotes a block sequence entry.
A “?” (#x3F, question mark) denotes a mapping key.
A “:” (#x3A, colon) denotes a mapping value.
Example 5.3. Block Structure Indicators
sequence:
Legend: |
%YAML 1.2
|
A “,” (#x2C, comma) ends a flow collection entry.
A “[” (#x5B, left bracket) starts a flow sequence.
A “]” (#x5D, right bracket) ends a flow sequence.
A “{” (#x7B, left brace) starts a flow mapping.
A “}” (#x7D, right brace) ends a flow mapping.
Example 5.4. Flow Collection Indicators
sequence:
Legend: |
%YAML 1.2
|
An “#” (#x23, octothorpe, hash, sharp, pound, number sign) denotes a comment.
Example 5.5. Comment Indicator
Legend:
|
# This stream contains no
|
An “&” (#x26, ampersand) denotes a node’s anchor property.
An “*” (#x2A, asterisk) denotes an alias node.
The “!” (#x21, exclamation) is heavily overloaded for specifying node tags. It is used to denote tag handles used in tag directives and tag properties; to denote local tags; and as the non-specific tag for non-plain scalars.
A “|” (#x7C, vertical bar) denotes a literal block scalar.
A “>” (#x3E, greater than) denotes a folded block scalar.
An “'” (#x27, apostrophe, single quote) surrounds a single-quoted flow scalar.
A “"” (#x22, double quote) surrounds a double-quoted flow scalar.
Example 5.8. Quoted Scalar Indicators
single:
Legend: |
%YAML 1.2
|
A “%” (#x25, percent) denotes a directive line.
The “@” (#x40, at) and “`” (#x60, grave accent) are reserved for future use.
Example 5.10. Invalid use of Reserved Indicators
commercial-at:
|
ERROR:
|
Any indicator character:
|
The “[”, “]”, “{”, “}” and “,” indicators denote structure in flow collections. They are therefore forbidden in some cases, to avoid ambiguity in several constructs. This is handled on a case-by-case basis by the relevant productions.
|
YAML recognizes the following ASCII line break characters.
|
All other characters, including the form feed (#x0C), are considered to be non-break characters. Note that these include the non-ASCII line breaks: next line (#x85), line separator (#x2028) and paragraph separator (#x2029).
YAML version 1.1 did support the above non-ASCII line break characters; however, JSON does not. Hence, to ensure JSON compatibility, YAML treats them as non-break characters as of version 1.2. In theory this would cause incompatibility with version 1.1; in practice these characters were rarely (if ever) used. YAML 1.2 processors parsing a version 1.1 document should therefore treat these line breaks as non-break characters, with an appropriate warning.
|
Line breaks are interpreted differently by different systems, and have several widely used formats.
|
Line breaks inside scalar content must be normalized by the YAML processor. Each such line break must be parsed into a single line feed character. The original line break format is a presentation detail and must not be used to convey content information.
|
Outside scalar content, YAML allows any line break to be used to terminate lines.
|
On output, a YAML processor is free to emit line breaks using whatever convention is most appropriate.
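Normalization of line breaks inside scalar content can be sketched in a few lines of Python. The name `normalize_breaks` is an assumption for the example; the sketch treats only CR LF, CR and LF as breaks and leaves #x85, #x2028 and #x2029 untouched, following the version 1.2 rule above.

```python
import re

# CR LF is listed first so that the pair becomes one line feed, not two.
_BREAK = re.compile(r"\r\n|\r|\n")


def normalize_breaks(content):
    """Replace every ASCII line break in scalar content with a line feed."""
    return _BREAK.sub("\n", content)
```

Note that Python's `str.splitlines` also splits on the non-ASCII break characters, so it would not be a faithful substitute here.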
In the examples, line breaks are sometimes displayed using the “↓” glyph for clarity.
Example 5.11. Line Break Characters
|
Legend:
|
%YAML 1.2
|
YAML recognizes two white space characters: space and tab.
|
The rest of the (printable) non-break characters are considered to be non-space characters.
|
In the examples, tab characters are displayed as the glyph “→”. Space characters are sometimes displayed as the glyph “·” for clarity.
The YAML syntax productions make use of the following additional character classes:
|
|
|
|
URI characters for tags, as specified in RFC2396, with the addition of the “[” and “]” for presenting IPv6 addresses as proposed in RFC2732.
By convention, any URI characters other than the allowed printable ASCII characters are first encoded in UTF-8, and then each byte is escaped using the “%” character. The YAML processor must not expand such escaped characters. Tag characters must be preserved and compared exactly as presented in the YAML stream, without any processing.
|
The “!” character is used to indicate the end of a named tag handle; hence its use in tag shorthands is restricted. In addition, such shorthands must not contain the “[”, “]”, “{”, “}” and “,” characters. These characters would cause ambiguity with flow collection structures.
|
All non-printable characters must be escaped. YAML escape sequences use the “\” notation common to most modern computer languages. Each escape sequence must be parsed into the appropriate Unicode character. The original escape sequence is a presentation detail and must not be used to convey content information.
Note that escape sequences are only interpreted in double-quoted scalars. In all other scalar styles, the “\” character has no special meaning and non-printable characters are not available.
|
YAML escape sequences are a superset of C’s escape sequences:
Escaped ASCII null (#x0) character.
Escaped ASCII bell (#x7) character.
Escaped ASCII backspace (#x8) character.
Escaped ASCII horizontal tab (#x9) character. This is useful at the start or the end of a line to force a leading or trailing tab to become part of the content.
Escaped ASCII line feed (#xA) character.
Escaped ASCII vertical tab (#xB) character.
Escaped ASCII form feed (#xC) character.
Escaped ASCII carriage return (#xD) character.
Escaped ASCII escape (#x1B) character.
Escaped ASCII space (#x20) character. This is useful at the start or the end of a line to force a leading or trailing space to become part of the content.
Escaped ASCII double quote (#x22).
Escaped ASCII slash (#x2F), for JSON compatibility.
Escaped ASCII back slash (#x5C).
Escaped Unicode next line (#x85) character.
Escaped Unicode non-breaking space (#xA0) character.
Escaped Unicode line separator (#x2028) character.
Escaped Unicode paragraph separator (#x2029) character.
Escaped 8-bit Unicode character.
Escaped 16-bit Unicode character.
Escaped 32-bit Unicode character.
Any escaped character:
|
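The escape sequences listed above can be collected into a lookup table. The Python sketch below pairs each listed character with the escape letter used by the YAML 1.2 productions (for example `\N` for next line and `\_` for non-breaking space). The function name `unescape_double_quoted` is hypothetical, and the sketch does not handle escaped line breaks or line folding.

```python
import string

# Escape letter -> character, in the order of the list above.
SIMPLE_ESCAPES = {
    "0": "\x00", "a": "\x07", "b": "\x08", "t": "\x09", "\t": "\x09",
    "n": "\x0A", "v": "\x0B", "f": "\x0C", "r": "\x0D", "e": "\x1B",
    " ": "\x20", '"': "\x22", "/": "\x2F", "\\": "\x5C",
    "N": "\x85", "_": "\xA0", "L": "\u2028", "P": "\u2029",
}
# Escape letter -> number of hexadecimal digits that follow it.
HEX_ESCAPES = {"x": 2, "u": 4, "U": 8}


def unescape_double_quoted(body):
    """Decode the escape sequences in the body of a double-quoted scalar."""
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch != "\\":
            out.append(ch)
            i += 1
            continue
        if i + 1 >= len(body):
            raise ValueError("dangling backslash")
        code = body[i + 1]
        if code in SIMPLE_ESCAPES:
            out.append(SIMPLE_ESCAPES[code])
            i += 2
        elif code in HEX_ESCAPES:
            width = HEX_ESCAPES[code]
            digits = body[i + 2:i + 2 + width]
            if len(digits) != width or not all(d in string.hexdigits for d in digits):
                raise ValueError("invalid hex escape at offset %d" % i)
            out.append(chr(int(digits, 16)))
            i += 2 + width
        else:
            raise ValueError("unknown escape \\%s" % code)
    return "".join(out)
```

Any backslash followed by a character outside the two tables is rejected, which is the kind of error Example 5.14 is meant to show.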
Example 5.13. Escaped Characters
"Fun with
Legend:
|
%YAML 1.2
|
Example 5.14. Invalid Escaped Characters
Bad escapes:
|
ERROR:
|
In YAML block styles, structure is determined by indentation. In general, indentation is defined as zero or more space characters at the start of a line.
To maintain portability, tab characters must not be used in indentation, since different systems treat tabs differently. Note that most modern editors may be configured so that pressing the tab key results in the insertion of an appropriate number of spaces.
The amount of indentation is a presentation detail and must not be used to convey content information.
|
A block style construct is terminated when encountering a line which is less indented than the construct. The productions use the notation “s-indent(<n)” and “s-indent(≤n)” to express this.
|
Each node must be indented further than its parent node. All sibling nodes must use the exact same indentation level. However the content of each sibling node may be further indented independently.
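As a rough illustration of these rules, the Python sketch below measures indentation as the count of leading spaces and finds where a block construct ends. Both function names are hypothetical, and treating blank lines as part of the block is a simplifying assumption of the example rather than a rule stated here.

```python
def indentation(line):
    """Number of leading space characters; tabs never count as indentation."""
    return len(line) - len(line.lstrip(" "))


def block_extent(lines, start, n):
    """Index just past the block that begins at lines[start].

    The block continues while lines are indented by at least n spaces and
    ends at the first line that is less indented than the construct.
    """
    i = start
    while i < len(lines) and (lines[i].strip() == "" or indentation(lines[i]) >= n):
        i += 1
    return i
```

For example, with `lines = ["a:", "  b: 1", "  c: 2", "d: 3"]`, the call `block_extent(lines, 1, 2)` stops at index 3, the first line indented by fewer than two spaces.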
Example 6.1. Indentation Spaces
··# Leading comment line spaces are
Legend: |
%YAML 1.2
|
The “-”, “?” and “:” characters used to denote block collection entries are perceived by people to be part of the indentation. This is handled on a case-by-case basis by the relevant productions.
Example 6.2. Indentation Indicators
Legend: |
%YAML 1.2
|
Outside indentation and scalar content, YAML uses white space characters for separation between tokens within a line. Note that such white space may safely include tab characters.
Separation spaces are a presentation detail and must not be used to convey content information.
|
Example 6.3. Separation Spaces
-
Legend:
|
%YAML 1.2
|
Inside scalar content, each line begins with a non-content line prefix. This prefix always includes the indentation. For flow scalar styles it additionally includes all leading white space, which may contain tab characters.
Line prefixes are a presentation detail and must not be used to convey content information.
|
Example 6.4. Line Prefixes
plain: text
Legend: |
%YAML 1.2
|
An empty line consists of the non-content prefix followed by a line break.
|
The semantics of empty lines depend on the scalar style they appear in. This is handled on a case-by-case basis by the relevant productions.
Example 6.5. Empty Lines
Folding:
Legend:
|
%YAML 1.2
|
Line folding allows long lines to be broken for readability, while retaining the semantics of the original long line. If a line break is followed by an empty line, it is trimmed; the first line break is discarded and the rest are retained as content.
|
Otherwise (the following line is not empty), the line break is converted to a single space (#x20).
|
A folded non-empty line may end with either of the above line breaks.
|
Example 6.6. Line Folding
>-
|
%YAML 1.2
Legend: |
The above rules are common to both the folded block style and the scalar flow styles. Folding does distinguish between these cases in the following way:
In the folded block style, the final line break and trailing empty lines are subject to chomping, and are never folded. In addition, folding does not apply to line breaks surrounding text lines that contain leading white space. Note that such a more-indented line may consist only of such leading white space.
The combined effect of the block line folding rules is that each “paragraph” is interpreted as a line, empty lines are interpreted as a line feed, and the formatting of more-indented lines is preserved.
Example 6.7. Block Folding
>
|
%YAML 1.2
Legend: |
Folding in flow styles provides more relaxed semantics. Flow styles typically depend on explicit indicators rather than indentation to convey structure. Hence spaces preceding or following the text in a line are a presentation detail and must not be used to convey content information. Once all such spaces have been discarded, all line breaks are folded, without exception.
The combined effect of the flow line folding rules is that each “paragraph” is interpreted as a line, empty lines are interpreted as line feeds, and text can be freely more-indented without affecting the content information.
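The flow folding rules can be sketched in Python as below. The function name `fold_flow` is an assumption, the input is expected to have its line breaks already normalized to line feeds, and escape sequences (including escaped line breaks in double-quoted scalars) are deliberately left out of the sketch.

```python
def fold_flow(text):
    """Apply flow line folding to the text of a multi-line flow scalar."""
    # Spaces and tabs around the text of each line are presentation details.
    lines = [line.strip(" \t") for line in text.split("\n")]
    out = [lines[0]]
    empty_lines = 0
    for line in lines[1:]:
        if line == "":
            empty_lines += 1          # each empty line becomes a line feed
            continue
        # A lone break folds to a space; a break followed by empty lines is
        # discarded and only the empty lines survive as line feeds.
        out.append("\n" * empty_lines if empty_lines else " ")
        out.append(line)
        empty_lines = 0
    return "".join(out)
```

With this sketch, `fold_flow("a\n  b \n\n c")` yields `"a b\nc"`: the first break folds into a space and the empty line becomes a single line feed.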
|
Example 6.8. Flow Folding
"
|
%YAML 1.2
Legend: |
An explicit comment is marked by a “#” indicator. Comments are a presentation detail and must not be used to convey content information.
Comments must be separated from other tokens by white space characters. To ensure JSON compatibility, YAML processors must allow for the omission of the final comment line break of the input stream. However, as this confuses many tools, YAML processors should terminate the stream with an explicit line break on output.
|
Example 6.9. Separated Comment
key:····
Legend: |
%YAML 1.2
|
Outside scalar content, comments may appear on a line of their own, independent of the indentation level. Note that outside scalar content, a line containing only white space characters is taken to be a comment line.
|
Example 6.10. Comment Lines
|
# This stream contains no
Legend: |
In most cases, when a line may end with a comment, YAML allows it to be followed by additional comment lines. The only exception is a comment ending a block scalar header.
|
Example 6.11. Multi-Line Comments
key:
|
%YAML 1.2
|
Legend: s-b-comment, l-comment, s-l-comments
Implicit keys are restricted to a single line. In all other cases, YAML allows tokens to be separated by multi-line (possibly empty) comments.
Note that structures following multi-line comment separation must be properly indented, even though there is no such restriction on the separation comment lines themselves.
|
Example 6.12. Separation Spaces
{
Legend: |
%YAML 1.2
|
Directives are instructions to the YAML processor. This specification defines two directives, “YAML” and “TAG”, and reserves all other directives for future use. There is no way to define private directives. This is intentional.
Directives are a presentation detail and must not be used to convey content information.
|
Each directive is specified on a separate non-indented line starting with the “%” indicator, followed by the directive name and a list of parameters. The semantics of these parameters depends on the specific directive. A YAML processor should ignore unknown directives with an appropriate warning.
|
Example 6.13. Reserved Directives
%
|
%YAML 1.2
|
Legend: ns-reserved-directive, ns-directive-name, ns-directive-parameter
The “YAML” directive specifies the version of YAML the document conforms to. This specification defines version “1.2”, including recommendations for YAML 1.1 processing.
A version 1.2 YAML processor must accept documents with an explicit “%YAML 1.2” directive, as well as documents lacking a “YAML” directive. Such documents are assumed to conform to the 1.2 version specification. Documents with a “YAML” directive specifying a higher minor version (e.g. “%YAML 1.3”) should be processed with an appropriate warning. Documents with a “YAML” directive specifying a higher major version (e.g. “%YAML 2.0”) should be rejected with an appropriate error message.
A version 1.2 YAML processor must also accept documents with an explicit “%YAML 1.1” directive. Note that version 1.2 is mostly a superset of version 1.1, defined for the purpose of ensuring JSON compatibility. Hence a version 1.2 processor should process version 1.1 documents as if they were version 1.2, giving a warning on points of incompatibility (handling of non-ASCII line breaks, as described above).
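The version policy above could be implemented along the following lines. This Python sketch is illustrative only: `check_yaml_version` is a hypothetical helper, it receives just the version parameter of the directive, and it does not attempt to detect the 1.1 incompatibilities that would call for further warnings.

```python
import warnings


def check_yaml_version(version, supported=(1, 2)):
    """Apply the version policy to a "%YAML" directive parameter.

    Returns the (major, minor) pair if the document may be processed.
    """
    major, minor = (int(part) for part in version.split("."))
    if major > supported[0]:
        # Higher major version: reject with an error.
        raise ValueError("unsupported YAML version %s" % version)
    if major == supported[0] and minor > supported[1]:
        # Higher minor version: process, but warn.
        warnings.warn("document declares YAML %s; processing as %d.%d"
                      % ((version,) + tuple(supported)))
    return (major, minor)
```

A document without a “YAML” directive would simply skip this check and be treated as version 1.2.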
|
Example 6.14. “YAML” directive
%
|
%YAML 1.2
Legend: |
It is an error to specify more than one “YAML” directive for the same document, even if both occurrences give the same version number.
Example 6.15. Invalid Repeated YAML directive
%YAML 1.2
|
ERROR:
|
The “TAG” directive establishes a tag shorthand notation for specifying node tags. Each “TAG” directive associates a handle with a prefix. This allows for compact and readable tag notation.
|
Example 6.16. “TAG” directive
%
|
%YAML 1.2
|
Legend: ns-tag-directive, c-tag-handle, ns-tag-prefix
It is an error to specify more than one “TAG” directive for the same handle in the same document, even if both occurrences give the same prefix.
Example 6.17. Invalid Repeated TAG directive
%TAG ! !foo
|
ERROR:
|
The tag handle exactly matches the prefix of the affected tag shorthand. There are three tag handle variants:
|
The primary tag handle is a single “!” character. This allows using the most compact possible notation for a single “primary” name space. By default, the prefix associated with this handle is “!”. Thus, by default, shorthands using this handle are interpreted as local tags.
It is possible to override the default behavior by providing an explicit “TAG” directive, associating a different prefix for this handle. This provides smooth migration from using local tags to using global tags, by the simple addition of a single “TAG” directive.
|
Example 6.18. Primary Tag Handle
# Private
|
%YAML 1.2
Legend:
|
The secondary tag handle is written as “!!”. This allows using a compact notation for a single “secondary” name space. By default, the prefix associated with this handle is “tag:yaml.org,2002:”. This prefix is used by the YAML tag repository.
It is possible to override this default behavior by providing an explicit “TAG” directive associating a different prefix for this handle.
|
Example 6.19. Secondary Tag Handle
%TAG
Legend:
|
%YAML 1.2
|
A named tag handle surrounds a non-empty name with “!” characters. A handle name must not be used in a tag shorthand unless an explicit “TAG” directive has associated some prefix with it.
The name of the handle is a presentation detail and must not be used to convey content information. In particular, the YAML processor need not preserve the handle name once parsing is completed.
|
Example 6.20. Tag Handles
%TAG
Legend:
|
%YAML 1.2
|
There are two tag prefix variants:
|
If the prefix begins with a “!” character, shorthands using the handle are expanded to a local tag. Note that such a tag is intentionally not a valid URI, and its semantics are specific to the application. In particular, two documents in the same stream may assign different semantics to the same local tag.
|
Example 6.21. Local Tag Prefix
%TAG !m!
Legend:
|
%YAML 1.2
|
If the prefix begins with a character other than “!”, it must be a valid URI prefix, and should contain at least the scheme and the authority. Shorthands using the associated handle are expanded to globally unique URI tags, and their semantics are consistent across applications. In particular, every document in every stream must assign the same semantics to the same global tag.
|
Example 6.22. Global Tag Prefix
%TAG !e!
Legend:
|
%YAML 1.2
|
Each node may have two optional properties, anchor and tag, in addition to its content. Node properties may be specified in any order before the node’s content. Either or both may be omitted.
|
Example 6.23. Node Properties
Legend: |
%YAML 1.2
|
The tag property identifies the type of the native data structure presented by the node. A tag is denoted by the “!” indicator.
|
A tag may be written verbatim by surrounding it with the “<” and “>” characters. In this case, the YAML processor must deliver the verbatim tag as-is to the application. In particular, verbatim tags are not subject to tag resolution. A verbatim tag must either begin with a “!” (a local tag) or be a valid URI (a global tag).
|
Example 6.24. Verbatim Tags
Legend:
|
%YAML 1.2
|
Example 6.25. Invalid Verbatim Tags
- !<
|
ERROR:
|
A tag shorthand consists of a valid tag handle followed by a non-empty suffix. The tag handle must be associated with a prefix, either by default or by using a “TAG” directive. The resulting parsed tag is the concatenation of the prefix and the suffix, and must either begin with “!” (a local tag) or be a valid URI (a global tag).
The choice of tag handle is a presentation detail and must not be used to convey content information. In particular, the tag handle may be discarded once parsing is completed.
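Shorthand expansion amounts to looking up the handle and concatenating its prefix with the suffix. The Python sketch below shows one way to do this; `expand_shorthand` and the `directives` mapping (handle to prefix, as collected from “TAG” directives) are names assumed for the example, and the result is not validated as a URI.

```python
# Default prefixes of the primary and secondary tag handles.
DEFAULT_HANDLES = {"!": "!", "!!": "tag:yaml.org,2002:"}


def expand_shorthand(shorthand, directives=None):
    """Expand a tag shorthand such as "!!str" or "!e!foo" to a full tag."""
    if not shorthand.startswith("!"):
        raise ValueError("a tag shorthand must start with '!'")
    handles = dict(DEFAULT_HANDLES)
    handles.update(directives or {})   # "TAG" directives override defaults
    if shorthand.startswith("!!"):
        handle, suffix = "!!", shorthand[2:]
    else:
        end = shorthand.find("!", 1)
        if end == -1:                  # primary handle
            handle, suffix = "!", shorthand[1:]
        else:                          # named handle: "!name!"
            handle, suffix = shorthand[:end + 1], shorthand[end + 1:]
    if suffix == "":
        raise ValueError("tag shorthand has an empty suffix")
    if "!" in suffix:
        raise ValueError("suffix must not contain '!'")
    if handle not in handles:
        raise ValueError("handle %s has no associated prefix" % handle)
    return handles[handle] + suffix
```

Once the tag is expanded the handle is no longer needed, which matches the statement that the handle may be discarded after parsing.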
The suffix must not contain any “!” character. This would cause the tag shorthand to be interpreted as having a named tag handle. In addition, the suffix must not contain the “[”, “]”, “{”, “}” and “,” characters. These characters would cause ambiguity with flow collection structures. If the suffix needs to specify any of the above restricted characters, they must be escaped using the “%” character. This behavior is consistent with the URI character escaping rules (specifically, section 2.3 of RFC2396).
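Escaping a suffix can be sketched with the Python standard library, whose `urllib.parse.quote` encodes a string as UTF-8 and percent-escapes every byte not listed as safe. The helper name `escape_tag_suffix` and the exact choice of safe characters are assumptions of this example; the set is chosen so that “!”, “[”, “]”, “{”, “}” and “,” are always escaped.

```python
from urllib.parse import quote

# Punctuation left unescaped; letters, digits and "_.-~" are always kept.
_SAFE = "#;/?:@&=+$*'()"


def escape_tag_suffix(text):
    """Percent-escape characters that may not appear in a tag suffix."""
    return quote(text, safe=_SAFE)
```

For instance, `escape_tag_suffix("a,b")` gives `"a%2Cb"`. A processor reading the tag back must not expand the escape, since tags are compared exactly as presented in the stream.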
|
Example 6.26. Tag Shorthands
%TAG !e! tag:example.com,2000:app/
Legend:
|
%YAML 1.2
|
Example 6.27. Invalid Tag Shorthands
%TAG !e! tag:example,2000:app/
|
ERROR:
|
If a node has no tag property, it is assigned a non-specific tag that needs to be resolved to a specific one. This non-specific tag is “!” for non-plain scalars and “?” for all other nodes. This is the only case where the node style has any effect on the content information.
It is possible for the tag property to be explicitly set to the “!” non-specific tag. By convention, this “disables” tag resolution, forcing the node to be interpreted as “tag:yaml.org,2002:seq”, “tag:yaml.org,2002:map”, or “tag:yaml.org,2002:str”, according to its kind.
There is no way to explicitly specify the “?” non-specific tag. This is intentional.
|
Example 6.28. Non-Specific Tags
# Assuming conventional resolution:
Legend:
|
%YAML 1.2
|
An anchor is denoted by the “&” indicator. It marks a node for future reference. An alias node can then be used to indicate additional inclusions of the anchored node. An anchored node need not be referenced by any alias nodes; in particular, it is valid for all nodes to be anchored.
|
Note that as a serialization detail, the anchor name is preserved in the serialization tree. However, it is not reflected in the representation graph and must not be used to convey content information. In particular, the YAML processor need not preserve the anchor name once the representation is composed.
Anchor names must not contain the “[”, “]”, “{”, “}” and “,” characters. These characters would cause ambiguity with flow collection structures.
|
Example 6.29. Node Anchors
First occurrence:
Legend: |
%YAML 1.2
|
YAML’s flow styles can be thought of as the natural extension of JSON to cover folding long content lines for readability, tagging nodes to control construction of native data structures, and using anchors and aliases to reuse constructed object instances.
Subsequent occurrences of a previously serialized node are presented as alias nodes. The first occurrence of the node must be marked by an anchor to allow subsequent occurrences to be presented as alias nodes.
An alias node is denoted by the “*” indicator. The alias refers to the most recent preceding node having the same anchor. It is an error for an alias node to use an anchor that does not previously occur in the document. It is not an error to specify an anchor that is not used by any alias node.
Note that an alias node must not specify any properties or content, as these were already specified at the first occurrence of the node.
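The alias rules can be illustrated with a small Python sketch over a simplified event list. The event shapes (`("node", anchor_or_None, value)` and `("alias", name)`) and the function name `resolve_aliases` are invented for this example and do not correspond to any particular YAML library.

```python
def resolve_aliases(events):
    """Replace alias events with the node they refer to.

    events is a list of ("node", anchor_or_None, value) and
    ("alias", name) tuples in document order.
    """
    anchors = {}
    resolved = []
    for event in events:
        if event[0] == "alias":
            name = event[1]
            if name not in anchors:
                raise ValueError("alias *%s has no preceding anchor" % name)
            # The alias stands for the very same node, not a copy of it.
            resolved.append(anchors[name])
        else:
            _, anchor, value = event
            if anchor is not None:
                # A later node with the same anchor replaces the earlier one,
                # so aliases see the most recent preceding node.
                anchors[anchor] = value
            resolved.append(value)
    return resolved
```

Anchors that are never referenced are simply recorded and ignored, which reflects the rule that an unused anchor is not an error.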
|
Example 7.1. Alias Nodes
First occurrence: &
Legend: |
%YAML 1.2
|
YAML allows the node content to be omitted in many cases. Nodes with empty content are interpreted as if they were plain scalars with an empty value. Such nodes are commonly resolved to a “null” value.
|
In the examples, empty scalars are sometimes displayed as the glyph “°” for clarity. Note that this glyph corresponds to a position in the character stream rather than to an actual character.
Example 7.2. Empty Content
{
Legend:
|
%YAML 1.2
|
Both the node’s properties and node content are optional. This allows for a completely empty node. Completely empty nodes are only valid when following some explicit indication for their existence.
|
Example 7.3. Completely Empty Flow Nodes
{
Legend:
|
%YAML 1.2
|
YAML provides three flow scalar styles: double-quoted, single-quoted and plain (unquoted). Each provides a different trade-off between readability and expressive power.
The scalar style is a presentation detail and must not be used to convey content information, with the exception that plain scalars are distinguished for the purpose of tag resolution.
The double-quoted style is specified by surrounding “"” indicators. This is the only style capable of expressing arbitrary strings, by using “\” escape sequences. This comes at the cost of having to escape the “\” and “"” characters.
|
Double-quoted scalars are restricted to a single line when contained inside an implicit key.
|
Example 7.4. Double Quoted Implicit Keys
Legend: |
%YAML 1.2
|
In a multi-line double-quoted scalar, line breaks are subject to flow line folding, which discards any trailing white space characters. It is also possible to escape the line break character. In this case, the line break is excluded from the content, and the trailing white space characters are preserved. Combined with the ability to escape white space characters, this allows double-quoted lines to be broken at arbitrary positions.
|
Example 7.5. Double Quoted Line Breaks
"folded
|
%YAML 1.2
|
Legend: s-flow-folded(n), s-double-escaped(n)
All leading and trailing white space characters are excluded from the content. Each continuation line must therefore contain at least one non-space character. Empty lines, if any, are consumed as part of the line folding.
Example 7.6. Double Quoted Lines
Legend: nb-ns-double-in-line, s-double-next-line(n)
The single-quoted style is specified by surrounding “'” indicators. Therefore, within a single-quoted scalar, such characters need to be repeated. This is the only form of escaping performed in single-quoted scalars. In particular, the “\” and “"” characters may be freely used. This restricts single-quoted scalars to printable characters. In addition, it is only possible to break a long single-quoted line where a space character is surrounded by non-spaces.
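The quote-doubling rule above can be sketched in a few lines of Python. The function names single_quote and single_unquote are hypothetical, and the sketch assumes a single-line scalar made of printable characters only.

```python
def single_quote(text: str) -> str:
    """Wrap `text` in single quotes, doubling each embedded single quote."""
    return "'" + text.replace("'", "''") + "'"


def single_unquote(scalar: str) -> str:
    """Reverse single_quote(); assumes a well-formed, single-line scalar."""
    if len(scalar) < 2 or scalar[0] != "'" or scalar[-1] != "'":
        raise ValueError("not a single-quoted scalar")
    return scalar[1:-1].replace("''", "'")


print(single_quote("it's"))
```

Note that the backslash and double quote pass through untouched, since no other escaping exists in this style.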
Example 7.7. Single Quoted Characters
Single-quoted scalars are restricted to a single line when contained inside an implicit key.
Example 7.8. Single Quoted Implicit Keys
All leading and trailing white space characters are excluded from the content. Each continuation line must therefore contain at least one non-space character. Empty lines, if any, are consumed as part of the line folding.
Example 7.9. Single Quoted Lines
Legend: nb-ns-single-in-line(n), s-single-next-line(n)
The plain (unquoted) style has no identifying indicators and provides no form of escaping. It is therefore the most readable, most limited and most context sensitive style. In addition to a restricted character set, a plain scalar must not be empty, or contain leading or trailing white space characters. It is only possible to break a long plain line where a space character is surrounded by non-spaces.
Plain scalars must not begin with most indicators, as this would cause ambiguity with other YAML constructs. However, the “:”, “?” and “-” indicators may be used as the first character if followed by a non-space “safe” character, as this causes no ambiguity.
Plain scalars must never contain the “: ” and “ #” character combinations. Such combinations would cause ambiguity with mapping key: value pairs and comments. In addition, inside flow collections, or when used as implicit keys, plain scalars must not contain the “[”, “]”, “{”, “}” and “,” characters. These characters would cause ambiguity with flow collection structures.
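The restrictions on plain scalars above can be approximated by a small checker. This is a simplified Python sketch rather than a transcription of the specification's productions: the names are hypothetical, only single-line scalars are considered, and outside flow collections any non-white-space character is treated as a “safe” character.

```python
INDICATORS = set("-?:,[]{}#&*!|>'\"%@`")
FLOW_INDICATORS = set(",[]{}")


def is_plain_safe(ch: str, in_flow: bool) -> bool:
    """A 'safe' character: not white space, and not a flow indicator inside flow context."""
    if ch == "" or ch in " \t\r\n":
        return False
    return not (in_flow and ch in FLOW_INDICATORS)


def is_valid_plain(text: str, in_flow: bool = False) -> bool:
    """Approximate check that `text` may be written as a single-line plain scalar."""
    if not text or text != text.strip(" \t") or "\n" in text or "\r" in text:
        return False
    first, second = text[0], text[1:2]
    if first in INDICATORS:
        # Only ':', '?' and '-' may lead, and only when a safe character follows.
        if first not in "?:-" or not is_plain_safe(second, in_flow):
            return False
    for i, ch in enumerate(text):
        if in_flow and ch in FLOW_INDICATORS:
            return False
        if ch == ":" and not is_plain_safe(text[i + 1:i + 2], in_flow):
            return False  # ':' followed by white space (or nothing) reads as a mapping value
        if ch == "#" and i > 0 and text[i - 1] in " \t":
            return False  # ' #' would start a comment
    return True


print(is_valid_plain("http://example.com/a:1"), is_valid_plain("a: 1"))
```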
Example 7.10. Plain Characters
Legend: ns-plain-first(c), Not ns-plain-first(c), ns-plain-char(c), Not ns-plain-char(c)
Plain scalars are further restricted to a single line when contained inside an implicit key.
Example 7.11. Plain Implicit Keys
All leading and trailing white space characters are excluded from the content. Each continuation line must therefore contain at least one non-space character. Empty lines, if any, are consumed as part of the line folding.
Example 7.12. Plain Lines
Legend: nb-ns-plain-in-line(c), s-ns-plain-next-line(n,c)
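The handling of multi-line plain scalars described above (white space trimmed from every line, empty lines consumed by folding) can be sketched as follows. The function name fold_plain_lines is hypothetical, and the sketch assumes line breaks have already been normalized to a single line feed.

```python
def fold_plain_lines(raw: str) -> str:
    """Fold a multi-line plain scalar into its content string."""
    lines = [line.strip(" \t") for line in raw.split("\n")]
    result = ""
    empty_run = 0  # number of empty lines seen since the last non-empty line
    for line in lines:
        if line == "":
            empty_run += 1
            continue
        if result:
            # A lone line break folds to a space; n empty lines become n line feeds.
            result += " " if empty_run == 0 else "\n" * empty_run
        result += line
        empty_run = 0
    return result


print(repr(fold_plain_lines("1st non-empty\n\n 2nd non-empty \n\t3rd non-empty")))
```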
A flow collection may be nested within a block collection (flow-out context), nested within another flow collection (flow-in context), or be a part of an implicit key (flow-key context or block-key context). Flow collection entries are terminated by the “,” indicator. The final “,” may be omitted. This does not cause ambiguity because flow collection entries can never be completely empty.
Flow sequence content is denoted by surrounding “[” and “]” characters.
Sequence entries are separated by a “,” character.
Example 7.13. Flow Sequence
Any flow node may be used as a flow sequence entry. In addition, YAML provides a compact notation for the case where a flow sequence entry is a mapping with a single key: value pair.
Example 7.14. Flow Sequence Entries
Flow mappings are denoted by surrounding “{” and “}” characters.
Mapping entries are separated by a “,” character.
Example 7.15. Flow Mappings
If the optional “?” mapping key indicator is specified, the rest of the entry may be completely empty.
Example 7.16. Flow Mapping Entries
Normally, YAML insists the “:” mapping value indicator be separated from the value by white space. A benefit of this restriction is that the “:” character can be used inside plain scalars, as long as it is not followed by white space. This allows for unquoted URLs and timestamps. It is also a potential source for confusion as “a:1” is a plain scalar and not a key: value pair.
Note that the value may be completely empty since its existence is indicated by the “:”.
Example 7.17. Flow Mapping Separate Values
To ensure JSON compatibility, if a key inside a flow mapping is JSON-like, YAML allows the following value to be specified adjacent to the “:”. This causes no ambiguity, as all JSON-like keys are surrounded by indicators. However, as this greatly reduces readability, YAML processors should separate the value from the “:” on output, even in this case.
Example 7.18. Flow Mapping Adjacent Values
A more compact notation is usable inside flow sequences, if the mapping contains a single key: value pair. This notation does not require the surrounding “{” and “}” characters.
Note that it is not possible to specify any node properties for the mapping in this case.
Example 7.19. Single Pair Flow Mappings
If the “?” indicator is explicitly specified, parsing is unambiguous, and the syntax is identical to the general case.
Example 7.20. Single Pair Explicit Entry
If the “?” indicator is omitted, parsing needs to see past the implicit key to recognize it as such. To limit the amount of lookahead required, the “:” indicator must appear at most 1024 Unicode characters beyond the start of the key. In addition, the key is restricted to a single line.
Note that YAML allows arbitrary nodes to be used as keys. In particular, a key may be a sequence or a mapping. Thus, without the above restrictions, practical one-pass parsing would have been impossible to implement.
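The two restrictions on implicit keys (single line, bounded lookahead) are easy to state as a check. The following Python sketch uses hypothetical names and assumes the caller already holds the raw text of the candidate key, that is, everything from the start of the key up to but not including the “:” indicator.

```python
MAX_IMPLICIT_KEY_LENGTH = 1024  # lookahead limit stated in the text above


def is_acceptable_implicit_key(key_text: str) -> bool:
    """Check the single-line and 1024-character restrictions on an implicit key."""
    if "\n" in key_text or "\r" in key_text:
        return False  # implicit keys may not span lines
    # len() counts Unicode code points, which matches the unit used by the limit.
    return len(key_text) <= MAX_IMPLICIT_KEY_LENGTH


print(is_acceptable_implicit_key("[ a, b ]"))
```

Keys that fail this check can still be written using the explicit “?” indicator.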
Example 7.21. Single Pair Implicit Entries
Example 7.22. Invalid Implicit Keys
JSON-like flow styles all have explicit start and end indicators. The only flow style that does not have this property is the plain scalar. Note that none of the “JSON-like” styles is actually acceptable by JSON. Even the double-quoted style is a superset of the JSON string format.
Example 7.23. Flow Content
A complete flow node also has optional node properties, except for alias nodes which refer to the anchored node properties.
Example 7.24. Flow Nodes
YAML’s block styles employ indentation rather than indicators to denote structure. This results in a more human readable (though less compact) notation.
YAML provides two block scalar styles, literal and folded. Each provides a different trade-off between readability and expressive power.
Block scalars are controlled by a few indicators given in a header preceding the content itself. This header is followed by a non-content line break with an optional comment. This is the only case where a comment must not be followed by additional comment lines.
Example 8.1. Block Scalar Header
Typically, the indentation level of a block scalar is detected from its first non-empty line. It is an error for any of the leading empty lines to contain more spaces than the first non-empty line.
Detection fails when the first non-empty line contains leading content space characters. Content may safely start with a tab or a “#” character.
When detection would fail, YAML requires that the indentation level for the content be given using an explicit indentation indicator. This level is specified as the integer number of the additional indentation spaces used for the content, relative to its parent node.
It is always valid to specify an indentation indicator for a block scalar node, though a YAML processor should only emit an explicit indentation indicator for cases where detection will fail.
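An emitter following the recommendation above needs to decide when the explicit indentation indicator is required. The Python sketch below is one possible approach under simplifying assumptions: the function names are hypothetical, the content uses line feeds only, and any leading line that starts with a space (including a line made of spaces only) is conservatively treated as defeating detection.

```python
def needs_indentation_indicator(content: str) -> bool:
    """True if auto-detection would misread the indentation of this content."""
    for line in content.split("\n"):
        if line == "":
            continue  # truly empty leading lines do not affect detection
        # The first line holding any characters decides the outcome.
        return line.startswith(" ")
    return False


def literal_header(content: str, indent: int = 2) -> str:
    """Build a '|' header, adding the indicator only when detection would fail."""
    return "|" + (str(indent) if needs_indentation_indicator(content) else "")


print(literal_header(" starts with a space"), literal_header("ordinary text"))
```

The `indent` value must match the number of additional spaces the emitter actually uses for the content relative to the parent node.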
Example 8.2. Block Indentation Indicator
Example 8.3. Invalid Block Scalar Indentation Indicators
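Chomping controls how final line breaks and trailing empty lines are interpreted. YAML provides three chomping methods:
Strip: Stripping is specified by the “-” chomping indicator. In this case, the final line break and any trailing empty lines are excluded from the scalar’s content.
Clip: Clipping is the default behavior used if no explicit chomping indicator is specified. In this case, the final line break character is preserved in the scalar’s content. However, any trailing empty lines are excluded from the scalar’s content.
Keep: Keeping is specified by the “+” chomping indicator. In this case, the final line break and any trailing empty lines are considered to be part of the scalar’s content. These additional lines are not subject to folding.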
The chomping method used is a presentation detail and must not be used to convey content information.
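The three chomping methods (strip with “-”, clip when no indicator is given, keep with “+”) can be sketched as a single Python function. The name chomp is hypothetical, and the sketch assumes the block scalar text has already been de-indented, with line breaks normalized to line feeds and trailing empty lines represented as extra line feeds.

```python
def chomp(text: str, indicator: str = "") -> str:
    """Apply strip ('-'), clip ('') or keep ('+') chomping to block scalar text."""
    body = text.rstrip("\n")
    if indicator == "-":
        return body                  # strip: drop final break and trailing empty lines
    if indicator == "+":
        return text                  # keep: preserve everything
    if indicator == "":
        # clip: keep one final break, but only if there is content before it
        return body + "\n" if body and text.endswith("\n") else body
    raise ValueError("unknown chomping indicator: %r" % indicator)


for ind in ("-", "", "+"):
    print(repr(chomp("text\n\n", ind)))
```

A scalar consisting only of empty lines yields an empty string under both strip and clip, because all of its lines count as trailing lines.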
The interpretation of the final line break of a block scalar is controlled by the chomping indicator specified in the block scalar header.
Example 8.4. Chomping Final Line Break
The interpretation of the trailing empty lines following a block scalar is also controlled by the chomping indicator specified in the block scalar header.
Explicit comment lines may follow the trailing empty lines. To prevent ambiguity, the first such comment line must be less indented than the block scalar content. Additional comment lines, if any, are not so restricted. This is the only case where the indentation of comment lines is constrained.
Example 8.5. Chomping Trailing Lines
If a block scalar consists only of empty lines, then these lines are considered as trailing lines and hence are affected by chomping.
Example 8.6. Empty Scalar Chomping
The literal style is denoted by the “|” indicator. It is the simplest, most restricted, and most readable scalar style.
Example 8.7. Literal Scalar
Inside literal scalars, all (indented) characters are considered to be content, including white space characters. Note that all line break characters are normalized. In addition, empty lines are not folded, though final line breaks and trailing empty lines are chomped.
There is no way to escape characters inside literal scalars. This restricts them to printable characters. In addition, there is no way to break a long literal line.
Example 8.8. Literal Content
The folded style is denoted by the “>” indicator. It is similar to the literal style; however, folded scalars are subject to line folding.
Example 8.9. Folded Scalar
Folding allows long lines to be broken anywhere a single space character separates two non-space characters.
Example 8.10. Folded Lines
(The following three examples duplicate this example, each highlighting different productions.)
Lines starting with white space characters (more-indented lines) are not folded.
Example 8.11. More Indented Lines
Line breaks and empty lines separating folded and more-indented lines are also not folded.
Example 8.12. Empty Separation Lines
The final line break, and trailing empty lines if any, are subject to chomping and are never folded.
Example 8.13. Final Empty Lines
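The folding rules for folded scalars (ordinary lines are folded, more-indented lines and the breaks around them are not) can be sketched in Python. This is a simplified illustration with a hypothetical function name: it takes the content lines with the block indentation already removed, leaves the final line break and trailing empty lines to chomping, and treats any line starting with a space or tab as more-indented.

```python
def fold_block_lines(lines):
    """Fold the de-indented lines of a folded scalar (final line break excluded)."""
    result = ""
    previous = None   # last non-empty line seen so far
    empty_run = 0     # empty lines since `previous`
    for line in lines:
        if line == "":
            empty_run += 1
            continue
        if previous is None:
            result += "\n" * empty_run            # leading empty lines are kept
        elif previous[0] in " \t" or line[0] in " \t":
            result += "\n" * (empty_run + 1)      # breaks next to more-indented lines are kept
        elif empty_run == 0:
            result += " "                         # a single break folds into a space
        else:
            result += "\n" * empty_run            # the first break is dropped, the rest are kept
        result += line
        previous, empty_run = line, 0
    return result


sample = ["", "folded", "line", "", "next", "line", "  * bullet", "",
          "  * list", "  * lines", "", "last", "line"]
print(repr(fold_block_lines(sample)))
```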
For readability, block collection styles are not denoted by any indicator. Instead, YAML uses a lookahead method, where a block collection is distinguished from a plain scalar only when a key: value pair or a sequence entry is seen.
A block sequence is simply a series of nodes, each denoted by a leading “-” indicator. The “-” indicator must be separated from the node by white space. This allows “-” to be used as the first character in a plain scalar if followed by a non-space character (e.g. “-1”).
Example 8.14. Block Sequence
The entry node may be either completely empty, be a nested block node, or use a compact in-line notation. The compact notation may be used when the entry is itself a nested block collection. In this case, both the “-” indicator and the following spaces are considered to be part of the indentation of the nested collection. Note that it is not possible to specify node properties for such a collection.
Example 8.15. Block Sequence Entry Types
A block mapping is a series of entries, each presenting a key: value pair.
Example 8.16. Block Mappings
If the “?” indicator is specified, the optional value node must be specified on a separate line, denoted by the “:” indicator. Note that YAML allows the same compact in-line notation here as described above for block sequence entries.
Example 8.17. Explicit Block Mapping Entries
If the “?” indicator is omitted, parsing needs to see past the implicit key, in the same way as in the single key: value pair flow mapping. Hence, such keys are subject to the same restrictions; they are limited to a single line and must not span more than 1024 Unicode characters.
In this case, the value may be specified on the same line as the implicit key. Note however that in block mappings the value must never be adjacent to the “:”, as this greatly reduces readability and is not required for JSON compatibility (unlike the case in flow mappings).
There is no compact notation for in-line values. Also, while both the implicit key and the value following it may be empty, the “:” indicator is mandatory. This prevents a potential ambiguity with multi-line plain scalars.
Example 8.18. Implicit Block Mapping Entries
A compact in-line notation is also available. This compact notation may be nested inside block sequences and explicit block mapping entries. Note that it is not possible to specify node properties for such a nested mapping.
Example 8.19. Compact Block Mappings
YAML allows flow nodes to be embedded inside block collections (but not vice-versa). Flow nodes must be indented by at least one more space than the parent block collection. Note that flow nodes may begin on a following line.
It is at this point that parsing needs to distinguish between a plain scalar and an implicit key starting a nested block mapping.
Example 8.20. Block Node Types
The block node’s properties may span across several lines. In this case, they must be indented by at least one more space than the block collection, regardless of the indentation of the block collection entries.
Example 8.21. Block Scalar Nodes
Since people perceive the “-” indicator as indentation, nested block sequences may be indented by one less space to compensate, except, of course, if nested inside another block sequence (block-out context vs. block-in context).
Example 8.22. Block Collection Nodes
A YAML character stream may contain several documents. Each document is completely independent from the rest.
A document may be preceded by a prefix specifying the character encoding, and optional comment lines. Note that all documents in a stream must use the same character encoding. However it is valid to re-specify the encoding using a byte order mark for each document in the stream. This makes it easier to concatenate streams.
The existence of the optional prefix does not necessarily indicate the existence of an actual document.
Example 9.1. Document Prefix
Using directives creates a potential ambiguity. It is valid to have a “%” character at the start of a line (e.g. as the first character of the second line of a plain scalar). How, then, to distinguish between an actual directive and a content line that happens to start with a “%” character?
The solution is the use of two special marker lines to control the processing of directives, one at the start of a document and one at the end.
At the start of a document, lines beginning with a “%” character are assumed to be directives. The (possibly empty) list of directives is terminated by a directives end marker line. Lines following this marker can safely use “%” as the first character.
At the end of a document, a document end marker line is used to signal the parser to begin scanning for directives again.
The existence of this optional document suffix does not necessarily indicate the existence of an actual following document.
Obviously, the actual content lines are therefore forbidden to begin with either of these markers.
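Because content lines may never begin with either marker, a scanner can classify marker lines by looking at the start of each line alone. The following Python sketch assumes the directives end marker is written “---” and the document end marker “...”; the function name document_marker is hypothetical.

```python
def document_marker(line: str):
    """Return 'start' for a '---' line, 'end' for a '...' line, otherwise None."""
    for marker, kind in (("---", "start"), ("...", "end")):
        if line.startswith(marker):
            rest = line[len(marker):]
            # The marker must be followed by white space, a line break, or end of input.
            if rest == "" or rest[0] in " \t\r\n":
                return kind
    return None


for text in ("---", "... # done", "---x", "%YAML 1.2"):
    print(repr(text), document_marker(text))
```

A scanner would treat lines beginning with “%” as directives only before a 'start' marker, and resume looking for directives after an 'end' marker.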
Example 9.2. Document Markers
A bare document does not begin with any directives or marker lines. Such documents are very “clean” as they contain nothing other than the content. In this case, the first non-comment line must not start with a “%” character.
Document nodes are indented as if they have a parent indented at -1 spaces. Since a node must be more indented than its parent node, this allows the document’s node to be indented at zero or more spaces.
Example 9.3. Bare Documents
An explicit document begins with an explicit directives end marker line but no directives. Since the existence of the document is indicated by this marker, the document itself may be completely empty.
Example 9.4. Explicit Documents
A directives document begins with some directives followed by an explicit directives end marker line.
Example 9.5. Directives Documents
A YAML stream consists of zero or more documents. Subsequent documents require some sort of separation marker line. If a document is not terminated by a document end marker line, then the following document must begin with a directives end marker line.
The stream format is intentionally “sloppy” to better support common use cases, such as stream concatenation.
Example 9.6. Stream
A sequence of bytes is a well-formed stream if, taken as a whole, it complies with the above l-yaml-stream production.
Some common use cases that can take advantage of the YAML stream structure are:
Concatenating two YAML streams requires both to use the same character encoding. In addition, it is necessary to separate the last document of the first stream and the first document of the second stream. This is easily ensured by inserting a document end marker between the two streams. Note that this is safe regardless of the content of either stream. In particular, either or both may be empty, and the first stream may or may not already contain such a marker.
The document end marker allows signaling the end of a document without closing the stream or starting the next document. This allows the receiver to complete processing a document without having to wait for the next one to arrive. The sender may also transmit "keep-alive" messages in the form of comment lines or repeated document end markers without signalling the start of the next document.
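The concatenation use case above can be sketched in Python. The function name concatenate_streams is hypothetical; the sketch works on already-decoded text (so the shared-encoding requirement is assumed to be met) and writes the document end marker as “...”.

```python
def concatenate_streams(first: str, second: str) -> str:
    """Join two streams, separating them with a document end marker line."""
    if first and not first.endswith("\n"):
        first += "\n"  # the marker must start on its own line
    return first + "...\n" + second


print(concatenate_streams("a: 1\n", "b: 2\n"))
```

The marker is inserted unconditionally, which matches the observation that doing so is safe even when either stream is empty or the first already ends with such a marker.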
A YAML schema is a combination of a set of tags and a mechanism for resolving non-specific tags.
The failsafe schema is guaranteed to work with any YAML document. It is therefore the recommended schema for generic YAML tools. A YAML processor should therefore support this schema, at least as an option.
URI:
tag:yaml.org,2002:map
Kind:
Definition:
Represents an associative container, where each key is unique in the association and mapped to exactly one value. YAML places no restrictions on the type of keys; in particular, they are not restricted to being scalars. Example bindings to native types include Perl’s hash, Python’s dictionary, and Java’s Hashtable.
Equality:
Example 10.1. !!map Examples
Block style: !!map
URI:
tag:yaml.org,2002:seq
Kind:
Definition:
Represents a collection indexed by sequential integers starting with zero. Example bindings to native types include Perl’s array, Python’s list or tuple, and Java’s array or Vector.
Equality:
Example 10.2. !!seq Examples
Block style: !!seq
URI:
tag:yaml.org,2002:str
Kind:
Definition:
Represents a Unicode string, a sequence of zero or more Unicode characters. This type is usually bound to the native language’s string type, or, for languages lacking one (such as C), to a character array.
Equality:
Two strings are equal if and only if they have the same length and contain the same characters in the same order.
Canonical Form:
The obvious.
All nodes with the “!” non-specific tag are resolved, by the standard convention, to “tag:yaml.org,2002:seq”, “tag:yaml.org,2002:map”, or “tag:yaml.org,2002:str”, according to their kind.
All nodes with the “?” non-specific tag are left unresolved. This constrains the application to deal with a partial representation.
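The failsafe resolution rule above amounts to a lookup by node kind. In the Python sketch below, the function name and the kind labels ("sequence", "mapping", "scalar") are hypothetical, and an unresolved node is represented by None.

```python
FAILSAFE_TAGS = {
    "sequence": "tag:yaml.org,2002:seq",
    "mapping": "tag:yaml.org,2002:map",
    "scalar": "tag:yaml.org,2002:str",
}


def resolve_failsafe(kind: str, tag: str):
    """Resolve a node's tag under the failsafe schema; None means 'left unresolved'."""
    if tag == "!":
        return FAILSAFE_TAGS[kind]   # standard convention: resolve by kind
    if tag == "?":
        return None                  # partial representation: the application must cope
    return tag                       # specific tags are used as given


print(resolve_failsafe("scalar", "!"), resolve_failsafe("scalar", "?"))
```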
The JSON schema is the lowest common denominator of most modern computer languages, and allows parsing JSON files. A YAML processor should therefore support this schema, at least as an option. It is also strongly recommended that other schemas should be based on it.
The JSON schema uses the following tags in addition to those defined by the failsafe schema:
URI:
tag:yaml.org,2002:null
Kind:
Definition:
Represents the lack of a value. This is typically bound to a native null-like value (e.g., undef in Perl, None in Python). Note that a null is different from an empty string. Also, a mapping entry with some key and a null value is valid, and different from not having that key in the mapping.
Equality:
All null values are equal.
Canonical Form:
null.
URI:
tag:yaml.org,2002:bool
Kind:
Definition:
Represents a true/false value. In languages without a native Boolean type (such as C), it is usually bound to a native integer type, using one for true and zero for false.
Equality:
All true values are equal. Similarly, all false values are equal.
Canonical Form:
true or false.
Example 10.5. !!bool Examples
YAML is a superset of JSON: !!bool true
URI:
tag:yaml.org,2002:int
Kind:
Definition:
Represents arbitrary sized finite mathematical integers. Scalars of this type should be bound to a native integer data type, if possible.
Some languages (such as Perl) provide only a “number” type that allows for both integer and floating-point values. A YAML processor may use such a type for integers, as long as they round-trip properly.
In some languages (such as C), an integer may overflow the native type’s storage capability. A YAML processor may reject such a value as an error, truncate it with a warning, or find some other manner to round-trip it. In general, integers representable using 32 binary digits should safely round-trip through most systems.
Equality:
An integer value is equal to any other numeric value that evaluates to the integer value. For example, the integer 1 is equal to the floating-point 1.0.
Canonical Form:
Decimal integer notation, with a leading “-” character for negative values, matching the regular expression 0 | -? [1-9] [0-9]*
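The canonical integer form can be produced and checked with a few lines of Python. The function names are hypothetical; the pattern is the regular expression given above with the layout spaces removed.

```python
import re

CANONICAL_INT = re.compile(r"0|-?[1-9][0-9]*")


def is_canonical_int(text: str) -> bool:
    """True if `text` is in the canonical integer form."""
    return CANONICAL_INT.fullmatch(text) is not None


def canonical_int(value: int) -> str:
    """Format an integer canonically; Python's str() already has this shape."""
    text = str(value)
    if not is_canonical_int(text):
        raise ValueError("unexpected integer rendering: %r" % text)
    return text


print(canonical_int(-42), is_canonical_int("007"))
```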
URI:
tag:yaml.org,2002:float
Kind:
Definition:
Represents an approximation to real numbers, including three special values (positive and negative infinity, and “not a number”).
Some languages (such as Perl) provide only a “number” type that allows for both integer and floating-point values. A YAML processor may use such a type for floating-point numbers, as long as they round-trip properly.
Not all floating-point values can be stored exactly in any given native type. Hence a float value may change by “a small amount” when round-tripped. The supported range and accuracy depend on the implementation, though 32 bit IEEE floats should be safe. Since YAML does not specify a particular accuracy, using floating-point mapping keys requires great care and is not recommended.
Equality:
A floating-point value is equal to any other numeric value that evaluates to the floating-point value. For example, floating-point 1.0 is equal to the integer 1. Note that for the purpose of key uniqueness, all .nan values are considered to be equal. Note that in some languages (such as Ruby and Python) “not a number” has identity semantics and therefore is not properly represented in YAML as !!float .nan.
Canonical Form:
0, .inf, -.inf, .nan, or scientific notation matching the regular expression -? [1-9] ( \. [0-9]* [1-9] )? ( e [-+] [1-9] [0-9]* )?.
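A checker for the canonical floating-point form follows directly from the description above. In this Python sketch the names are hypothetical; the pattern is the stated regular expression with the layout spaces removed, and the four special spellings are accepted separately.

```python
import re

CANONICAL_FLOAT = re.compile(r"-?[1-9](\.[0-9]*[1-9])?(e[-+][1-9][0-9]*)?")
SPECIAL_FLOATS = {"0", ".inf", "-.inf", ".nan"}


def is_canonical_float(text: str) -> bool:
    """True if `text` is one of the special values or matches the scientific form."""
    return text in SPECIAL_FLOATS or CANONICAL_FLOAT.fullmatch(text) is not None


for candidate in ("6.8523015e+5", "685230.15", ".inf", "1.50"):
    print(candidate, is_canonical_float(candidate))
```

Converting an arbitrary native float into this form is a separate problem that the sketch does not attempt.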
Example 10.7. !!float Examples
negative: !!float -1
The JSON schema tag resolution is an extension of the failsafe schema tag resolution.
All nodes with the “!” non-specific tag are resolved, by the standard convention, to “tag:yaml.org,2002:seq”, “tag:yaml.org,2002:map”, or “tag:yaml.org,2002:str”, according to their kind.
Collections with the “?” non-specific tag (that is, untagged collections) are resolved to “tag:yaml.org,2002:seq” or “tag:yaml.org,2002:map” according to their kind.
Scalars with the “?” non-specific tag (that is, plain scalars) are matched with a list of regular expressions (first match wins, e.g. 0 is resolved as !!int). In principle, JSON files should not contain any scalars that do not match at least one of these. Hence the YAML processor should consider them to be an error.
Regular expression | Resolved to tag |
null |
tag:yaml.org,2002:null |
true | false |
tag:yaml.org,2002:bool |
-? ( 0 | [1-9] [0-9]* ) |
tag:yaml.org,2002:int |
-? ( 0 | [1-9] [0-9]* ) ( \. [0-9]* )? ( [eE] [-+]? [0-9]+ )? |
tag:yaml.org,2002:float |
* |
Error |
Example 10.8. JSON Tag Resolution
A null: null
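The JSON schema resolution of plain scalars is a first-match scan over the regular expressions in the table above. The Python sketch below uses a hypothetical function name, writes the patterns without the layout spaces, and signals the error case with ValueError, which is an arbitrary choice made for this illustration.

```python
import re

JSON_RULES = [
    (r"null", "tag:yaml.org,2002:null"),
    (r"true|false", "tag:yaml.org,2002:bool"),
    (r"-?(0|[1-9][0-9]*)", "tag:yaml.org,2002:int"),
    (r"-?(0|[1-9][0-9]*)(\.[0-9]*)?([eE][-+]?[0-9]+)?", "tag:yaml.org,2002:float"),
]


def resolve_json_scalar(text: str) -> str:
    """Return the tag of a plain scalar under the JSON schema (first match wins)."""
    for pattern, tag in JSON_RULES:
        if re.fullmatch(pattern, text):
            return tag
    raise ValueError("plain scalar %r matches no JSON schema rule" % text)


for scalar in ("null", "true", "0", "3.14"):
    print(scalar, resolve_json_scalar(scalar))
```

Because the integer rule precedes the float rule, a scalar such as 0 resolves to an integer even though it also matches the float pattern.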
The Core schema is an extension of the JSON schema, allowing for more human-readable presentation of the same types. This is the recommended default schema that a YAML processor should use unless instructed otherwise. It is also strongly recommended that other schemas should be based on it.
The core schema uses the same tags as the JSON schema.
The core schema tag resolution is an extension of the JSON schema tag resolution.
All nodes with the “!” non-specific tag are resolved, by the standard convention, to “tag:yaml.org,2002:seq”, “tag:yaml.org,2002:map”, or “tag:yaml.org,2002:str”, according to their kind.
Collections with the “?” non-specific tag (that is, untagged collections) are resolved to “tag:yaml.org,2002:seq” or “tag:yaml.org,2002:map” according to their kind.
Scalars with the “?” non-specific tag (that is, plain scalars) are matched with an extended list of regular expressions. However, in this case, if none of the regular expressions matches, the scalar is resolved to tag:yaml.org,2002:str (that is, considered to be a string).
Regular expression | Resolved to tag |
null | Null | NULL | ~ |
tag:yaml.org,2002:null |
/* Empty */ |
tag:yaml.org,2002:null |
true | True | TRUE | false | False | FALSE |
tag:yaml.org,2002:bool |
[-+]? [0-9]+ |
tag:yaml.org,2002:int (Base 10) |
0o [0-7]+ |
tag:yaml.org,2002:int (Base 8) |
0x [0-9a-fA-F]+ |
tag:yaml.org,2002:int (Base 16) |
[-+]? ( \. [0-9]+ | [0-9]+ ( \. [0-9]* )? ) ( [eE] [-+]? [0-9]+ )? |
tag:yaml.org,2002:float (Number) |
[-+]? ( \.inf | \.Inf | \.INF ) |
tag:yaml.org,2002:float (Infinity) |
\.nan | \.NaN | \.NAN |
tag:yaml.org,2002:float (Not a number) |
* |
tag:yaml.org,2002:str (Default) |
Example 10.9. Core Tag Resolution
A null: null
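The core schema resolution of plain scalars can be sketched the same way as a first-match scan, with a string fallback instead of an error. In this Python sketch the function name is hypothetical and the patterns are those of the table above with the layout spaces removed; the empty pattern covers the empty scalar.

```python
import re

CORE_RULES = [
    (r"null|Null|NULL|~", "tag:yaml.org,2002:null"),
    (r"", "tag:yaml.org,2002:null"),  # empty scalar
    (r"true|True|TRUE|false|False|FALSE", "tag:yaml.org,2002:bool"),
    (r"[-+]?[0-9]+", "tag:yaml.org,2002:int"),            # base 10
    (r"0o[0-7]+", "tag:yaml.org,2002:int"),               # base 8
    (r"0x[0-9a-fA-F]+", "tag:yaml.org,2002:int"),         # base 16
    (r"[-+]?(\.[0-9]+|[0-9]+(\.[0-9]*)?)([eE][-+]?[0-9]+)?", "tag:yaml.org,2002:float"),
    (r"[-+]?(\.inf|\.Inf|\.INF)", "tag:yaml.org,2002:float"),
    (r"\.nan|\.NaN|\.NAN", "tag:yaml.org,2002:float"),
]


def resolve_core_scalar(text: str) -> str:
    """Return the tag of a plain scalar under the core schema (first match wins)."""
    for pattern, tag in CORE_RULES:
        if re.fullmatch(pattern, text):
            return tag
    return "tag:yaml.org,2002:str"  # default: anything else is a string


for scalar in ("~", "TRUE", "0o14", ".5", "yes"):
    print(repr(scalar), resolve_core_scalar(scalar))
```

Under these rules a scalar such as yes falls through to the string default, since only the spellings of true and false listed in the table resolve to a Boolean.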
None of the above recommended schemas preclude the use of arbitrary explicit tags. Hence YAML processors for a particular programming language typically provide some form of local tags that map directly to the language’s native data structures (e.g., !ruby/object:Set).
While such local tags are useful for ad-hoc applications, they do not suffice for stable, interoperable cross-application or cross-platform data exchange.
Interoperable schemas make use of global tags (URIs) that represent the same data across different programming languages. In addition, an interoperable schema may provide additional tag resolution rules. Such rules may provide additional regular expressions, as well as consider the path to the node. This allows interoperable schemas to use untagged nodes.
It is strongly recommended that such schemas be based on the core schema defined above. In addition, it is strongly recommended that such schemas make as much use as possible of the YAML tag repository at http://yaml.org/type/. This repository provides recommended global tags for increasing the portability of YAML documents between different applications.
The tag repository is intentionally left out of the scope of this specification. This allows it to evolve to better support YAML applications. Hence, developers are encouraged to submit new “universal” types to the repository. The yaml-core mailing list at http://lists.sourceforge.net/lists/listinfo/yaml-core is the preferred method for such submissions, as well as raising any questions regarding this draft.