Data Mapping
By default, we perform several steps to help map Schema-related data into their conventional types. Given the wide range of data sources and generation methods, these mappings establish a minimal baseline on which other, application-specific assumptions can be made. The following steps are performed on a best-effort basis; any unrecognized syntax, types, or values remain unchanged.
- Canonicalize IRIs: e.g. `https://www.schema.org/` into `http://schema.org/`
- Resolve URLs: e.g. `/path` string into `http://example.com/path`
- Drop Empty Literals: e.g. empty string dropped
- Map Enumerations: e.g. `InStock` string into `schema:InStock` IRI
- Cast Data Types: e.g. `2020-04-08` string into `schema:Date` datatype
Canonicalize IRIs
The Schema.org ontology is often used with one of several base IRIs. By default, our tools use the original recommendation, the insecure (`http`) apex domain, but your data may use whichever form meets your requirements. Predicates, object IRIs, and literal datatypes that use any of the following alternate base IRIs will be converted to `http://schema.org/`.
Alternate Base IRI |
---|
https://schema.org/ |
http://www.schema.org/ |
https://www.schema.org/ |
Extension domains, such as `pending.schema.org`, are currently ignored.
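A minimal sketch of this rewrite, assuming IRIs are handled as plain strings and matched by prefix. The names `canonicalize_iri`, `CANONICAL_BASE`, and `ALTERNATE_BASES` are hypothetical illustrations, not part of our tools.

```python
# Hypothetical names; the bases mirror the Alternate Base IRI table above.
CANONICAL_BASE = "http://schema.org/"
ALTERNATE_BASES = (
    "https://schema.org/",
    "http://www.schema.org/",
    "https://www.schema.org/",
)


def canonicalize_iri(iri: str) -> str:
    """Rewrite an IRI that starts with an alternate Schema.org base.

    Any other IRI, including extension domains such as
    pending.schema.org, is returned unchanged.
    """
    for base in ALTERNATE_BASES:
        if iri.startswith(base):
            # Keep the local part (e.g. "Product") and swap the base.
            return CANONICAL_BASE + iri[len(base):]
    return iri
```

For example, `canonicalize_iri("https://www.schema.org/Product")` yields `http://schema.org/Product`, while `https://pending.schema.org/...` passes through untouched because it matches none of the listed bases.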
Resolve URLs
Properties that accept the `URL` datatype, such as `contentUrl`, will be resolved against a base URL, if one is known. The following table shows some examples of how objects will be evaluated.
Input | Behavior | Note |
---|---|---|
"/path" | <http://example.com/path> | String primitive |
"/path"^^schema:Text | "/path"^^schema:Text | Explicit datatype, no change |
"http://example.com/path" | <http://example.com/path> | String, absolute |
"/path"^^ex:Type | "/path"^^ex:Type | Unrelated datatype, no change |
For URLs in properties that support both the `Text` and `URL` datatypes, heuristics are used to determine the intended datatype. For best results, use encoded IRIs, or use literals in an absolute or root-relative form.
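The table above can be sketched for a property that accepts only `URL`, assuming a bare string primitive arrives as a literal with no explicit datatype (`datatype is None`). The `IRI`/`Literal` classes and `resolve_url` are hypothetical, and the heuristics for properties that accept both `Text` and `URL` are not modeled.

```python
from dataclasses import dataclass
from typing import Optional, Union
from urllib.parse import urljoin, urlparse


@dataclass(frozen=True)
class IRI:
    value: str


@dataclass(frozen=True)
class Literal:
    lexical: str
    datatype: Optional[str] = None  # None models a bare string primitive


Term = Union[IRI, Literal]


def resolve_url(obj: Term, base: Optional[str]) -> Term:
    """Resolve the object of a URL-typed property, such as contentUrl."""
    # Only bare string primitives are candidates; literals with an explicit
    # datatype (schema:Text, ex:Type, ...) and existing IRIs are left alone.
    if not isinstance(obj, Literal) or obj.datatype is not None:
        return obj
    if urlparse(obj.lexical).scheme:
        return IRI(obj.lexical)  # already absolute
    if base is None:
        return obj  # relative, but no base URL is known
    return IRI(urljoin(base, obj.lexical))
```

Using the standard library's `urljoin` means a root-relative `"/path"` resolves against the base's origin, which matches the first row of the table.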
Drop Empty Literals
By convention, empty literal statements will be dropped from the graph if:
- The datatype is Schema-related or an XSD primitive; and
- The lexical form is empty after collapsing any white space.
The following table shows some examples of how objects will be evaluated.
Input | Behavior | Note |
---|---|---|
"" | Dropped | |
" \t " | Dropped | Only white space |
""^^schema:Text | Dropped | |
""^^ex:Type | No change | Unrelated datatype |
"0" | No change | Not empty |
<> | No change | IRI, non-literal |
ex: | No change | Prefixed name IRI, non-literal |
This has the notable effect that a Schema-related resource cannot have an empty-valued property. However, it accommodates the common practice of publishers including empty property values simply because that is the easier way to generate structured data from templates.
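A minimal sketch of the two conditions, assuming triples are `(subject, predicate, object)` tuples and a bare string is treated as `xsd:string`. As a simplification it counts any datatype in the XSD namespace as a primitive, and it collapses only the XML white space characters (space, tab, CR, LF). All names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple, Union

SCHEMA = "http://schema.org/"
XSD = "http://www.w3.org/2001/XMLSchema#"
XML_WHITESPACE = " \t\n\r"


@dataclass(frozen=True)
class IRI:
    value: str


@dataclass(frozen=True)
class Literal:
    lexical: str
    datatype: Optional[str] = None  # None models a bare string (xsd:string)


Term = Union[IRI, Literal]
Triple = Tuple[str, str, Term]


def is_droppable(obj: Term) -> bool:
    if not isinstance(obj, Literal):
        return False  # IRIs such as <> or ex: are never dropped
    datatype = obj.datatype or XSD + "string"
    if not (datatype.startswith(SCHEMA) or datatype.startswith(XSD)):
        return False  # unrelated datatype, e.g. ex:Type
    # Empty after collapsing white space?
    return obj.lexical.strip(XML_WHITESPACE) == ""


def drop_empty_literals(triples: List[Triple]) -> List[Triple]:
    return [t for t in triples if not is_droppable(t[2])]
```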
Map Enumerations
For every property that accepts an enumeration value, objects will be evaluated to prefer the member's canonical IRI form. The following table shows examples of evaluating the object of the `availability` property.
Input | Output | Note |
---|---|---|
"InStock" | schema:InStock | |
" InStock" | schema:InStock | String (white space collapsed) to IRI |
"https://schema.org/InStock" | schema:InStock | String to IRI |
"http://www.schema.org/InStock" | schema:InStock | String with Alternate Base IRI to IRI |
"ExampleUnknown" | "ExampleUnknown" | Unknown enumeration path, no change |
"http://example.com/InStock" | "http://example.com/InStock" | Unknown enumeration, no change |
schema:InStock | schema:InStock | IRI, no change |
schema:False | schema:False | IRI, no change |
schema:ExampleUnknown | schema:ExampleUnknown | IRI, no change |
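A minimal sketch reproducing the table above, assuming only bare string literals are candidates for mapping and that the set of known members is supplied per property. `AVAILABILITY_MEMBERS` is a hypothetical, deliberately incomplete subset of `ItemAvailability`, and `map_enumeration` is an illustrative name.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Union

SCHEMA = "http://schema.org/"
# The canonical base plus the alternate bases accepted by canonicalization.
SCHEMA_BASES = (
    SCHEMA,
    "https://schema.org/",
    "http://www.schema.org/",
    "https://www.schema.org/",
)

# Hypothetical: a small subset of ItemAvailability members, for illustration.
AVAILABILITY_MEMBERS: FrozenSet[str] = frozenset({"InStock", "OutOfStock", "PreOrder"})


@dataclass(frozen=True)
class IRI:
    value: str


@dataclass(frozen=True)
class Literal:
    lexical: str
    datatype: Optional[str] = None  # None models a bare string primitive


Term = Union[IRI, Literal]


def map_enumeration(obj: Term, members: FrozenSet[str]) -> Term:
    if not isinstance(obj, Literal) or obj.datatype is not None:
        return obj  # IRIs (known or not) and typed literals are unchanged
    value = " ".join(obj.lexical.split())  # collapse white space
    for base in SCHEMA_BASES:
        if value.startswith(base):
            value = value[len(base):]  # "https://schema.org/InStock" -> "InStock"
            break
    if value in members:
        return IRI(SCHEMA + value)
    return obj  # unknown member or unrelated base: keep the original literal
```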
Cast Data Types
All objects of Schema properties will be evaluated against their expected data types and updated to their Schema-specific data type. Refer to the Data Types section for each type's supported syntax and behaviors.
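A minimal sketch of casting, assuming a hypothetical lookup from property to a single expected datatype and a lexical check per datatype. Only `schema:Date` and `schema:Integer` are modeled, only bare strings are cast, and values that fail their check are left unchanged in keeping with the best-effort approach. The authoritative syntax for each type is described in the Data Types section, not here.

```python
import re
from dataclasses import dataclass
from datetime import date
from typing import Callable, Dict, Optional

SCHEMA = "http://schema.org/"


@dataclass(frozen=True)
class Literal:
    lexical: str
    datatype: Optional[str] = None  # None models a bare string primitive


def _is_date(text: str) -> bool:
    # Require the YYYY-MM-DD shape, then let datetime reject impossible dates.
    if not re.fullmatch(r"\d{4}-\d{2}-\d{2}", text):
        return False
    try:
        date.fromisoformat(text)
    except ValueError:
        return False
    return True


def _is_integer(text: str) -> bool:
    return re.fullmatch(r"[+-]?\d+", text) is not None


# Hypothetical lookup: Schema datatype -> lexical check.
VALIDATORS: Dict[str, Callable[[str], bool]] = {
    SCHEMA + "Date": _is_date,
    SCHEMA + "Integer": _is_integer,
}

# Hypothetical lookup: property -> expected datatype (a tiny subset).
EXPECTED: Dict[str, str] = {
    SCHEMA + "birthDate": SCHEMA + "Date",
    SCHEMA + "numberOfPages": SCHEMA + "Integer",
}


def cast_literal(predicate: str, obj: Literal) -> Literal:
    expected = EXPECTED.get(predicate)
    if expected is None or obj.datatype is not None:
        return obj  # unknown property, or datatype already explicit
    if VALIDATORS[expected](obj.lexical):
        return Literal(obj.lexical, expected)
    return obj  # unrecognized syntax: leave unchanged
```

A real mapping would need to handle properties with several expected types (for example `Date` or `DateTime`), which this sketch deliberately ignores.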