{"id":347228,"date":"2024-10-20T00:25:10","date_gmt":"2024-10-20T00:25:10","guid":{"rendered":"https:\/\/pdfstandards.shop\/product\/uncategorized\/bs-iso-285002017\/"},"modified":"2024-10-25T23:54:56","modified_gmt":"2024-10-25T23:54:56","slug":"bs-iso-285002017","status":"publish","type":"product","link":"https:\/\/pdfstandards.shop\/product\/publishers\/bsi\/bs-iso-285002017\/","title":{"rendered":"BS ISO 28500:2017"},"content":{"rendered":"
<p>This document specifies the WARC file format:<\/p>\n<ul>\n
<li>\n<p>to store both the payload content and control information from mainstream Internet application layer protocols, such as HTTP, DNS, and FTP;<\/p>\n<\/li>\n
<li>\n<p>to store arbitrary metadata linked to other stored data (e.g. subject classifier, discovered language, encoding);<\/p>\n<\/li>\n
<li>\n<p>to support data compression and maintain data record integrity;<\/p>\n<\/li>\n
<li>\n<p>to store all control information from the harvesting protocol (e.g. request headers), not just response information;<\/p>\n<\/li>\n
<li>\n<p>to store the results of data transformations linked to other stored data;<\/p>\n<\/li>\n
<li>\n<p>to store a duplicate detection event linked to other stored data (to reduce storage in the presence of identical or substantially similar resources);<\/p>\n<\/li>\n
<li>\n<p>to be extended without disruption to existing functionality;<\/p>\n<\/li>\n
<li>\n<p>to support handling of overly long records by truncation or segmentation, where desired.<\/p>\n<\/li>\n<\/ul>\n
<table>\n<tr>\n<th>PDF Pages<\/th>\n<th>PDF Title<\/th>\n<\/tr>\n
<tr>\n<td>2<\/td>\n<td>National foreword <\/td>\n<\/tr>\n
<tr>\n<td>7<\/td>\n<td>Foreword <\/td>\n<\/tr>\n
<tr>\n<td>8<\/td>\n<td>Introduction <\/td>\n<\/tr>\n
<tr>\n<td>9<\/td>\n<td>1 Scope 2 Normative references <\/td>\n<\/tr>\n
<tr>\n<td>10<\/td>\n<td>3 Terms, definitions and abbreviated terms <\/td>\n<\/tr>\n
<tr>\n<td>11<\/td>\n<td>4 File and record model <\/td>\n<\/tr>\n
<tr>\n<td>13<\/td>\n<td>5 Named fields 5.1 General 5.2 WARC-Record-ID (mandatory) 5.3 Content-Length (mandatory) <\/td>\n<\/tr>\n
<tr>\n<td>14<\/td>\n<td>5.4 WARC-Date (mandatory) 5.5 WARC-Type (mandatory) 5.6 Content-Type <\/td>\n<\/tr>\n
<tr>\n<td>15<\/td>\n<td>5.7 WARC-Concurrent-To 5.8 WARC-Block-Digest 5.9 WARC-Payload-Digest <\/td>\n<\/tr>\n
<tr>\n<td>16<\/td>\n<td>5.10 WARC-IP-Address 5.11 WARC-Refers-To 5.12 WARC-Refers-To-Target-URI 5.13 WARC-Refers-To-Date <\/td>\n<\/tr>\n
<tr>\n<td>17<\/td>\n<td>5.14 WARC-Target-URI 5.15 WARC-Truncated 5.16 WARC-Warcinfo-ID 5.17 WARC-Filename <\/td>\n<\/tr>\n
<tr>\n<td>18<\/td>\n<td>5.18 WARC-Profile 5.19 WARC-Identified-Payload-Type 5.20 WARC-Segment-Number 5.21 WARC-Segment-Origin-ID 5.22 WARC-Segment-Total-Length <\/td>\n<\/tr>\n
<tr>\n<td>19<\/td>\n<td>6 WARC record types 6.1 General 6.2 \u2018warcinfo\u2019 6.3 \u2018response\u2019 6.3.1 General <\/td>\n<\/tr>\n
<tr>\n<td>20<\/td>\n<td>6.3.2 \u2018http\u2019 and \u2018https\u2019 schemes 6.3.3 Other URI schemes 6.4 \u2018resource\u2019 6.4.1 General 6.4.2 \u2018http\u2019 and \u2018https\u2019 schemes 6.4.3 \u2018ftp\u2019 scheme <\/td>\n<\/tr>\n
<tr>\n<td>21<\/td>\n<td>6.4.4 \u2018dns\u2019 scheme 6.4.5 Other URI schemes 6.5 \u2018request\u2019 6.5.1 General 6.5.2 \u2018http\u2019 and \u2018https\u2019 schemes 6.5.3 Other URI schemes 6.6 \u2018metadata\u2019 <\/td>\n<\/tr>\n
<tr>\n<td>22<\/td>\n<td>6.7 \u2018revisit\u2019 6.7.1 General 6.7.2 Profile: Identical Payload Digest <\/td>\n<\/tr>\n
<tr>\n<td>23<\/td>\n<td>6.7.3 Profile: Server Not Modified 6.7.4 Other profiles 6.8 \u2018conversion\u2019 <\/td>\n<\/tr>\n
<tr>\n<td>24<\/td>\n<td>6.9 \u2018continuation\u2019 7 Record segmentation 8 WARC file name, size and compression <\/td>\n<\/tr>\n
<tr>\n<td>26<\/td>\n<td>Annex\u00a0A (informative) Use cases for writing WARC records <\/td>\n<\/tr>\n
<tr>\n<td>29<\/td>\n<td>Annex\u00a0B (informative) Examples of WARC records <\/td>\n<\/tr>\n
<tr>\n<td>32<\/td>\n<td>Annex\u00a0C (informative) WARC file size and name recommendations <\/td>\n<\/tr>\n
<tr>\n<td>33<\/td>\n<td>Annex\u00a0D (informative) Compression recommendations <\/td>\n<\/tr>\n
<tr>\n<td>34<\/td>\n<td>Bibliography <\/td>\n<\/tr>\n<\/table>\n","protected":false},"excerpt":{"rendered":"<p><b>Information and documentation. WARC file format<\/b><\/p>\n