This page provides access to, and statistics about, class-specific subsets of the Schema.org data contained in the October 2024 version of the Web Data Commons Microdata and JSON-LD corpus. The datasets are part of the Web Data Commons Schema.org Data Set Series.
Because many users are interested only in specific types of Schema.org data (such as product data, event data, job postings,
or data describing local businesses), we have created class-specific subsets from the complete, merged Microdata and JSON-LD corpora for a
selection of Schema.org classes.
Each subset contains all instances of a specific class in either format, as well as all other data found on
the webpages containing these instances. For example, a page containing data about a product might also contain
reviews and offers for this product; a page containing data about an event might also contain data about the
location of the event and the persons involved in the event.
The data is represented in N-Quads format, meaning that the fourth
element of each quad contains the URL of the webpage from which the data was extracted.
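As a rough illustration of how the fourth element can be read, the following sketch splits one quad into its four terms with a regular expression. The function names `parse_quad` and `page_url` and the sample line are made up for this example, and the pattern is deliberately lenient; since the download files use a variation of N-Quads, real lines may need a proper parser.

```python
import re

# One N-Quads term: an IRI, a blank node, or a literal with an optional
# datatype or language tag. Deliberately lenient; not a full parser.
TERM = r'(<[^>]*>|_:\S+|"(?:[^"\\]|\\.)*"(?:\^\^<[^>]*>|@[A-Za-z0-9-]+)?)'
QUAD_RE = re.compile(rf'^\s*{TERM}\s+{TERM}\s+{TERM}\s+{TERM}\s*\.\s*$')


def parse_quad(line):
    """Return (subject, predicate, object, graph), or None if the line does not match."""
    match = QUAD_RE.match(line)
    return match.groups() if match else None


def page_url(line):
    """Return the source page URL from the fourth element, without angle brackets."""
    quad = parse_quad(line)
    if quad is None or not quad[3].startswith("<"):
        return None
    return quad[3][1:-1]


if __name__ == "__main__":
    # Made-up example quad, not taken from the corpus.
    sample = '_:n1 <http://schema.org/name> "Lava Cake"@en <https://example.com/recipe/1> .'
    print(page_url(sample))
```

Grouping quads by the value returned from `page_url` would recover all statements extracted from one webpage.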
To make the class-specific data easier to download and access, we provide the Schema.org subsets in chunks. Each
chunk contains the quads of specific pay-level domains (PLDs): all quads of one PLD, e.g. yummly.com,
are stored in the same chunk file. Additionally, we provide lookup files that map PLDs
to their corresponding chunks, as well as CSV files with PLD-specific statistics.
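The sketch below shows one way a lookup file could be used to find the chunk holding a given PLD. It assumes the lookup file is a CSV with a header row; the column names `pld` and `file_lookup`, the helper names, and the chunk file names are hypothetical and should be checked against the actual files.

```python
import csv
import io


def load_pld_lookup(handle, pld_column="pld", file_column="file_lookup"):
    """Map each PLD to the chunk file that holds its quads (column names are assumptions)."""
    reader = csv.DictReader(handle)
    return {row[pld_column]: row[file_column] for row in reader}


def chunks_for(plds, lookup):
    """Return the set of chunk files needed to cover the given PLDs."""
    return {lookup[pld] for pld in plds if pld in lookup}


if __name__ == "__main__":
    # In-memory stand-in for a lookup file; the chunk file names are invented.
    demo = io.StringIO("pld,file_lookup\nyummly.com,part_3.gz\nexample.org,part_7.gz\n")
    lookup = load_pld_lookup(demo)
    print(chunks_for(["yummly.com"], lookup))
```

Because all quads of one PLD sit in a single chunk, downloading only the files returned by `chunks_for` is enough to obtain the complete data for the PLDs of interest.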
Please note that:
| Schema.org Subset | General Stats | Related Classes | Size (# Files) | Download (Sample) | PLD-to-File Lookup / PLD-Specific Stats |
|---|---|---|---|---|---|
| AdministrativeArea | Quads: 96,084,887; URLs: 521,570; Hosts: 4,933 | http://schema.org/ListItem (1,499,745); http://schema.org/ImageObject (1,454,619); http://schema.org/AdministrativeArea (1,301,541); http://schema.org/Person (976,864); http://schema.org/PostalAddress (966,277) | 1.13 GB (8) | AdministrativeArea (sample) | lookup_file / pld_stats_file |
| Airport | Quads: 53,683,384; URLs: 173,690; Hosts: 1,003 | http://schema.org/Airport (3,562,446); http://schema.org/GeoCoordinates (2,546,608); http://schema.org/Flight (1,331,723); http://schema.org/Airline (1,258,369); http://schema.org/Offer (1,139,953) | 455.06 MB (5) | Airport (sample) | lookup_file / pld_stats_file |
| Answer | Quads: 1,617,339,307; URLs: 14,297,841; Hosts: 414,210 | http://schema.org/Answer (60,187,175); http://schema.org/Question (51,844,637); http://schema.org/ListItem (32,575,196); https://schema.org/Answer (22,033,793); http://schema.org/ImageObject (20,709,326) | 23.75 GB (118) | Answer (sample) | lookup_file / pld_stats_file |
| Book | Quads: 249,577,347; URLs: 4,207,738; Hosts: 18,993 | http://schema.org/Book (10,288,815); http://schema.org/Country (6,776,472); http://schema.org/Person (5,754,742); http://schema.org/Offer (3,590,095); http://schema.org/ListItem (3,350,101) | 3.66 GB (19) | Book (sample) | lookup_file / pld_stats_file |
| City | Quads: 235,102,338; URLs: 1,155,997; Hosts: 16,149 | http://schema.org/City (5,772,774); http://schema.org/ImageObject (4,144,520); http://schema.org/Person (4,069,973); http://schema.org/PostalAddress (3,790,669); http://schema.org/OpeningHoursSpecification (2,991,182) | 2.33 GB (17) | City (sample) | lookup_file / pld_stats_file |
| ClaimReview | Quads: 3,919,703; URLs: 49,707; Hosts: 343 | http://schema.org/Organization (123,300); http://schema.org/ImageObject (95,783); http://schema.org/ListItem (93,535); http://schema.org/Person (66,621); http://schema.org/ClaimReview (59,709) | 56.3 MB (1) | ClaimReview (sample) | lookup_file / pld_stats_file |
| CollegeOrUniversity | Quads: 112,774,933; URLs: 1,001,573; Hosts: 5,121 | http://schema.org/ImageObject (4,911,779); http://schema.org/CollegeOrUniversity (3,891,936); http://schema.org/Person (3,167,873); http://schema.org/PostalAddress (2,714,267); http://schema.org/GeoCoordinates (1,995,601) | 1.06 GB (6) | CollegeOrUniversity (sample) | lookup_file / pld_stats_file |
| Continent | Quads: 759,731; URLs: 6,752; Hosts: 66 | http://schema.org/City (57,883); http://schema.org/AdministrativeArea (42,597); http://schema.org/Country (10,423); http://schema.org/Continent (7,337); http://schema.org/GeoCoordinates (5,692) | 7.28 MB (1) | Continent (sample) | lookup_file / pld_stats_file |
| Country | Quads: 950,449,123; URLs: 7,110,729; Hosts: 35,296 | http://schema.org/Country (31,979,875); http://schema.org/ListItem (23,422,062); http://schema.org/Organization (15,663,467); http://schema.org/PostalAddress (11,083,809); http://schema.org/Offer (11,006,875) | 10.21 GB (61) | Country (sample) | lookup_file / pld_stats_file |
| CreativeWork | Quads: 2,063,764,239; URLs: 45,267,975; Hosts: 1,325,582 | https://schema.org/CreativeWork (80,055,282); https://schema.org/SiteNavigationElement (55,881,471); https://schema.org/Person (40,207,650); https://schema.org/WPHeader (32,242,351); https://schema.org/WPFooter (30,334,529) | 68.24 GB (162) | CreativeWork (sample) | lookup_file / pld_stats_file |
| Dataset | Quads: 58,626,384; URLs: 694,106; Hosts: 2,024 | http://schema.org/DataDownload (2,584,051); http://schema.org/Dataset (1,559,636); http://schema.org/Organization (1,056,184); http://schema.org/PropertyValue (744,074); http://schema.org/Person (737,414) | 655.18 MB (5) | Dataset (sample) | lookup_file / pld_stats_file |
| EducationalOrganization | Quads: 67,326,432; URLs: 830,228; Hosts: 11,630 | http://schema.org/EducationalOrganization (1,393,304); http://schema.org/ListItem (1,202,332); http://schema.org/ImageObject (983,688); http://schema.org/PostalAddress (955,379); http://schema.org/Person (627,436) | 810.6 MB (6) | EducationalOrganization (sample) | lookup_file / pld_stats_file |
| Event | Quads: 1,959,166,969; URLs: 14,077,443; Hosts: 399,466 | http://schema.org/Event (62,976,813); http://schema.org/Place (47,078,655); http://schema.org/PostalAddress (36,842,200); http://schema.org/Person (23,766,388); http://schema.org/ListItem (19,233,928) | 20.83 GB (133) | Event (sample) | lookup_file / pld_stats_file |
| FAQPage | Quads: 1,416,284,547; URLs: 11,599,660; Hosts: 385,247 | http://schema.org/Answer (48,925,434); http://schema.org/Question (48,641,444); http://schema.org/ListItem (30,143,784); http://schema.org/ImageObject (20,603,824); https://schema.org/Answer (17,418,044) | 19.89 GB (104) | FAQPage (sample) | lookup_file / pld_stats_file |
| GeoCoordinates | Quads: 3,183,190,155; URLs: 25,257,059; Hosts: 567,265 | http://schema.org/ListItem (73,477,298); http://schema.org/PostalAddress (53,035,222); http://schema.org/GeoCoordinates (50,513,043); http://schema.org/OpeningHoursSpecification (32,388,897); http://schema.org/Offer (31,582,635) | 33.28 GB (237) | GeoCoordinates (sample) | lookup_file / pld_stats_file |
| GovernmentOrganization | Quads: 25,785,444; URLs: 389,490; Hosts: 1,940 | http://schema.org/ListItem (1,425,244); http://schema.org/GovernmentOrganization (547,285); http://schema.org/ImageObject (478,529); http://schema.org/PropertyValue (289,526); http://schema.org/PostalAddress (228,164) | 305.5 MB (3) | GovernmentOrganization (sample) | lookup_file / pld_stats_file |
| Hospital | Quads: 17,743,553; URLs: 178,158; Hosts: 2,489 | http://schema.org/PostalAddress (408,816); http://schema.org/Hospital (341,523); https://schema.org/MedicalProcedure (265,300); http://schema.org/GeoCoordinates (230,027); http://schema.org/ListItem (193,692) | 185.01 MB (2) | Hospital (sample) | lookup_file / pld_stats_file |
| Hotel | Quads: 244,111,716; URLs: 1,961,598; Hosts: 24,641 | http://schema.org/ImageObject (12,124,401); http://schema.org/Hotel (4,413,099); http://schema.org/PostalAddress (4,118,923); http://schema.org/ListItem (4,004,074); http://schema.org/AggregateRating (2,332,130) | 3.19 GB (17) | Hotel (sample) | lookup_file / pld_stats_file |
| JobPosting | Quads: 175,205,867; URLs: 3,606,092; Hosts: 63,320 | http://schema.org/PostalAddress (6,753,848); http://schema.org/Place (6,688,964); http://schema.org/Organization (4,451,973); http://schema.org/JobPosting (4,068,348); http://schema.org/ListItem (2,519,108) | 4.86 GB (14) | JobPosting (sample) | lookup_file / pld_stats_file |
| LakeBodyOfWater | Quads: 35,276; URLs: 689; Hosts: 100 | http://schema.org/ImageObject (1,060); http://schema.org/Organization (765); http://schema.org/WebPage (687); http://schema.org/LakeBodyOfWater (681); http://schema.org/Person (562) | 0.63 MB (1) | LakeBodyOfWater (sample) | lookup_file / pld_stats_file |
| LandmarksOrHistoricalBuildings | Quads: 3,005,418; URLs: 33,100; Hosts: 460 | http://schema.org/ImageObject (112,997); http://schema.org/LandmarksOrHistoricalBuildings (95,367); http://schema.org/PostalAddress (64,910); http://schema.org/CreativeWork (50,722); http://schema.org/OpeningHoursSpecification (49,374) | 49.11 MB (1) | LandmarksOrHistoricalBuildings (sample) | lookup_file / pld_stats_file |
| Language | Quads: 586,551,994; URLs: 4,742,085; Hosts: 11,556 | http://schema.org/Person (25,797,771); http://schema.org/Comment (19,971,596); http://schema.org/ListItem (10,307,076); http://schema.org/Language (9,360,716); http://schema.org/InteractionCounter (7,608,122) | 9.82 GB (46) | Language (sample) | lookup_file / pld_stats_file |
| Library | Quads: 7,343,688; URLs: 206,299; Hosts: 938 | http://schema.org/Library (220,963); http://schema.org/Place (115,805); http://schema.org/CreativeWork (108,818); http://schema.org/ListItem (95,132); http://schema.org/PostalAddress (90,187) | 75.56 MB (1) | Library (sample) | lookup_file / pld_stats_file |
| LocalBusiness | Quads: 2,245,941,658; URLs: 27,184,047; Hosts: 1,456,650 | http://schema.org/ListItem (68,902,449); http://schema.org/LocalBusiness (42,248,382); http://schema.org/PostalAddress (39,579,198); http://schema.org/ImageObject (16,996,637); http://schema.org/Offer (16,958,169) | 23.23 GB (176) | LocalBusiness (sample) | lookup_file / pld_stats_file |
| Mountain | Quads: 232,960; URLs: 11,296; Hosts: 63 | http://schema.org/Mountain (20,970); http://schema.org/GeoCoordinates (13,074); http://schema.org/propertyValue (5,749); http://schema.org/ListItem (1,101); http://schema.org/Place (712) | 2.56 MB (1) | Mountain (sample) | lookup_file / pld_stats_file |
| Movie | Quads: 150,224,569; URLs: 1,849,096; Hosts: 8,969 | http://schema.org/Person (9,033,142); http://schema.org/Movie (3,785,443); http://schema.org/ListItem (2,092,306); http://schema.org/AggregateRating (1,498,480); http://schema.org/Place (1,232,017) | 1.82 GB (12) | Movie (sample) | lookup_file / pld_stats_file |
| Museum | Quads: 5,066,100; URLs: 81,577; Hosts: 653 | http://schema.org/PostalAddress (108,570); http://schema.org/ListItem (81,923); http://schema.org/Museum (81,127); http://schema.org/ImageObject (72,825); http://schema.org/OpeningHoursSpecification (63,146) | 47.21 MB (1) | Museum (sample) | lookup_file / pld_stats_file |
| MusicAlbum | Quads: 81,151,787; URLs: 582,435; Hosts: 2,812 | http://schema.org/Country (6,016,664); http://schema.org/Offer (2,290,150); http://schema.org/MusicRecording (2,229,256); http://schema.org/MusicAlbum (1,964,884); http://schema.org/MusicGroup (1,252,134) | 755.87 MB (5) | MusicAlbum (sample) | lookup_file / pld_stats_file |
| MusicRecording | Quads: 115,955,562; URLs: 879,727; Hosts: 5,314 | http://schema.org/MusicRecording (6,361,413); http://schema.org/Country (4,576,815); http://schema.org/Offer (2,499,674); http://schema.org/MusicAlbum (1,372,400); https://schema.org/MusicRecording (1,357,950) | 1.08 GB (7) | MusicRecording (sample) | lookup_file / pld_stats_file |
| Organization | Quads: 40,063,217,202; URLs: 612,866,985; Hosts: 4,318,211 | http://schema.org/ListItem (1,116,144,976); http://schema.org/ImageObject (837,488,023); http://schema.org/Organization (825,414,748); http://schema.org/Offer (451,019,145); http://schema.org/BreadcrumbList (390,102,953) | 488.41 GB (3072) | Organization (sample) | lookup_file / pld_stats_file |
| Painting | Quads: 10,557,775; URLs: 62,179; Hosts: 530 | http://schema.org/Person (2,199,905); http://schema.org/Offer (478,440); http://schema.org/Painting (264,229); http://schema.org/Product (154,817); http://schema.org/ListItem (90,303) | 75.59 MB (1) | Painting (sample) | lookup_file / pld_stats_file |
| Park | Quads: 645,285; URLs: 8,015; Hosts: 337 | http://schema.org/PostalAddress (25,328); http://schema.org/Organization (15,538); http://schema.org/Park (8,571); http://schema.org/ListItem (7,464); http://schema.org/GeoCoordinates (7,251) | 6.27 MB (1) | Park (sample) | lookup_file / pld_stats_file |
| Person | Quads: 25,755,663,162; URLs: 332,374,298; Hosts: 5,567,680 | http://schema.org/ImageObject (603,863,544); http://schema.org/Person (553,126,279); http://schema.org/ListItem (552,466,104); http://schema.org/Organization (273,891,715); http://schema.org/WebPage (271,622,375) | 401.48 GB (1953) | Person (sample) | lookup_file / pld_stats_file |
| Place | Quads: 3,314,637,936; URLs: 26,959,041; Hosts: 536,276 | http://schema.org/Place (84,439,411); http://schema.org/ListItem (69,600,800); http://schema.org/PostalAddress (68,404,034); http://schema.org/Event (51,433,809); http://schema.org/Person (34,850,605) | 38.3 GB (246) | Place (sample) | lookup_file / pld_stats_file |
| Product | Quads: 21,539,828,659; URLs: 279,715,051; Hosts: 3,309,209 | http://schema.org/Offer (749,382,740); http://schema.org/ListItem (500,258,027); http://schema.org/Product (492,076,060); http://schema.org/Organization (279,065,071); http://schema.org/ImageObject (153,492,459) | 242.98 GB (1658) | Product (sample) | lookup_file / pld_stats_file |
| QAPage | Quads: 150,385,974; URLs: 2,328,487; Hosts: 11,113 | http://schema.org/Person (8,305,793); http://schema.org/Answer (6,534,539); http://schema.org/ListItem (2,161,061); http://schema.org/Question (2,116,916); http://schema.org/QAPage (2,000,004) | 2.76 GB (12) | QAPage (sample) | lookup_file / pld_stats_file |
| Question | Quads: 1,632,186,128; URLs: 15,016,691; Hosts: 418,451 | http://schema.org/Answer (59,457,402); http://schema.org/Question (52,839,646); http://schema.org/ListItem (32,374,542); https://schema.org/Answer (21,589,599); http://schema.org/ImageObject (21,100,233) | 23.99 GB (120) | Question (sample) | lookup_file / pld_stats_file |
| RadioStation | Quads: 11,699,488; URLs: 236,850; Hosts: 862 | http://schema.org/ListItem (318,036); http://schema.org/RadioStation (285,586); http://schema.org/NewsArticle (201,586); http://schema.org/ImageObject (161,882); http://schema.org/WPSideBar (123,784) | 161.49 MB (1) | RadioStation (sample) | lookup_file / pld_stats_file |
| Recipe | Quads: 258,349,284; URLs: 2,746,545; Hosts: 37,304 | http://schema.org/HowToStep (8,610,659); http://schema.org/ListItem (5,355,018); http://schema.org/ImageObject (3,430,763); http://schema.org/Person (3,051,861); http://schema.org/Recipe (2,922,378) | 3.84 GB (21) | Recipe (sample) | lookup_file / pld_stats_file |
| Restaurant | Quads: 158,662,564; URLs: 1,186,870; Hosts: 84,256 | http://schema.org/Offer (6,208,346); http://schema.org/MenuItem (3,963,413); http://schema.org/Restaurant (2,969,736); http://schema.org/Product (2,780,205); http://schema.org/ListItem (2,372,360) | 1.59 GB (11) | Restaurant (sample) | lookup_file / pld_stats_file |
| RiverBodyOfWater | Quads: 170,020; URLs: 1,418; Hosts: 25 | https://schema.org/Canal (16,992); https://schema.org/Service (5,580); http://schema.org/ImageObject (2,198); http://schema.org/ListItem (2,022); http://schema.org/TouristDestination (1,746) | 1.38 MB (1) | RiverBodyOfWater (sample) | lookup_file / pld_stats_file |
| School | Quads: 10,071,921; URLs: 187,087; Hosts: 2,099 | http://schema.org/School (291,497); http://schema.org/ListItem (194,016); http://schema.org/PostalAddress (180,523); http://schema.org/Organization (106,718); http://schema.org/ImageObject (95,256) | 113.51 MB (1) | School (sample) | lookup_file / pld_stats_file |
| SearchAction | Quads: 27,878,062,181; URLs: 417,720,816; Hosts: 6,756,347 | http://schema.org/ListItem (1,052,351,965); http://schema.org/ImageObject (653,529,667); http://schema.org/WebSite (433,191,069); http://schema.org/SearchAction (422,553,707); http://schema.org/BreadcrumbList (408,755,536) | 265.53 GB (2194) | SearchAction (sample) | lookup_file / pld_stats_file |
| ShoppingCenter | Quads: 15,255,169; URLs: 135,249; Hosts: 1,345 | http://schema.org/Offer (363,660); http://schema.org/ListItem (251,172); http://schema.org/PostalAddress (249,166); http://schema.org/Organization (238,757); http://schema.org/ShoppingCenter (180,907) | 157.32 MB (2) | ShoppingCenter (sample) | lookup_file / pld_stats_file |
| SkiResort | Quads: 1,173,165; URLs: 28,128; Hosts: 245 | http://schema.org/ListItem (42,596); http://schema.org/SkiResort (38,305); http://schema.org/PostalAddress (24,781); http://schema.org/Person (21,854); http://schema.org/Review (21,440) | 15.61 MB (1) | SkiResort (sample) | lookup_file / pld_stats_file |
| SportsEvent | Quads: 118,751,716; URLs: 801,065; Hosts: 7,213 | http://schema.org/SportsTeam (6,022,233); http://schema.org/SportsEvent (5,823,577); http://schema.org/Place (5,054,488); http://schema.org/PostalAddress (4,570,509); http://schema.org/Organization (1,017,313) | 901.17 MB (9) | SportsEvent (sample) | lookup_file / pld_stats_file |
| SportsTeam | Quads: 99,701,119; URLs: 754,061; Hosts: 4,063 | http://schema.org/SportsTeam (7,166,129); http://schema.org/SportsEvent (2,995,518); http://schema.org/Place (2,387,843); http://schema.org/PostalAddress (2,094,582); http://schema.org/Person (1,310,001) | 810.39 MB (8) | SportsTeam (sample) | lookup_file / pld_stats_file |
| StadiumOrArena | Quads: 14,431,742; URLs: 57,177; Hosts: 256 | http://schema.org/SportsTeam (937,935); http://schema.org/StadiumOrArena (322,752); http://schema.org/SportsEvent (247,964); http://schema.org/SportsMatchCompetitor (247,746); http://schema.org/Organization (231,215) | 108.79 MB (2) | StadiumOrArena (sample) | lookup_file / pld_stats_file |
| TVEpisode | Quads: 29,569,610; URLs: 220,849; Hosts: 1,065 | http://schema.org/Country (3,253,437); http://schema.org/TVEpisode (974,857); http://schema.org/Person (505,723); https://schema.org/TVEpisode (299,969); http://schema.org/TVSeries (213,849) | 249.69 MB (3) | TVEpisode (sample) | lookup_file / pld_stats_file |
| TelevisionStation | Quads: 1,927,220; URLs: 22,720; Hosts: 89 | http://schema.org/ListItem (44,890); http://schema.org/ImageObject (41,683); http://schema.org/TelevisionStation (39,376); http://schema.org/Person (26,370); http://schema.org/WebPage (24,915) | 21.03 MB (1) | TelevisionStation (sample) | lookup_file / pld_stats_file |
If you are interested in a particular class or set of classes that is not listed above, please contact the Web Data Commons team via the mailing list or our Google Group.
We provide the extracted data for download in a variation of the N-Quads format. For users who prefer other formats, we provide code for converting the download files into CSV and JSON, which are supported by a wide range of spreadsheet applications, relational databases, and data mining frameworks such as the Python data analysis library pandas. Further details on how to convert the download files to other formats can be found on the main page.
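To give an idea of what such a conversion involves, the following sketch groups quads by source page and subject node and emits one JSON object per node. It is not the conversion code provided by the project; the names `TOKEN`, `quads_to_records` and `to_json_lines` are made up for this illustration, and the tokenizer is a lenient approximation of N-Quads.

```python
import json
import re

# Lenient tokenizer for N-Quads terms: IRI, blank node, or literal
# with an optional datatype or language tag.
TOKEN = re.compile(r'<[^>]*>|_:\S+|"(?:[^"\\]|\\.)*"(?:\^\^<[^>]*>|@[A-Za-z0-9-]+)?')


def quads_to_records(lines):
    """Group quads into one dict per (page, subject); values are lists of raw object terms."""
    records = {}
    for line in lines:
        terms = TOKEN.findall(line)
        if len(terms) != 4:
            continue  # skip lines the tokenizer cannot split into exactly four terms
        subject, predicate, obj, graph = terms
        record = records.setdefault(
            (graph, subject), {"page": graph.strip("<>"), "node": subject}
        )
        record.setdefault(predicate.strip("<>"), []).append(obj)
    return list(records.values())


def to_json_lines(records):
    """Serialize records as JSON Lines, one object per line."""
    return "\n".join(json.dumps(record, ensure_ascii=False) for record in records)


if __name__ == "__main__":
    # Made-up quads, not taken from the corpus.
    demo = [
        '_:b0 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://schema.org/Product> <https://example.com/p/1> .',
        '_:b0 <http://schema.org/name> "Widget"@en <https://example.com/p/1> .',
    ]
    print(to_json_lines(quads_to_records(demo)))
```

Output in this shape can be loaded with `pandas.read_json(path, lines=True)` once written to a file; object terms are kept in their raw N-Quads form so that datatypes and language tags are not lost.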
The Jupyter notebooks used to create the Schema.org subsets from the Microdata and JSON-LD corpora can be checked out from our Git repository.
The extraction of December 2024 was done with version 1.5 of the extractor. For more information about the framework and a detailed description of how to run your own extraction, visit the framework page.
Please send questions and feedback to the Web Data Commons mailing list or post them in our Web Data Commons Google Group.