Class-Specific Subsets of the Schema.org Data contained in the November 2015 Corpus

This page provides access to and statistics about class-specific subsets of the Schema.org data contained in the November 2015 version of the Web Data Commons Microdata corpus.

Introduction

As many users are only interested in specific types of Schema.org data (like product data, event data, or address data), we have created class-specific subsets from the complete Microdata corpus for a selection of Schema.org classes. Each subset contains all instances of a specific class as well as all other data found on the webpages containing these instances. For example, a page containing data about a product might also contain reviews and offers for this product; a page containing data about an event might also contain data about the location of the event and the persons involved in it. The data is represented in N-Quads format, meaning that the fourth element of each quad contains the URL of the webpage from which the data was extracted.
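To illustrate the quad layout described above, the following is a minimal Python sketch that splits one N-Quads line into its four elements and returns the page URL from the fourth position. The function name `parse_quad`, the regular expression, and the sample line are assumptions made for illustration; they are not part of the Web Data Commons tooling, and a full N-Quads parser would handle more edge cases.

```python
import re

# One RDF term: an IRI in angle brackets, a blank node, or a quoted literal
# with an optional language tag or datatype. Simplified for illustration.
TERM = r'(<[^>]*>|_:\S+|"(?:[^"\\]|\\.)*"(?:@[A-Za-z0-9-]+|\^\^<[^>]*>)?)'

# subject, predicate, object, then the graph IRI (the source page URL).
QUAD_RE = re.compile(rf'^\s*{TERM}\s+{TERM}\s+{TERM}\s+<([^>]*)>\s*\.\s*$')


def parse_quad(line):
    """Return (subject, predicate, object, page_url), or None if no match."""
    match = QUAD_RE.match(line)
    if match is None:
        return None
    subject, predicate, obj, page_url = match.groups()
    return subject, predicate, obj, page_url


# Hypothetical sample line; real lines in the subsets will differ.
example = (
    '_:node1 <http://schema.org/Product/name> "Red \\"shoe\\""@en '
    '<http://example.com/p/1> .'
)
print(parse_quad(example)[3])
```

The regular expression keeps quoted literals intact, so spaces inside a literal do not shift the position of the fourth element.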

Please note that

You are welcome to use the datasets and to share your findings. If you find our datasets useful for your research, please cite the paper: The WebDataCommons Microdata, RDFa and Microformat Dataset Series by Robert Meusel, Petar Petrovski, and Christian Bizer, in the Proceedings of the 13th International Semantic Web Conference: Replication, Benchmark, Data and Software Track (ISWC2014).

Class-Specific Subsets of the Schema.org Data

Class Name | Total Number of | Top Classes (Entity Count) | Total File Size | Quad File
http://schema.org/AdministrativeArea Quads: 4,024,559
URLs: 94,739
Hosts: 157
http://schema.org/City (439,843)
http://schema.org/AdministrativeArea (206,744)
http://schema.org/GeoCoordinates (90,727)
http://schema.org/Country (80,020)
http://schema.org/Continent (78,483)
63 MB schemaOrgAdministrativeArea.gz (sample)
http://schema.org/Airport Quads: 83,821,303
URLs: 2,056,913
Hosts: 92
http://schema.org/Airport (26,146,902)
http://schema.org/Thing (1,151,418)
http://schema.org/WebPage (383,835)
http://schema.org/PostalAddress (17,992)
http://schema.org/GeoCoordinates (9,481)
1,559 MB schemaOrgAirport.gz (sample)
http://schema.org/Book Quads: 144,490,311
URLs: 5,048,676
Hosts: 2,926
http://schema.org/Book (8,828,018)
http://schema.org/Person (7,780,426)
http://schema.org/Offer (3,495,255)
http://schema.org/ScholarlyArticle (2,124,006)
http://schema.org/Review (1,021,383)
3,166 MB schemaOrgBook.gz (sample)
http://schema.org/City Quads: 19,449,890
URLs: 326,483
Hosts: 374
http://schema.org/City (783,165)
http://schema.org/GeoCoordinates (631,854)
http://schema.org/PostalAddress (459,576)
http://schema.org/Person (358,558)
http://schema.org/Offer (328,172)
369 MB schemaOrgCity.gz (sample)
http://schema.org/CollegeOrUniversity Quads: 15,751,909
URLs: 668,015
Hosts: 1,113
http://schema.org/CollegeOrUniversity (1,324,665)
http://schema.org/Person (902,301)
http://schema.org/CreativeWork (852,099)
http://schema.org/PostalAddress (336,922)
http://schema.org/AggregateRating (184,129)
363 MB schemaOrgCollegeOrUniversity.gz (sample)
http://schema.org/Continent Quads: 3,006,819
URLs: 81,638
Hosts: 11
http://schema.org/City (434,679)
http://schema.org/AdministrativeArea (136,428)
http://schema.org/GeoCoordinates (85,602)
http://schema.org/Continent (81,531)
http://schema.org/Country (80,623)
41 MB schemaOrgContinent.gz (sample)
http://schema.org/Country Quads: 111,241,177
URLs: 1,658,055
Hosts: 353
http://schema.org/MusicRecording (6,396,896)
http://schema.org/LodgingBusinessAmenity (1,926,324)
http://schema.org/Person (1,889,391)
http://schema.org/UserComments (1,862,689)
http://schema.org/Country (1,643,275)
2,107 MB schemaOrgCountry.gz (sample)
http://schema.org/CreativeWork Quads: 431,166,696
URLs: 17,459,873
Hosts: 67,785
http://www.schema.org/ImageObject (25,618,603)
http://schema.org/CreativeWork (16,887,192)
http://schema.org/Person (10,240,551)
http://schema.org/Comment (5,459,780)
http://schema.org/Organization (3,579,914)
13,644 MB schemaOrgCreativeWork.gz (sample)
http://schema.org/EducationalOrganization Quads: 4,870,779
URLs: 171,099
Hosts: 1,467
http://schema.org/EducationalOrganization (292,711)
http://schema.org/PostalAddress (220,582)
http://schema.org/MedicalScholarlyArticle (113,174)
http://schema.org/GeoCoordinates (89,635)
http://schema.org/EducationEvent (89,606)
101 MB schemaOrgEducationalOrganization.gz (sample)
http://schema.org/Event Quads: 221,503,317
URLs: 4,663,477
Hosts: 15,599
http://schema.org/Event (13,182,208)
http://schema.org/Place (9,496,351)
http://schema.org/PostalAddress (8,170,256)
http://schema.org/GeoCoordinates (3,732,506)
http://schema.org/AggregateOffer (3,329,870)
4,422 MB schemaOrgEvent.gz (sample)
http://schema.org/GeoCoordinates Quads: 595,550,547
URLs: 21,871,669
Hosts: 24,699
http://schema.org/GeoCoordinates (25,875,623)
http://schema.org/PostalAddress (25,169,707)
http://schema.org/LocalBusiness (12,647,771)
http://schema.org/AggregateRating (10,020,134)
http://schema.org/Place (5,833,478)
11,355 MB schemaOrgGeoCoordinates.gz (sample)
http://schema.org/GovernmentOrganization Quads: 818,612
URLs: 46,597
Hosts: 215
http://schema.org/GovernmentOrganization (68,247)
http://schema.org/PostalAddress (37,650)
http://schema.org/NewsArticle (7,821)
http://schema.org/Article (7,669)
http://schema.org/Person (7,433)
18 MB schemaOrgGovernmentOrganization.gz (sample)
http://schema.org/Hospital Quads: 9,889,992
URLs: 466,463
Hosts: 382
http://schema.org/PostalAddress (624,555)
http://schema.org/Hospital (513,651)
http://schema.org/Physician (269,465)
http://schema.org/MedicalSpecialty (143,333)
http://schema.org/GeoCoordinates (126,715)
176 MB schemaOrgHospital.gz (sample)
http://schema.org/Hotel Quads: 242,934,610
URLs: 9,647,492
Hosts: 6,104
http://schema.org/Hotel (23,274,951)
http://schema.org/LandmarksOrHistoricalBuildings (15,554,666)
http://schema.org/PostalAddress (3,409,751)
http://schema.org/Review (3,137,239)
http://schema.org/AggregateRating (2,996,422)
5,374 MB schemaOrgHotel.gz (sample)
http://schema.org/JobPosting Quads: 260,136,092
URLs: 6,519,297
Hosts: 5,465
http://schema.org/JobPosting (25,480,886)
http://schema.org/Place (19,129,649)
http://schema.org/Organization (13,640,778)
http:/schema.orgPostalAddress (7,210,314)
http://schema.org/Postaladdress (5,798,564)
5,583 MB schemaOrgJobPosting.gz (sample)
http://schema.org/LakeBodyOfWater Quads: 172,393
URLs: 1,421
Hosts: 15
http://schema.org/PostalAddress (9,102)
http://schema.org/GeoCoordinates (9,044)
http://schema.org/LakeBodyOfWater (2,722)
http://schema.org/City (1,626)
http://schema.org/Park (913)
2 MB schemaOrgLakeBodyOfWater.gz (sample)
http://schema.org/LandmarksOrHistoricalBuildings Quads: 83,178,864
URLs: 1,881,587
Hosts: 143
http://schema.org/LandmarksOrHistoricalBuildings (15,574,988)
http://schema.org/Hotel (11,875,598)
http://schema.org/Review (599,810)
http://schema.org/Offer (458,352)
http://schema.org/Organization (42,744)
1,636 MB schemaOrgLandmarksOrHistoricalBuildings.gz (sample)
http://schema.org/Language Quads: 478,882
URLs: 5,324
Hosts: 166
http://schema.org/SiteNavigationElement (17,443)
http://schema.org/Language (6,705)
http://schema.org/PostalAddress (5,329)
http://schema.org/Organization (4,002)
http:/schema.orgCreativeWork (3,653)
12 MB schemaOrgLanguage.gz (sample)
http://schema.org/Library Quads: 1,080,379
URLs: 40,661
Hosts: 74
http://schema.org/CreativeWork (56,419)
http://schema.org/Library (42,020)
http://schema.org/PostalAddress (39,047)
http://schema.org/GeoCoordinates (25,325)
http://schema.org/Place (24,336)
17 MB schemaOrgLibrary.gz (sample)
http://schema.org/LocalBusiness Quads: 521,456,793
URLs: 18,435,849
Hosts: 108,047
http://schema.org/LocalBusiness (31,675,441)
http://schema.org/PostalAddress (25,661,947)
http://schema.org/GeoCoordinates (12,846,311)
http://schema.org/AggregateRating (9,742,178)
http://schema.org/Product (6,389,847)
9,114 MB schemaOrgLocalBusiness.gz (sample)
http://schema.org/Mountain Quads: 260,180
URLs: 2,203
Hosts: 11
http://schema.org/GeoCoordinates (11,397)
http://schema.org/PostalAddress (11,032)
http://schema.org/Mountain (10,409)
http://schema.org/Review (2,371)
http://schema.org/City (1,457)
4 MB schemaOrgMountain.gz (sample)
http://schema.org/Movie Quads: 97,725,003
URLs: 2,186,736
Hosts: 3,940
http://schema.org/Person (9,061,177)
http://schema.org/Movie (5,642,868)
http://schema.org/AggregateRating (1,056,153)
http://schema.org/CreativeWork (647,785)
http://schema.org/ImageGallery (593,792)
2,326 MB schemaOrgMovie.gz (sample)
http://schema.org/Museum Quads: 1,961,968
URLs: 26,761
Hosts: 83
http://schema.org/Painting (387,854)
http://schema.org/Event (94,100)
http://schema.org/PostalAddress (33,259)
http://schema.org/Museum (28,832)
http://schema.org/GeoCoordinates (26,344)
39 MB schemaOrgMuseum.gz (sample)
http://schema.org/MusicAlbum Quads: 247,174,669
URLs: 3,349,444
Hosts: 476
http://schema.org/MusicRecording (22,595,520)
http://schema.org/MusicAlbum (13,047,953)
http://schema.org/Offer (8,576,719)
http://schema.org/AudioObject (8,509,785)
http://schema.org/Person (2,054,491)
4,266 MB schemaOrgMusicAlbum.gz (sample)
http://schema.org/MusicRecording Quads: 306,371,529
URLs: 5,569,347
Hosts: 2,309
http://schema.org/MusicRecording (31,315,000)
http://schema.org/MusicAlbum (11,885,649)
http://schema.org/AudioObject (8,740,229)
http://schema.org/Offer (8,666,075)
http://schema.org/Person (3,210,760)
5,312 MB schemaOrgMusicRecording.gz (sample)
http://schema.org/Organization Quads: 2,608,295,602
URLs: 265,168,358
Hosts: 174,690
http://schema.org/Organization (110,164,606)
http://schema.org/Product (58,516,743)
http://schema.org/TVSeries (50,407,177)
http://schema.org/Offer (35,539,617)
http://www.schema.org/ImageObject (34,179,213)
66,275 MB schemaOrgOrganization.gz (sample)
http://schema.org/Painting Quads: 999,882
URLs: 12,398
Hosts: 76
http://schema.org/Painting (398,829)
http://schema.org/Person (12,971)
http://schema.org/Comment (9,853)
http://schema.org/Museum (4,296)
http://schema.org/PostalAddress (4,113)
25 MB schemaOrgPainting.gz (sample)
http://schema.org/Park Quads: 474,400
URLs: 4,332
Hosts: 36
http://schema.org/PostalAddress (25,159)
http://schema.org/GeoCoordinates (24,220)
http://schema.org/Park (9,054)
http://schema.org/City (3,654)
http://schema.org/TouristAttraction (2,135)
7 MB schemaOrgPark.gz (sample)
http://schema.org/Person Quads: 1,904,904,283
URLs: 158,692,396
Hosts: 124,068
http://schema.org/Person (168,233,105)
http://schema.org/UserComments (25,478,348)
http://schema.org/Comment (21,175,310)
http://schema.org/ImageObject (18,982,851)
http://schema.org/Article (14,887,484)
68,476 MB schemaOrgPerson.gz (sample)
http://schema.org/Place Quads: 619,292,957
URLs: 26,497,313
Hosts: 27,040
http://schema.org/Place (41,923,530)
http://schema.org/JobPosting (18,766,319)
http://schema.org/PostalAddress (18,580,647)
http://schema.org/Organization (13,361,363)
http://schema.org/Event (9,494,370)
13,470 MB schemaOrgPlace.gz (sample)
http://schema.org/Product Quads: 3,468,117,550
URLs: 385,656,162
Hosts: 128,555
http://schema.org/Product (252,011,687)
http://schema.org/Offer (193,677,491)
http://schema.org/AggregateRating (59,559,069)
http://schema.org/Review (30,627,272)
http://schema.org/Rating (27,397,473)
72,037 MB schemaOrgProduct.gz (sample)
http://schema.org/RadioStation Quads: 827,645
URLs: 84,862
Hosts: 116
http://schema.org/RadioStation (88,838)
http://schema.org/PostalAddress (79,108)
http://schema.org/Review (19,818)
http://schema.org/Rating (19,762)
http://schema.org/AggregateRating (12,259)
16 MB schemaOrgRadioStation.gz (sample)
http://schema.org/Recipe Quads: 70,721,898
URLs: 2,167,555
Hosts: 13,899
http://schema.org/Recipe (2,345,364)
http://schema.org/AggregateRating (1,536,447)
http://schema.org/Person (1,323,968)
http://schema.org/NutritionInformation (882,922)
http://schema.org/Comment (575,184)
2,098 MB schemaOrgRecipe.gz (sample)
http://schema.org/Restaurant Quads: 18,520,624
URLs: 334,665
Hosts: 5,399
http://schema.org/PostalAddress (856,587)
http://schema.org/Restaurant (850,042)
http://schema.org/LocalBusiness (299,401)
http://schema.org/Review (267,345)
http://schema.org/AggregateRating (244,955)
372 MB schemaOrgRestaurant.gz (sample)
http://schema.org/RiverBodyOfWater Quads: 124,921
URLs: 1,128
Hosts: 10
http://schema.org/PostalAddress (6,506)
http://schema.org/GeoCoordinates (6,460)
http://schema.org/RiverBodyOfWater (2,532)
http://schema.org/City (837)
http://schema.org/Park (467)
1 MB schemaOrgRiverBodyOfWater.gz (sample)
http://schema.org/School Quads: 14,470,940
URLs: 377,816
Hosts: 225
http://schema.org/PostalAddress (1,377,570)
http://schema.org/School (1,234,580)
http://schema.org/WebSite (157,250)
http://schema.org/SearchAction (157,246)
http://www.schema.org/ProfilePage (154,767)
226 MB schemaOrgSchool.gz (sample)
http://schema.org/ShoppingCenter Quads: 476,298
URLs: 5,109
Hosts: 115
http://schema.org/PostalAddress (25,754)
http://schema.org/ShoppingCenter (24,279)
http://schema.org/ClothingStore (11,700)
http://schema.org/GeoCoordinates (6,084)
http://schema.org/Restaurant (6,014)
8 MB schemaOrgShoppingCenter.gz (sample)
http://schema.org/SkiResort Quads: 37,721
URLs: 2,522
Hosts: 28
http://schema.org/SkiResort (2,729)
http://schema.org/PostalAddress (1,204)
http://schema.org/GeoCoordinates (1,196)
http://schema.org/AggregateRating (626)
http://schema.org/Review (299)
1 MB schemaOrgSkiResort.gz (sample)
http://schema.org/SportsEvent Quads: 23,731,505
URLs: 186,809
Hosts: 490
http://schema.org/SportsEvent (1,300,672)
http://schema.org/PostalAddress (1,044,964)
http://schema.org/EventVenue (569,532)
http://schema.org/SportStat/Soccer/Goals (312,354)
http://schema.org/SportsTeam/Soccer (312,354)
355 MB schemaOrgSportsEvent.gz (sample)
http://schema.org/SportsTeam Quads: 8,013,634
URLs: 177,433
Hosts: 217
http://schema.org/Article (524,815)
http://schema.org/SportsTeam (463,401)
http://schema.org/Person (432,167)
http://schema.org/SportsMatchCompetitor (204,360)
http://schema.org/SiteNavigationElement (197,213)
188 MB schemaOrgSportsTeam.gz (sample)
http://schema.org/StadiumOrArena Quads: 11,335,338
URLs: 56,789
Hosts: 48
http://schema.org/PostalAddress (910,873)
http://schema.org/SportsEvent (683,901)
http://schema.org/EventVenue (625,080)
http://schema.org/StadiumOrArena (294,430)
http://schema.org/MusicEvent (159,121)
168 MB schemaOrgStadiumOrArena.gz (sample)
http://schema.org/TVEpisode Quads: 39,932,069
URLs: 684,785
Hosts: 285
http://schema.org/TVEpisode (3,930,861)
http://schema.org/Person (1,750,909)
http://schema.org/TVSeries (603,590)
http://schema.org/AggregateRating (304,163)
http://schema.org/SiteNavigationElement (261,031)
808 MB schemaOrgTVEpisode.gz (sample)
http://schema.org/TelevisionStation Quads: 38,074
URLs: 1,546
Hosts: 24
http://schema.org/TelevisionStation (6,792)
http://schema.org/PostalAddress (428)
http://www.schema.org/UserComments (401)
http://schema.org/Review (381)
http://schema.org/Rating (375)
1 MB schemaOrgTelevisionStation.gz (sample)
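For readers who want to compute comparable figures on a downloaded subset, the sketch below streams a gzipped quad file and counts quads, distinct page URLs, distinct hosts, and the most frequent classes. It is only an approximation under stated assumptions: the names `summarize` and `summarize_file` are hypothetical, hosts are taken as the URL hostname, and classes are counted per rdf:type statement. How the figures in the table above were actually computed is not stated on this page, so results may differ.

```python
import gzip
from collections import Counter
from urllib.parse import urlsplit

RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"


def summarize(lines, top_n=5):
    """Count quads, distinct URLs, distinct hosts, and top classes."""
    quads = 0
    urls, hosts = set(), set()
    classes = Counter()
    for line in lines:
        line = line.strip()
        if not line.endswith("."):
            continue  # skip blank or malformed lines
        # The graph IRI (page URL) is the last token before the final dot.
        pieces = line[:-1].rstrip().rsplit(None, 1)
        if len(pieces) != 2:
            continue
        head, graph = pieces
        if not (graph.startswith("<") and graph.endswith(">")):
            continue
        # Subject and predicate never contain whitespace; the object may.
        parts = head.split(None, 2)
        if len(parts) < 3:
            continue
        quads += 1
        url = graph[1:-1]
        urls.add(url)
        try:
            host = urlsplit(url).hostname
        except ValueError:
            host = None  # malformed URL; count the quad but not a host
        if host:
            hosts.add(host)
        obj = parts[2].strip()
        if parts[1] == RDF_TYPE and obj.startswith("<") and obj.endswith(">"):
            classes[obj[1:-1]] += 1
    return {
        "quads": quads,
        "urls": len(urls),
        "hosts": len(hosts),
        "top_classes": classes.most_common(top_n),
    }


def summarize_file(path, top_n=5):
    """Stream a gzipped quad file from disk without loading it into memory."""
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as handle:
        return summarize(handle, top_n)
```

A call such as `summarize_file("schemaOrgSkiResort.gz")` would process a locally downloaded file. The sets of URLs and hosts are held in memory, so for the largest subsets an approach with bounded memory would be needed.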

If you are interested in a particular class or set of classes that is not listed above, please contact the WebDataCommons team via the mailing list or our Google Group.

Get the Code

The source code can be checked out from our Subversion repository. The November 2015 extraction was done with version 1.0.4 of the extractor. For more information about the framework and a detailed description of how to run your own extraction, visit the framework page.

Get Support

Please send questions and feedback to the Web Data Commons mailing list or post them in our Web Data Commons Google Group.