Monthly Archives: huhtikuu 2016

Yle

Yle <3 Wikidata

Wikipedia and public service broadcasters have quite a similar mission. We both provide information for the public. One project is encyclopedic by its nature, the other journalistic. But both strive to document the world, as it was yesterday, as it is today and as it unfolds towards the future.

The Finnish Broadcasting Company, Yle, has since April 1st 2016 tagged our online news and feature articles with concepts from Wikidata. This was a rather natural next step in a development that has been ongoing within Yle for several years now. But it is still an interesting choice of path for a public service media company.

With linking journalistic content and the wiki projects to each other in machine-readable format, we hope to gain win-win situations where we can fulfill our missions even better together.

Why Wikidata?

The short answer to why is, because Freebase is shutting down. We have since 2012 tagged our articles using external vocabularies. The first one we used was the KOKO upper ontology from the Finnish thesaurus and ontology service Finto. KOKO is an ontology that consists of nearly 50 000 core concepts. It is very broad and of high semantic quality. But it doesn’t include terms for persons, organizations, events and places.

Enter Freebase. We originally chose Freebase it 2012 quite pragmatically over the competition (Geonames, Dbpedia and others) mainly because of its very well functioning API interface. When Google in December 2014 announces that it would be shutting down Freebase we had to tackle this situation in some way.

The options were either to use Freebase offline and start building our own taxonomy for new terms, or to find a replacing knowledge base. And then there was the wild card of the promised Google Knowledge Graph API. The main replacing options were DBpedia and Wikidata, no other could compete with Freebase in scope.

Some experts said that the structure and querying possibilities in Wikidata where subpar compared to DBpedia. But for tagging purpose, especially for the news departments, it was far more important that new tags could be added either manually by us ourselves or by the Wikipedians.

During 2015 there really seemed to be a lot of buzz going on around Wikidata and inside the community. And when we contacted Wikimedia Finland we were welcomed with open arms to the community, and that also made all the difference. (Thank you Susanna Ånäs!) So with the support of Wikimedia and with their assurance that Yle-content fulfilled the notability demand we went ahead with the project.

Migration

In October 2015 we had tagged articles with 28 000 different Freebase concepts. Out of those over 20 000 could be directly found with Freebase-URIs in Wikidata. For the additional part, after some waiting for the promised migration support from Google, we started to map these together.

We got help from Wikidata to load them up to the Mix’n’match-tool. (Thank you Magnus Manske!) And mapped an additional 3800+ concepts. There are still 3558 terms unmapped at the time of writing, so please feel free to have a go at it.  https://tools.wmflabs.org/mix-n-match/?mode=catalog&catalog=116#the_start

Mix and Match-tool.

Mix and Match-tool.

At the time of transition to Wikidata in April 2016 our editors had been very diligent and managed to use 7000 more Freebase concepts. So the final mapping still only covered around 72% of the Freebase terms used. But as we are using several sources for tagging, this is an ongoing effort in any case, and not a problem for our information management.

Technical implementation

We did the technical integration of the Wikidata annotation service in our API-architecture at Yle. Within it we have our “Meta-API” that gathers all the tags used for describing Yle-content. When the API is called it returns results from Wikidata, KOKO and a third commercial source vocabulary we are using. Since those three vocabularies overlap partially they need to be mapped / bridged to each other (e.g. country names can be found in all three sources). The vocabulary is maintained under supervision of producer Pia Virtanen.

The API-call towards Wikidata returns results that have labels in at least Finnish, Swedish or English. Wikidata still lacks term descriptions in Swedish and Finnish to a rather high degree, so we also fetch the first paragraph from Wikipedia to provide disambiguation information to the editors that do the annotation.

The Meta-API is then implemented in the CMSs. For example in the Drupal7 sites widely used within Yle we have an own module, YILD https://www.drupal.org/project/yild, which can be used for tagging towards any external source.


The article featured in the demo video above: http://svenska.yle.fi/artikel/2016/04/15/kerry-usa-hade-ratt-att-skjuta-ner-ryska-plan

The UX encourages a workflow where the editor first inserts the text, and then manually chooses a few primary tags. After that they can by pushing a button fetch automatic annotation suggestions, which returns approximately a dozen more tags. From these the editor can select the suitable ones.

The tags are then used in the article itself, they create automatic topical pages, are used in curated topical pages as subject headings and navigation, and bring together Yle-content produced in different CMSs and organizational units in different languages. They are used in our News app Uutisvahti/Nyhetskollen. And the tags are also printed out in the source code according to schema.org specifications, mainly for SEO.

The opening of Yle’s APIs are on the road map for Yle’s internet development. Once we can provide this metadata together with data about our articles and programs, third party developers can build new and specialized solutions for specific uses and audiences.

One suggested application would be to build a “Wikipedia source finder”. So that if a Wikipedian finds a stub article, they could look up what material Yle has about the topic, and complete the article with Yle as a source.

Wikidata for annotation

We still at the time of writing only have a couple of weeks experience with annotating our content using Wikidata. Compared to Freebase there seems to be far more local persons like artists and politicians, as well as more local places and events.

For breaking news stories the Wiki-community is of great help. In a sense the wiki-community is crowdsourcing new tags. For example events like the 2016 Brussels bombings or the Panama papers leak get articles written very fast after the events have taken place. Thus creating a Wikidata-item, and a tag for us. This also gives the added benefit of a normalized label and description for the events.

In Norway several media companies, including our fellow broadcaster NRK, have initiated collaboration in specifying how important news events could be called in a unified and homogenous way. During the tragic attacks in Norway in 2011 there were over 200 different labels created and used in media for this event during the first day.

Through Wikidata we can normalize this information for all the languages we use, Finnish, Swedish and English. And we have the option to expand to any of the dozens of languages active in the Wiki-projects.

Wikidata and public service

Apart from all the technical factors that fulfil their tasks, there is another side to using Wikidata as well. It feels that the Wiki-projects and public service companies have a lot in common in their ethos and mission, but not too much to be in a competitive relationship.

The web is increasingly a place of misinformation and commercial actors monetizing on user data. It feels in accordance increasingly important to tie independent public service media, to the free-access, free-content information structure built by the public that constitutes the wiki projects. As content becomes more and more platform agnostic and data driven, this strategy seems a good investment for the future of independent journalism.

Osallistu Europeana Art History Challengeen!

Europeana ja Wikimedian eurooppalaiset paikallisorganisaatiot järjestävät yhdessä Europeana Art History Challenge -muokkaustapahtuman, jolla juhlistetaan Europeana280-kampanjaa.

Europeana kokoaa yhteen eurooppalaisten kirjastojen, museoiden ja arkistojen kulttuuriaarteita ja tällä kampanjalla juhlistetaan uuden Europeanan taidehistoriakokoelman avautumista. Kutakin Euroopan Unionin 28 jäsenvaltiota pyydettiin nimeämään 10 eurooppalaiseen taiteeseen vaikuttanutta teosta. Tapahtuman aikana teoksista kirjoitetaan, käännetään tai parannetaan Wikipedia-artikkeleita kaikille 39 kielelle.

Tapahtuma käynnistyy 15.4. ja jatkuu toukokuun loppuun.

Teosten tiedot on viety valmiiksi Wikidataan, Wikimedia-säätiön tuoreeseen tietokantaprojektiin. Keskitetyn tietokannan kautta erikieliset Wikipedia-artikkelit taideteoksista linkittyvät toisiinsa. Se auttaa kääntämisessä: jos teoksesta puuttuu artikkeli jollain kielellä, sen pohjana voi käyttää artikkelia, joka on jo olemassa jollain toisella kielellä. Projektissa on yli 10 000 taideteos–kieli -paria.

Ohje

  1. Ilmoittaudu! – merkitse itsesi osallistujalistaan ja ala kerätä pisteitä.
  2. Luo uusia artikkeleita Wikipediaan suomalaisista teoksista suomeksi tai ruotsiksi. 10 pistettä! Voit myös lisätä taiteilijan ja teoksen artikkeliin kaikki parhaat teoskuvat.
  3. Käännä artikkeleita muiden maiden teoksista suomeksi tai ruotsiksi. 5 pistettä!
  4. Täydennä teoksen tietoja Wikidatassa. Voit myös muokata taiteilijan tietoja tai muuta teokseen liittyvää tietoa. 1 piste lähteen, otsikon tai kuvailun lisäämisestä!
  5. Voita palkintoja!

Lisätietoa Wikidatasta

Tutustu Wikidatan käyttämiseen projektisivuilla Wikidata – Avoin kulttuuridata hyötykäyttöön.

Taideteokset

Suomea edustavat teokset tulevat Kansallisgalleriasta. Kansallisgalleria on lisensoinut 10 teoskuvaa avoimilla lisensseillä ja ne on saatavana avoimesti korkealaatuisina kuvina Wikimedia Commonsissa. Kuvasta on linkki tiedostoon, Wikidata-kohteen linkin löydät kuvatekstistä. Iloista sukellusta eurooppalaiseen taidehistoriaan!

Maria Wiik (1853–1928) Maailmalle, 1889 Wikidata: Q11880306 Kansallisgalleria / Antti Kuivalainen CC BY 4.0

Hugo Simberg (1873–1917) Haavoittunut enkeli, 1903 Wikidata: Q471289 Kansallisgalleria / Hannu Aaltonen CC BY 4.0

Valle Rosenberg (1891 – 1919) Istuva nainen (Taiteilijan sisar Ina Rosenberg), 1913 Wikidata: Q20800123 Kansallisgalleria / Hannu Aaltonen CC BY 4.0

Ilmari Aalto (1881–1934) Kellot, 1914 Wikidata: Q20792756 Kansallisgalleria / Antti Kuivalainen CC BY 4.0

Yrjö Ollila (1887–1932) Läksyn kuulustelu, 1923 Wikidata: Q20799751 Kansallisgalleria / Hannu Aaltonen CC BY 4.0

Väinö Kunnas (1896–1929) Harmaa tanssi, 1928 Wikidata: Q20796266 Kansallisgalleria / Hannu Aaltonen CC BY 4.0

Vilho Lampi (1898–1936) Omakuva, 1933 Wikidata: Q20773696 Kansallisgalleria / Janne Tuominen, CC BY 4.0