DH 102 // Unit 2 data!!!

Mackenzie Brooks

October 23, 2013

Unit 1 = bag of words

Unit 2 = relationships

tabular data

Name     Year  State
Charlie  2019  Texas
Rich     2020  New Jersey
Alice    2020  Texas
Jenna    2021  California

Image credit: https://en.wikipedia.org/wiki/Relational_database

Source: https://www.dlsweb.rmit.edu.au/toolbox/knowmang/content/models/relational_model.htm

Source: http://legacy.alexandria.ucsb.edu/gazetteer/ContentStandard/version3.2/GCS3.2-guide.htm

Structured Query Language

SQL

SELECT column_name, column_name
FROM table_name
WHERE column_name operator value;

SELECT * FROM students
WHERE year='2020';
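
To try the query, here is a minimal sketch using Python's built-in sqlite3 module. It loads the Name/Year/State rows from the tabular data slide into a table called students and runs the SELECT above. Storing year as text (to match the quoted '2020') and the lowercase column names are assumptions, not something the slides specify.

```python
import sqlite3

# In-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Assumed schema: three text columns named after the slide's table header.
conn.execute("CREATE TABLE students (name TEXT, year TEXT, state TEXT)")
conn.executemany(
    "INSERT INTO students VALUES (?, ?, ?)",
    [
        ("Charlie", "2019", "Texas"),
        ("Rich", "2020", "New Jersey"),
        ("Alice", "2020", "Texas"),
        ("Jenna", "2021", "California"),
    ],
)

# The query from the slide: every column, but only rows whose year is 2020.
rows = conn.execute("SELECT * FROM students WHERE year='2020';").fetchall()
for row in rows:
    print(row)
```

Only the rows for Rich and Alice match the WHERE clause.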

Data types

  • String/characters
  • Integer
  • Decimal
  • Boolean (T/F)
  • Date
  • Time
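
As a rough illustration, each type in the list above has a counterpart in Python. The record below is a sketch: the field names and every value other than the name and year are hypothetical, chosen only to show one value per type.

```python
from datetime import date, time
from decimal import Decimal

# One hypothetical record with one field per data type from the list.
record = {
    "name": "Alice",                  # string/characters
    "year": 2020,                     # integer
    "gpa": Decimal("3.75"),           # decimal (hypothetical value)
    "enrolled": True,                 # Boolean
    "start_date": date(2020, 8, 24),  # date (hypothetical value)
    "class_time": time(9, 30),        # time (hypothetical value)
}

for field, value in record.items():
    print(field, type(value).__name__, value)

# The type decides what an operation means.
as_text = "2020" + "1"   # strings are joined end to end
as_number = 2020 + 1     # integers are added
print(as_text, as_number)
```

This is why it matters whether a year column is stored as text or as a number: the same operation gives different answers.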

File formats

  • .csv = comma-separated values
  • .xlsx = Excel workbook
  • .tsv = tab-separated values
  • .json = JavaScript Object Notation
  • .xml = Extensible Markup Language
  • RDF = Resource Description Framework
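
To make the difference between formats concrete, the sketch below writes the same two rows from the tabular data slide as CSV and as JSON, using only Python's standard library. Keeping every value as text is an assumption made so that both formats round-trip to identical data.

```python
import csv
import io
import json

rows = [
    {"Name": "Charlie", "Year": "2019", "State": "Texas"},
    {"Name": "Rich", "Year": "2020", "State": "New Jersey"},
]

# CSV: a header line, then one line per row, values separated by commas.
buffer = io.StringIO()
writer = csv.DictWriter(
    buffer, fieldnames=["Name", "Year", "State"], lineterminator="\n"
)
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)

# JSON: the same rows as a list of objects, with the field names repeated.
json_text = json.dumps(rows, indent=2)
print(json_text)

# Reading each format back recovers the same rows.
csv_back = list(csv.DictReader(io.StringIO(csv_text)))
json_back = json.loads(json_text)
```

CSV states the column names once and relies on position; JSON labels every value, which is more verbose but also allows nesting.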

The Dream of Linked Data

Facts

  • Computers only do what we tell them.
  • The web knows what we said, not what we mean.

<lg type="sestina">
<lg type="sestet" rhyme="ababab">
<l>I saw my soul at rest upon a <rhyme label="a" xml:id="A">day</rhyme></l>
<l>As a bird sleeping in the nest of <rhyme label="b" xml:id="B">night</rhyme>,</l>
<l>Among soft leaves that give the starlight <rhyme label="a" xml:id="C">way</rhyme></l>
<l>To touch its wings but not its eyes with <rhyme label="b" xml:id="D">light</rhyme>;</l>
<l>So that it knew as one in visions <rhyme label="a" xml:id="E">may</rhyme>,</l>
<l>And knew not as men waking, of <rhyme label="b" xml:id="F">delight</rhyme>.</l>
</lg>
Source: http://prosody.lib.virginia.edu/
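
One answer to why this markup matters: once the rhymes are tagged, a program can read them. The sketch below parses the stanza with Python's xml.etree.ElementTree and checks the tagged labels against the declared rhyme scheme. The closing tag for the outer sestina element is added here as an assumption so that the excerpt parses on its own.

```python
import xml.etree.ElementTree as ET

# The stanza from the slide; the final </lg> is added so the excerpt is well-formed.
tei = """<lg type="sestina">
<lg type="sestet" rhyme="ababab">
<l>I saw my soul at rest upon a <rhyme label="a" xml:id="A">day</rhyme></l>
<l>As a bird sleeping in the nest of <rhyme label="b" xml:id="B">night</rhyme>,</l>
<l>Among soft leaves that give the starlight <rhyme label="a" xml:id="C">way</rhyme></l>
<l>To touch its wings but not its eyes with <rhyme label="b" xml:id="D">light</rhyme>;</l>
<l>So that it knew as one in visions <rhyme label="a" xml:id="E">may</rhyme>,</l>
<l>And knew not as men waking, of <rhyme label="b" xml:id="F">delight</rhyme>.</l>
</lg>
</lg>"""

root = ET.fromstring(tei)
stanza = root.find("lg")  # the inner <lg type="sestet">

# Collect (label, word) for every tagged rhyme, in document order.
pairs = [(r.get("label"), r.text) for r in stanza.iter("rhyme")]
for label, word in pairs:
    print(label, word)

# Compare the labels found in the lines with the scheme declared on the stanza.
found_scheme = "".join(label for label, _ in pairs)
print(found_scheme == stanza.get("rhyme"))
```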

So what?

Open Graph Protocol

<html prefix="og: http://ogp.me/ns#">
<head>
<title>The Rock (1996)</title>
<meta property="og:title" content="The Rock" />
<meta property="og:type" content="video.movie" />
<meta property="og:url" content="http://www.imdb.com/title/tt0117500/" />
<meta property="og:image" content="http://ia.media-imdb.com/images/rock.jpg" />
...
</head>
...
</html>
Source: http://ogp.me/
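
A site that displays a link preview has to read these tags somehow. The sketch below is one possible way, using Python's built-in html.parser; the class name OpenGraphParser is hypothetical, and the elided "..." lines from the example are left out of the sample page.

```python
from html.parser import HTMLParser

page = """<html prefix="og: http://ogp.me/ns#">
<head>
<title>The Rock (1996)</title>
<meta property="og:title" content="The Rock" />
<meta property="og:type" content="video.movie" />
<meta property="og:url" content="http://www.imdb.com/title/tt0117500/" />
<meta property="og:image" content="http://ia.media-imdb.com/images/rock.jpg" />
</head>
</html>"""


class OpenGraphParser(HTMLParser):
    """Hypothetical helper: collects <meta property="og:..."> values."""

    def __init__(self):
        super().__init__()
        self.properties = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        prop = attributes.get("property") or ""
        if prop.startswith("og:"):
            self.properties[prop] = attributes.get("content")


parser = OpenGraphParser()
parser.feed(page)
for name, value in parser.properties.items():
    print(name, "=", value)
```

Note that the page title says "The Rock (1996)" while og:title says "The Rock": the metadata states what the page is about, separately from what it displays.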

Activity 3 / Open Graph

Google's How Search Works

https://www.google.com/search/howsearchworks/

Google

  • What do we know about Google?
  • What can we learn?
  • What concerns do we have?

Knowledge-based Trust

Removing the Truthiness from Google

How can we declare facts?

Subject > Predicate > Object

Triples

  1. W&L > located in > Lexington
  2. Madeleine L'Engle > wrote > A Wrinkle in Time
  3. Luke > knows > Luke

Problem!

Luke > knows > Luke

Disambiguation solved

http://www.worldcat.org/oclc/972362520 > http://schema.org/creator > http://viaf.org/viaf/76391491

Source: https://www.w3.org/TR/rdf11-primer/
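
The disambiguation problem can be sketched with plain Python tuples. With names alone, the two Lukes are indistinguishable; with one identifier per person, they are not. The example.org URIs below are hypothetical placeholders, not real identifiers.

```python
# Triples written with plain names, as on the Triples slide.
name_triples = [
    ("W&L", "located in", "Lexington"),
    ("Madeleine L'Engle", "wrote", "A Wrinkle in Time"),
    ("Luke", "knows", "Luke"),
]

subject, predicate, obj = name_triples[2]
# The two names are the same string, so a computer sees one Luke, not two.
names_look_identical = subject == obj
print(names_look_identical)

# Hypothetical URIs: one identifier per person, plus one for the relationship.
LUKE_A = "http://example.org/people/luke-1"
LUKE_B = "http://example.org/people/luke-2"
KNOWS = "http://example.org/vocab/knows"

uri_triple = (LUKE_A, KNOWS, LUKE_B)
# Now subject and object are different strings, so the statement is unambiguous.
uris_look_identical = uri_triple[0] == uri_triple[2]
print(uris_look_identical)
```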

RDF

Resource Description Framework (RDF) extends the linking structure of the Web to use URIs to name the relationship between things as well as the two ends of the link (this is usually referred to as a “triple”). Using this simple model, it allows structured and semi-structured data to be mixed, exposed, and shared across different applications.

Source: https://www.w3.org/RDF/

RDF

This linking structure forms a directed, labeled graph, where the edges represent the named link between two resources, represented by the graph nodes. This graph view is the easiest possible mental model for RDF and is often used in easy-to-understand visual explanations.

Source: https://www.w3.org/RDF/
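
The graph view can be sketched in a few lines of Python: each subject and object is a node, and each predicate is the label on an arrow from subject to object. This is a simplified illustration with plain names rather than URIs; the third triple (Lexington located in Virginia) and the helper name objects_of are added here as assumptions to show how statements chain together.

```python
from collections import defaultdict

triples = [
    ("W&L", "located in", "Lexington"),
    ("Madeleine L'Engle", "wrote", "A Wrinkle in Time"),
    ("Lexington", "located in", "Virginia"),  # added for illustration
]

# Directed, labeled graph: node -> list of (edge label, target node).
graph = defaultdict(list)
for subject, predicate, obj in triples:
    graph[subject].append((predicate, obj))


def objects_of(subject, predicate):
    """Follow every edge with the given label that leaves the given node."""
    return [target for label, target in graph.get(subject, []) if label == predicate]


# Follow "located in" twice: W&L -> Lexington -> Virginia.
town = objects_of("W&L", "located in")[0]
state = objects_of(town, "located in")[0]
print(town, state)
```

Because the object of one triple can be the subject of another, separate facts connect into paths that a program can follow.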

Data Assessment