DCI 102 // Data exploration & cleanup

Mackenzie Brooks

September 27, 2018

ICE BREAKER: Favorite thing about fall

argument clinic

"Historical questions don’t pop out of thin air, but from a continuously shifting relationship with the past." - Scott Weingart

scalable reading

  1. start with description
  2. barrage with visualizations/stats
  3. look for surprises
  4. make internal comparisons
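
A minimal sketch of step 1 (start with description), assuming the corpus is already loaded as a list of plain-text strings. The function name `describe`, the crude tokenizer, and the sample sentences are illustrative assumptions, not part of the course materials.

```python
from collections import Counter
import re

def describe(texts):
    """Return basic descriptive statistics for a list of document strings."""
    # Crude tokenizer: lowercase, then keep runs of letters and apostrophes.
    tokens_per_doc = [re.findall(r"[a-z']+", t.lower()) for t in texts]
    all_tokens = [tok for doc in tokens_per_doc for tok in doc]
    counts = Counter(all_tokens)
    return {
        "documents": len(texts),        # how many texts
        "tokens": len(all_tokens),      # total words
        "types": len(counts),           # distinct words
        "top_words": counts.most_common(5),
    }

# Made-up sample text for illustration only.
sample = ["The fall leaves fall.", "Leaves turn in the fall."]
stats = describe(sample)
print(stats)
```

Numbers like these are only the description; the surprises and comparisons in steps 3 and 4 come from setting them against what you expected to see.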

computationally tractable questions?

explore the data

  • where are the patterns?
  • what do we expect?
  • what's missing?
  • what can we compare it to?
  • where can we find more context?
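
One way to approach "what's missing?" is to count documents per year and list the years with none. This is a rough sketch that assumes each document has a known year; the function names and the sample years are hypothetical.

```python
from collections import Counter

def docs_per_year(years):
    """Count how many documents fall in each year."""
    return Counter(years)

def missing_years(years):
    """List years between the earliest and latest that have no documents."""
    present = set(years)
    return [y for y in range(min(present), max(present) + 1) if y not in present]

# Made-up years for illustration only.
years = [1915, 1915, 1916, 1918, 1920]
counts = docs_per_year(years)
gaps = missing_years(years)
print(counts)
print(gaps)
```

A gap may mean nothing was published that year, or only that nothing was digitized; telling those apart takes more context than the data itself provides.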

craft your research question

  • what topic do you want to explore?
  • what questions do you have?
  • what questions can you answer with this data?
  • with these methods?
  • what can you answer now vs. later?




  • decide on date range
  • create subset
  • clean dirty OCR
  • prepare with Lexos
  • visualize with ?
  • analyze results
  • repeat!
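
The first three steps above (date range, subset, OCR cleanup) could be sketched as follows. This assumes each record is a dictionary with hypothetical `year` and `text` keys; the cleanup rules are illustrative and deliberately conservative, not a complete OCR fix.

```python
import re

def in_date_range(record, start_year, end_year):
    """True if the record's year falls inside the chosen range (inclusive)."""
    return start_year <= record["year"] <= end_year

def clean_ocr(text):
    """Apply a few conservative fixes to OCR output."""
    # Rejoin words split by a hyphen at a line break: "stu-\ndents" -> "students".
    text = re.sub(r"(\w)-[ \t]*\n[ \t]*(\w)", r"\1\2", text)
    # Drop stray symbols, keeping letters, digits, whitespace, and common punctuation.
    text = re.sub(r"[^\w\s.,;:!?'\"()-]", "", text)
    # Collapse runs of whitespace (including line breaks) into single spaces.
    return re.sub(r"\s+", " ", text).strip()

# Made-up records for illustration only.
records = [
    {"year": 1917, "text": "The stu-\ndents   met ~ today."},
    {"year": 1950, "text": "A later issue."},
]

subset = [r for r in records if in_date_range(r, 1910, 1920)]
cleaned = [clean_ocr(r["text"]) for r in subset]
print(cleaned)
```

Note that the hyphen rule also merges genuine compounds that happen to break at a line end, so spot-check the output. The cleaned strings can then be saved as plain-text files, which is the usual form for uploading a corpus to Lexos.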

data assessment

  • see Unit 1 / Activities
  • feedback if submitted by Tuesday