Monday, August 25, 2014

Erratum: Leon Czolgosz did NOT live in the Oneida Colony

As you might have noticed by now, I'm not perfect.  In fact, I'll wager that all investigative reporters (and the occasional SearchResearch blogger) make mistakes somewhere along the line.  It's inevitable. 

But an honest writer will try to fix their mistakes--that's what an erratum is all about.  In fact, if the web site you're using as a high-quality reference does NOT have a way to update their materials, you might consider that they're not such a great source.  Good newspapers, good reporters, good books all have some way to fix the record.  

Let me illustrate by example. 

Last week a SearchResearch reader, Joel Meltzer, a former resident of the Oneida Community Mansion House in Oneida, wrote to me to point out that: 

At some point someone misunderstood the fact that ANOTHER presidential assassin was an Oneida Community member, and drew the mistaken conclusion that Czolgosz was a member.  This was then added to the Oneida Community Wikipedia page.  (It has since been removed).  The statement repeated over and over again, is that Czolgosz was "briefly a member" of the community.  No one ever goes into more detail because there is no detail.  It just isn't true. Again, he was just a young boy when the community disbanded and he didn't live in Oneida!

The writer then correctly pointed out that I made this same error in my July 22, 2013 post "Answer: What's the connection between President McKinley's assassin and "free love"?"  

Well, that's an interesting claim... and I wondered what could have happened.  

Luckily, I have pretty good notes about writing that post, so I went back and reconstructed my searches and zeroed in on what went wrong. Here's my reconstruction: 

What went wrong

The question for that week was "What's the connection between President McKinley's assassin and "free love"?"  

In my post, I showed that by searching inside the Google News Archives, it was simple enough to find multiple references to Noyes's use of the phrase "free love."  And then a quick look in Google Books for [ Noyes "free love" ] led me to Without Sin: The Life and Death of the Oneida Community, Spencer Klaw (1994) where you can find that "in the late summer of 1852, in an article in the Circular [the Colony’s newsletter]  he [Noyes] boldly included “Cultivation of Free Love” in a list of principles that the community stood for." 

So he's the guy who gave the notion of "Free Love" some currency. 

Now, when I looked for a connection to the assassin of President McKinley, I wrote:  
"Leon Czolgosz, who shot President McKinley at Pan-American Exposition reception on September 6, 1901.  Czolgosz, a native of Michigan and an avowed radical anarchist ( who hung out with people like Emma Goldman) was, for a short time, a member of the Oneida Colony. "  

Every assertion like that needs to come from somewhere, and a good reporter tracks the origin (aka the provenance) of their facts.  A great reporter keeps their notes around for years just to be able to revisit questions of fact and inference.  

In this case, I had read Cults and Terrorism by Frank MacHovec where he writes 

"Charles Guiteau, President Garfield's assassin, was a 5 year Oneida member.  Leon Goglsz, for a shorter time, the assassin of President McKinley, was also an Oneida member. (Vowell, 2006)."    (emphasis mine)

That's where I got my information.  I should have been worried when MacHovec spelled the assassin's name incorrectly (it should be Czolgosz, not Goglsz).  I admit that I did not check the reference to Vowell, 2006, but just assumed that MacHovec represented that information accurately. 

Prompted by Joel's question, I pulled out my notes, found the MacHovec citation quickly, and THEN checked (Vowell, 2006), which is by Sarah Vowell (and actually published in 2005 by Simon & Schuster).  The Google Books link to Vowell's book.    

When I downloaded the book (which yes, I had to buy in order to scan completely), I read through every mention of Oneida and every mention of Czolgosz... and none of them assert that Czolgosz was a member of Oneida.  

So... I assume that MacHovec simply misread the book, or combined notes from different sources together and misplaced Czolgosz at Oneida.  

Since I want to double-source everything, I looked up the Oneida colony history (from multiple sources) and found that they dissolved in 1881 (when Czolgosz would have been 8 years old).  It's pretty clear from the biographies of Czolgosz that he was working in steel mills from the age of 14, so there's just not much possibility that he spent any time at the Oneida Community.   (It's also clear from reading a few bios of Czolgosz that he really didn't spend any time at Oneida.  Given how much detail these bios have, it's inconceivable that they would have omitted that detail of his life.) 

There it is:  Leon Frank Czolgosz, born in 1873, assassin of President McKinley, executed by electric chair in 1901, was never a member of the Oneida Community.  

On the other hand, Charles Guiteau was, for more than five years, in the Community (he later assassinated President James Garfield), so there is still a story line connecting the ideas.  Note that there's no causal relationship here (free love doesn't lead to becoming an assassin), but there is an interesting accident of history that these stories should cross.  

I'll go edit the original post to link to this.  Erratum duty discharged.  

Search on.  (Carefully!) 

Friday, August 22, 2014

Appendix: Answer: The shortest--and flattest--route there.

Really? These are crazy people. 

Oh yeah... I forgot the historical connection.  

Hannibal.  Elephants.  218 BC.  25,000 soldiers marching from Barca to Roma.  

If you do any searches with Alps, mountain passes, Oulx (etc) you'll find that the southern route is one of those that's proposed as the way Hannibal got his elephants from Spain to Italy.  It's a heck of a walk for an army, especially one that's got 37 elephants.  (That's the number he ended up with.  We don't know how many he started with.)  

And while historians debate exactly which mountain pass they hiked through (with elephants!), it had to be one of these routes. (Another version of the march from Barcelona to Rome.)  

All the other passes are worse! 

Answer: The shortest--and flattest--route there.

I asked two simple routing questions that take a bit of figuring to get an answer: 

In both cases, there are two obvious routes from Point A to Point B.  The question is simple:  Which route is flatter?  (To be precise, find a route with the smallest elevation gain.)  

1.  Suppose I'm in the Southwest of the US and want to do a bike ride from Farmington, New Mexico, to Durango, Colorado.  What route would you recommend for the least overall elevation gain between the two cities?   
2.  Suppose that a few months later I'm in the Southeast of France and want to ride my bike from Echirolles (France) to Oulx (Italy). What route would you recommend for the least elevation gain between these two cities?  

As several sharp-eyed readers pointed out, Google Maps just recently announced a bike route elevation tool to help answer exactly this question.  (Google's announcement; TechCrunch; ...) 

You can compute bicycle elevation profiles (currently) in all 14 countries where Google offers biking directions. (Austria, Australia, Belgium, Canada, Switzerland, Germany, Denmark, Finland, Great Britain, Netherlands, Norway, New Zealand, Sweden, US)  Unfortunately, France and Italy aren't included...  

So to solve the first Challenge, the easiest thing to do is just use Google Maps, use "Find Directions" and then select the Bicycle Route option.  

Here I've just asked for the bike route from Farmington, NM to Durango, CO.  It's a lovely ride, almost 52 miles long, with an elevation gain of 1,588 feet.  

There's another obvious route that would leave Farmington on 170/140/160 to Durango.  

To see the elevation profile for that route, I just drag the route from where it is (by pressing-and-holding on one of the small circular "control points" on the line) to where I'd like it to go: 

Once I've moved it to Route 170, the map looks like this: 

This route, by contrast, is a bit shorter but has a bit more climbing in it (2,444 ft vs. 1,588 ft).    
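For anyone wondering what "elevation gain" actually measures, here's a minimal Python sketch (my own illustration, not how Google Maps computes it): add up only the uphill changes between consecutive elevation samples.  The two sample profiles below are made up for the example; they are not data from either route.

```python
def elevation_gain(profile):
    """Sum only the uphill changes between consecutive elevation samples."""
    return sum(max(b - a, 0) for a, b in zip(profile, profile[1:]))


# Hypothetical elevation samples in feet (illustrative only, not real route data).
# Both profiles start and end at the same elevations.
steady_route = [5400, 5600, 5500, 5900, 6100, 6500]
rolling_route = [5400, 6000, 5700, 6300, 6000, 6500]

print("steady:", elevation_gain(steady_route), "ft")
print("rolling:", elevation_gain(rolling_route), "ft")
```

Both made-up routes have the same net climb (start to finish), but the rolling one accumulates more total gain because every dip has to be climbed back out of.  That's why two routes between the same cities can have quite different gain numbers.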

But as Ramón pointed out in his link to the "Climb = what flat distance" article, there are often many factors to take into account when computing route relative difficulty.  This is a fascinating discussion, but since I want to ride this route anyway, I'm just going to pick the route that has the least elevation gain.  (Side note:  That article is written by cyclists in Florida, one of the world's flattest states, where, I suppose, they worry about things like this!)  

Of course, there are other tools you can use to compute the same kind of information.  In the comments, Rosemary points to the routes she explored using Strava, the athletic tracking and mapping system.  Here's one of her maps for the Farmington-to-Durango trip: 

Strava map by Rosemary
It's important to recognize that there ARE multiple tools to figure out this kind of information, each with its own capabilities.  In the Strava app you can sweep your mouse over the elevation profile at the bottom and read off the elevation and grade.  As you move the mouse, the blue dot follows along on the map just above it.  (This is true for Google Maps as well--move the mouse along the profile and see where the dot is on the route.  But Maps doesn't show the elevation or grade at that location.)  

Answer to route 1:  I'm going to stick with the first route (550/160).  It's a little less climbing.  And spot-checking the Streetviews along the route suggests that the road has nice shoulders, not a lot of traffic, and nice views.  

Question 2:  Echirolles (France) to Oulx (Italy)  We already know that Google Bicycle Routes won't work there.  What to do?  

Answer:  Find another tool to do the same kind of work.  I liked Ramón's query to find such tools:

     [ cycling routes elevation comparison ]

This set of results leads to many tools for doing this kind of analysis.  For my profiling, I happened to use the VeloRoute elevation profiler (but others work as well).  Here's the profile for the obvious southern route from Echirolles to Oulx: 

AND... If you click on the "get elevation image" (upper right of the blue box), you'll see just the elevation of the route selected.  

And I note that this route is 123.6 km long, with 3262 m gain, with a max grade of 30%.  (That's a HUGE grade! And there are two big hills.) 

Here, for contrast, is Rosemary's Strava map of the same route: 

Interestingly, this map shows a max grade of 17.1% (which is much less than the 30% I saw on my map). Since neither mapping service tells how they measure grade, it's hard to know which is more accurate--but in either case, this is a steep route. 
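I don't know how either service really computes grade, but here's one plausible reason two tools could disagree, sketched in Python.  Grade is rise divided by run, and the "max grade" you get depends on how long a stretch of road you measure it over.  The profile below is invented for the illustration, and the `step` parameter (how many samples apart to measure) is my own assumption, not a setting from either tool.

```python
def max_grade(distances_m, elevations_m, step=1):
    """Largest grade (in percent) between samples that are `step` apart."""
    best = 0.0
    for i in range(len(distances_m) - step):
        run = distances_m[i + step] - distances_m[i]
        rise = elevations_m[i + step] - elevations_m[i]
        if run > 0:
            best = max(best, 100.0 * rise / run)
    return best


# Hypothetical profile sampled every 50 m, with one short, sharp ramp.
distances = [0, 50, 100, 150, 200, 250, 300]
elevations = [1000, 1003, 1018, 1021, 1024, 1027, 1030]

for step in (1, 2, 3):
    window = step * 50
    print(f"measured over {window} m: max grade {max_grade(distances, elevations, step):.1f}%")
```

On this made-up profile, the same road reports a much higher maximum when measured over 50 m than over 150 m, because the longer window averages the short ramp with the gentler road around it.  A tool that samples finely (or has noisy elevation data) will tend to report bigger maximum grades.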

When I did the same plot for the other (northern) route, I got this map: 

This route is 155.3 km, with 2837 m gain, with a max grade of 20%.  (That's still a big grade, but much better than 30%.)  

Here's the elevation profile for the northern route.  

Ramón also found a different bike route elevation site (PerfildeRuta) that does very much the same thing (also available in English).  Here's their diagram of this route.  Note that they believe the maximum grade on this route is 47%!  (Really?  That's not a grade--those are stairs!) 

And if I now spot check the two routes, the southern route looks MUCH more appealing.  The northern route is mostly major highway, while most of the southern route looks like this... 

So.  Summary: 

Southern: 123.6 km long, with 3262 m gain, with a max grade of 30%.
Northern:  155.3 km, with 2837 m gain, with a max grade of 20%.

It's clear that the Northern route is flatter, albeit slightly longer.  
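As a quick sanity check on "flatter," here's a small Python sketch that compares the two routes using the numbers above, both by total gain and by climbing per kilometer.  The per-kilometer figure is my own extra measure (total gain divided by total distance); it's a rough average, not a grade reported by any of the tools.

```python
# (distance in km, elevation gain in m), taken from the summary above.
routes = {
    "Southern": (123.6, 3262),
    "Northern": (155.3, 2837),
}


def gain_per_km(distance_km, gain_m):
    """Average meters climbed per kilometer ridden."""
    return gain_m / distance_km


for name, (distance_km, gain_m) in routes.items():
    print(f"{name}: {gain_m} m total, {gain_per_km(distance_km, gain_m):.1f} m of climbing per km")

# The route with the least total gain answers the original question.
flattest = min(routes, key=lambda name: routes[name][1])
print("Least total gain:", flattest)
```

By either measure the Northern route comes out flatter: less total climbing, spread over more kilometers.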

BUT the Southern route is very appealing.  And as Rosemary points out, it goes right next to the Alpe d'Huez, one of bicycling's most revered roads for its dramatic races throughout the history of the Tour de France.  

If I were to do this ride, I'd definitely go the Southern route.  It's steeper, with more climbing, but shorter and MUCH more appealing.  

Search Lessons 

(1) Always keep up-to-date on announcements about new search capabilities.  While people have always created elevation profiles, it's much easier if you know about the tool that does exactly that.   

(2) When using a new UI (such as that in the VeloRoute elevation profiler), pay attention to options that might not be well-marked.  The "get elevation image" is exactly what I wanted from this map, but it's a pretty hidden function.  

(3) Keep track of an evolving question.  Even when the person asking the question says "I want the flatter route," the process of learning about the question often reveals information that overrides the initial criteria.  This kind of thing happens all the time in real research questions.  You start with question A, but in the process of research you discover additional information that changes the question into B, then maybe question C... This is the nature of research, and definitely of search.  

Which is why we do SearchResearch; it's a fascinating way to learn more about the world at large. 

And now I need to go for a bike ride.  Thanks to everyone who wrote in the comments.  Keep 'em coming.  

Search on! 

Wednesday, August 20, 2014

Search Challenge (8/20/14): The shortest--and flattest--route there.

Riding through the Santa Maria valley, CA. 

As you've probably figured out by now, I love to go for long bike rides, especially in hilly terrain.  Mountains?  Bring 'em on.  Rolling hills?  Even better.  

But that doesn't mean I don't pay attention to the hills.  Every cyclist wants to have some idea about what's coming up, if only to be sure to have brought along enough water and food.  

A common thing for cyclists to do is to check the routes before heading out, just to see how hilly the day looks to be.  Or, more commonly, to choose a route that matches your abilities (and aspirations) for the day.  Some days, you want to attack the hills--other days, you need to rest a bit.  

That common question leads to two fairly simple route selection Challenges.  In both cases, there are two obvious routes from Point A to Point B.  The question is simple:  Which route is flatter?  (To be precise, find a route with the smallest elevation gain.)  

1.  Suppose I'm in the Southwest of the US and want to do a bike ride from Farmington, New Mexico, to Durango, Colorado.  What route would you recommend for the least overall elevation gain between the two cities?  
2.  Suppose that a few months later I'm in the Southeast of France and want to ride my bike from Echirolles (France) to Oulx (Italy). What route would you recommend for the least elevation gain between these two cities?  

The routes here are pretty obvious--when giving your answer, just say which roads you recommend. (For example, in NM/CO, do you recommend routes 140 and 170, or 550?  I don't need turn-by-turn routes, unless you find a REALLY unusual solution.)  

I'll give a hint tomorrow about how I solved these Challenges, but for today, I'll let you work on them.  (And we'll chat about why there are two versions of the same problem.)  

As usual, in addition to your solution, be sure to tell us your thinking--HOW did you solve the Challenge?  And what deadends did you explore along the way?  

For an unusual extra credit (and really optional):  Why is the route from (somewhere near) Echirolles to (somewhere near) Oulx of historical interest?  Who else would have deeply cared about finding the flattest path between these two locations?

Search on! 

Friday, August 15, 2014

Answer: Where can you find this in the street?

First off, excellent work by all.  I'm not sure I have a lot left to add to the discussion.  You're a crack SearchResearch team! 

But let's look at some of the ways people investigated our Challenge this week.  Remember that I asked:   

1.  Can you find the GPS coordinates of the manhole cover shown below?  And why does it have that odd shape?  (Neither triangular nor circular...)  

When I posed this question, I didn't dream that you'd actually be able to find THIS PARTICULAR cover, which is why I added "you don't have to find exactly this instance of a manhole cover, but just the coordinates of any manhole cover that has this particular shape."

But I didn't actually do the Search-By-Image to test that before posting.  

In a great feat of sleuthing, Passager, AlmadenMike and Pete Warden all found that the original image was by Owen Byrne.   That's great.  Nice work. 

Then Pete went on to contact Owen Byrne on Twitter (@owenbyrne) where he confirmed that this particular photo was his.  Oh, and by the way, Owen's reading of the comments thread helped him remember that the original photo was taken at "4th and China Basin in Mission Bay."  

With that hint, it's simple to use StreetView to fly over to the intersection of "4th and China Basin" in San Francisco and find the following. (I added the red circles to highlight the covers embedded in the street.)  

A quick Command-click (or right-click for PC users) on the intersection tells me that this is 37.772019, -122.389509.  None of these have a red stripe, but the color could well have worn off (or was not yet applied) when the StreetView car drove by.  

Nearly everyone else did some version of an initial search for:  

     [San Francisco reclaimed water covers] 

and quickly learned that these particular covers are "Valve covers used in the Mission Bay Project of San Francisco to differentiate reclaimed water from potable water are in the shape of a Reuleaux triangle."

When you check out the meaning of Reuleaux triangle, you find the Wikipedia page which tells us that  "It is a curve of constant width, meaning that the separation of two parallel lines tangent to the curve is independent of their orientation. Because all diameters are the same, the Reuleaux triangle is one answer to the question 'Other than a circle, what shape can a manhole cover be made so that it cannot fall down through the hole?'" 
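If you'd like to check the "constant width" claim for yourself, here's a minimal Python sketch (my own, using only the standard library).  It builds a Reuleaux triangle from three circular arcs, each centered on one corner of an equilateral triangle, then measures the width of the shape in many directions.  For contrast it does the same for a plain equilateral triangle.  The function names and the sampling density are just choices I made for this illustration.

```python
import math


def triangle_vertices(side=1.0):
    """Corners of an equilateral triangle with the given side length."""
    return [(0.0, 0.0), (side, 0.0), (side / 2, side * math.sqrt(3) / 2)]


def reuleaux_boundary(side=1.0, samples_per_arc=600):
    """Points on a Reuleaux triangle: each arc is centered on one corner
    and swings (through 60 degrees) between the other two corners."""
    verts = triangle_vertices(side)
    points = []
    for i in range(3):
        cx, cy = verts[i]
        ax, ay = verts[(i + 1) % 3]
        bx, by = verts[(i + 2) % 3]
        start = math.atan2(ay - cy, ax - cx)
        end = math.atan2(by - cy, bx - cx)
        # Normalize the sweep into (-pi, pi] so we follow the short arc.
        sweep = (end - start + math.pi) % (2 * math.pi) - math.pi
        for k in range(samples_per_arc + 1):
            t = start + sweep * k / samples_per_arc
            points.append((cx + side * math.cos(t), cy + side * math.sin(t)))
    return points


def width(points, angle):
    """Distance between the two parallel lines, perpendicular to the
    direction `angle`, that just touch the shape on either side."""
    ux, uy = math.cos(angle), math.sin(angle)
    projections = [x * ux + y * uy for x, y in points]
    return max(projections) - min(projections)


angles = [math.radians(d) for d in range(180)]
reuleaux_widths = [width(reuleaux_boundary(), a) for a in angles]
triangle_widths = [width(triangle_vertices(), a) for a in angles]

print(f"Reuleaux width: min {min(reuleaux_widths):.4f}, max {max(reuleaux_widths):.4f}")
print(f"Triangle width: min {min(triangle_widths):.4f}, max {max(triangle_widths):.4f}")
```

The Reuleaux widths all come out (to within sampling error) equal to the side length, while the plain triangle's width changes as you rotate it.  That varying width is what would let an ordinary triangular cover be tipped and dropped through its own hole.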

And, like many other people, I found this fascinating, and thought how clever these folks were to design their valve covers in this way.  

But then I followed my search principle of "Use whatever information you have." which included the name of the foundry that cast the cover.  (Look at the original image:  it says "D&L Fdry" on it.)  

So I did a search for that: 

     [ D & L Fdry ] 

which led me to the foundry's web site... and their catalog!  It didn't take much looking before I found the catalog entry for the "Reuleaux triangle" valve.   

Here's the diagram from their catalog: 

Excerpt from D&L catalog:

And so while it's true that a Reuleaux triangle cannot fall through the hole it inscribes, this particular design doesn't take any advantage of that fact.  The hole is so small that there's no way the cover would fall through.  (In any case, the Reuleaux triangle shape is for the outside of the valve cover, and not the shape of the cover itself.)  The bottom line is that this clever shape is used solely as a unique visual shape, and not for its physical properties.  It might just as well have been just a triangle.  

Ah, well.  It's a great story, just not the story you might have thought.    

Excellent job by all.  Special kudos go to Pete for tracking down the original photographer and asking him where it was!  

And now...the other sewer covers.... 

I hadn't really planned on doing searches for these covers, especially since I know where they are!

This one is near my office at the GooglePlex in Mountain View, CA.  I like the pattern, even though it's a common one.  Turns out lots of service covers in the area are tagged with the name of the company that owns the service box (in this case I think it's fiber optics).  

This was kind of a no-brainer.  That's Mickey in the middle.  Taken from Disneyland in Anaheim, CA. 

This one isn't hard to figure either. The text is Italian, and anything that's tagged with SPQR has got to be in Rome.  (Look it up.)  This is a cover for a firefighting hydrant in Rome.  See Wiki for more.

As Rosemary pointed out, the name of this image file is "Cover-Tokyo.png".  That's a big clue right there.  This was from one of my trips to Japan, where they really DO have the most beautiful sewer covers in the entire world.  See this remarkable collection of Japanese sewer covers (and thanks to Ramón for finding the collection).  

Search lessons:  

A lesson for me--always do a Search-By-Image to see if the image you're using actually appears somewhere else!  

As most searchers discovered, it wasn't too hard to go from description to the key idea (the "Reuleaux" triangle) that led to identifications.  

But as I point out, just because something has a unique geometric property ("constant width curve"), you should NOT assume that it actually makes use of that property in a meaningful way.  I have to admit to being surprised when I saw the D&L Foundry catalog.   

And finally... I'm glad that several readers commented about "never looking at manhole covers again in the same way..."  That's one of the beautiful things about getting the chance to write this blog--I see almost everything in a slightly different way these days.  I know I can find out the most remarkable things about the world.  With just a little research inclination and a few skills, the world is ready to tell you amazing stories, if only we learn to pay attention.  

Excellent work by all.  Nicely done.  

Wednesday, August 13, 2014

Wednesday Search Challenge (8/13/14): Where can you find this in the street?

This week's Challenge is from Regular Reader Bob Darkblue who writes: 

1.  Can you find the GPS coordinates of the manhole cover shown below?  And why does it have that odd shape?  (Neither triangular nor circular...)  

Let's make this a reasonable quest:  you don't have to find exactly this instance of a manhole cover, but just the coordinates of any manhole cover that has this particular shape.  

It reminds me of that mythical Google question, "Why are manhole covers round?"  I've certainly never used that question in an interview and don't know anyone who has.  (Besides, that's way too easy a question...)  

But I've spent the past week since Bob wrote to me with this Challenge looking around at manhole covers.  I haven't spotted anything quite like this in the streets near me, so we'll have to go looking together.  

I'm not sure if you knew or not, but I'm interested in such things. (See previous blog posts:  anode covers, SFFD cisterns, or sewers in San Francisco)  I rather enjoy the art that we can see in the world around us, even in the most unexpected places.  Here are a few of my favorites, photos I've taken in various places.  Can you figure out where they're from? 

Search on!

Tuesday, August 12, 2014

Handling USGS data for simple visualization

In yesterday's post I mentioned that I'd make a short video showing how I imported and handled the USGS earthquake data in detail.  

So I got up early this morning and put it together.  Here it is: 

But in the process of doing this, I realized that a short outline of the process might be useful.  

Here's what the video shows.... 

1. Using the USGS earthquake search tool to find earthquake data in North America and download to a CSV file.
2. Uploading the CSV into Google Spreadsheets. 
3. Good data practices (setting up the data import as an uneditable sheet, labeling the data with metadata about where it came from and when downloaded). 
4.  Copying the data into new tabs for cleaning / munging / manipulation. 
5.  Extracting the name of the state from the "location" column in the data.  (4:20 to 11:00 on the video)
6.  Extracting the year of the earthquake from the "time" column.  (14:15 to 15:00 on the video)  
7.  Using the spreadsheet's "Filter" function to select data by state. 
8.  Counting the number of earthquakes by year by state (using the "countif()" function).  
9.  Pulling the California and Oklahoma data together into a sheet for creating the chart.  (21:00 to 22:00 on the video)  
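For readers who'd rather script it than spreadsheet it, here's a rough Python sketch of the same steps: read the CSV, pull the state out of the place text, pull the year out of the time stamp, and count quakes by state and year.  This is NOT what the video shows (the video does everything in Google Spreadsheets).  The column names `time`, `place`, and `mag` and the row formats are my assumptions about what the download looks like, and the sample rows are invented placeholders, not real earthquakes.

```python
import csv
import io
from collections import Counter

# Invented placeholder rows in the assumed layout (not real USGS data).
SAMPLE_CSV = '''time,place,mag
2010-02-01T10:00:00.000Z,"9km E of Exampletown, Oklahoma",3.3
2014-03-02T06:30:00.000Z,"11km N of Sampleville, Oklahoma",4.3
2014-06-03T10:45:00.000Z,"5km W of Testburg, Oklahoma",4.1
2014-03-04T04:10:00.000Z,"2km NW of Demo City, California",5.1
2010-01-05T00:25:00.000Z,"35km W of Placeholder, California",6.5
2014-05-06T12:00:00.000Z,"3km S of Mockton, California",2.8
'''


def state_of(place):
    """Everything after the last ', ' (or the whole string if there is none)."""
    return place.rpartition(", ")[2]


def year_of(timestamp):
    """The first four characters of an ISO-style time stamp."""
    return int(timestamp[:4])


# Count quakes above magnitude 3.0 by (state, year), like COUNTIF per cell.
counts = Counter()
for row in csv.DictReader(io.StringIO(SAMPLE_CSV)):
    if float(row["mag"]) > 3.0:
        counts[(state_of(row["place"]), year_of(row["time"]))] += 1

# Pull the two states together into one table: year, California, Oklahoma.
years = sorted({year for _, year in counts})
table = [(y, counts[("California", y)], counts[("Oklahoma", y)]) for y in years]

print("year  CA  OK")
for year, ca, ok in table:
    print(f"{year}  {ca:>2}  {ok:>2}")
```

To run it on a real download, you'd swap `io.StringIO(SAMPLE_CSV)` for an opened file and check that the column names match your file's header row.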

Most of this is pretty straightforward stuff.  Search for the data, import into a spreadsheet, and then filter/clean/transform the data until you get it into the shape you want.  

If you think about the goal of my data transformation, what I WANT is a data table like this: 

... where column B is the number of earthquakes of magnitude > 3.0 in California, and column C is for Oklahoma.  

Most of the video is me messing around, trying to pull out the name of the state from the location description (which is in column E).  Most of the entries look like this: 

The place column is what I got from the USGS--it's in the CSV file.  But what I NEED is state(extracted), in order to get my counts of earthquakes by state.  

The hardest part of this entire process is writing the spreadsheet function that can pull the state name out of the place column.  

I ended up using a =REGEXEXTRACT(...) function.  That's not the simplest, I know, but it's the way my programmer's mind works.  (To learn more about regular expressions, which are probably the handiest tool in the programmer's toolbox, see regular expression tutorial.)  
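To give a flavor of the regular expression approach, here's a small Python sketch.  The pattern below is my own guess at a reasonable one for text like "11km N of Sampleville, Oklahoma"; it is not necessarily the pattern used in the video, and the example place strings are made up.

```python
import re

# Capture whatever follows the last comma, up to the end of the string.
STATE_PATTERN = re.compile(r",\s*([^,]+)$")


def extract_state(place):
    """Return the text after the final comma, or None if there is no comma."""
    match = STATE_PATTERN.search(place)
    return match.group(1).strip() if match else None


for place in ["11km N of Sampleville, Oklahoma",
              "2km NW of Demo City, California",
              "offshore somewhere"]:
    print(repr(place), "->", extract_state(place))
```

Returning None when there's no comma is a deliberate choice here: it makes the rows that don't fit the expected format easy to spot, rather than silently mislabeling them.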

Just after I made the video, I discovered another way to do the same thing.  This is from my Google colleague Ronald Ho, and it's so clever I have to show it to you.  (Just in case you also need to extract the last term in a string on some future spreadsheet.) 

=iferror (RIGHT(N1879, LEN(N1879)-FIND("*",SUBSTITUTE(N1879," ","*",LEN(N1879) -LEN(SUBSTITUTE(N1879," ",""))))), N1879)

This whacky expression is a clever bit of programming that basically does the following: 

   1.  The inner SUBSTITUTE replaces every space in N1879 with the empty string (""), compacting it.

   2.  It computes how many spaces there are in N1879 by subtracting the length of the compacted form from the full length, LEN(N1879). 

   3.  The outer SUBSTITUTE uses that count as its occurrence number, so it replaces only the LAST space with a "*" character. 

   4.  FIND locates that "*", and the expression returns the RIGHT part of the string, everything after the "*" position to the end of the string.  

   5.  The whole expression is wrapped in an iferror: if the string doesn't have any spaces, the FIND command will cause an error, and then iferror will return just the value of the string in the cell.  

It's complicated, but cute.  And it highlights an important point:  The functions that are built into your data handling system need to be powerful enough to allow you to make whatever alterations to the data you need. 

In this case, the Google Spreadsheets scripting language is missing a function to "find from the end" (that is, find the last space in the string).   (IF there was a FINDBACKWARD function, I would have written  =RIGHT(N1879, LEN(N1879) - FINDBACKWARD(N1879, " ")) which would have given me the last word in the string.)  
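For comparison, some languages do have a "find from the end" built in.  In Python it's the string method `rfind`, so the whole job shrinks to a couple of lines.  (The function name `last_word_rfind` is just my label for this sketch.)

```python
def last_word_rfind(text):
    """Return everything after the last space (the whole string if no space)."""
    idx = text.rfind(" ")  # -1 when there is no space at all
    return text[idx + 1:]


print(last_word_rfind("11km N of Sampleville, Oklahoma"))
print(last_word_rfind("California"))
```

Because `rfind` returns -1 when nothing is found, slicing from `idx + 1` naturally returns the whole string in that case, so no separate error handling is needed.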

Ah well.  Every language is missing something; that's why programming is sometimes a bit tricky.  

Hope you find this video useful!  

Comments (especially about how to improve this) welcome. 

Search on!