Algorithmic Redistricting

I saw this evening an article from the Washington Post claiming that someone had “solved” gerrymandering by creating a computer program that algorithmically creates ideally compact redistricting based on Census data.

The title is a little presumptuous; there are so many other important factors besides compactness. But I agree with the general point: define congressional districts algorithmically, taking all the relevant factors into account. Here are some factors that redistricting needs to account for, in rough order of importance:

  1. Equal population & using Census Block boundaries
  2. Compactness & Contiguity (all parts are connected to each other)
  3. Political competition (just the right amount)
  4. Creating majority minority districts where appropriate (the good kind of gerrymandering, to increase representation for underserved racial groups)
  5. Preservation of city and county boundaries
  6. Respect for “communities of interest” (a fuzzy idea about geographic tribes; CoIs need to make the case for their consideration in the process, which amounts to lobbying)
  7. Incumbent protection

Equal population standards can be very strict, depending on the state. There are many ways of measuring compactness, but I feel it should probably include more consideration of travel time than one usually sees; the usual measure is “how close to a circle is the district”.
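To make “how close to a circle” concrete, here is a minimal sketch of one common compactness measure, the Polsby-Popper score. I'm choosing it for illustration; the article may use a different measure, and the shapes below are hypothetical, not real districts.

```python
import math

def polsby_popper(area, perimeter):
    """Polsby-Popper compactness: 4*pi*area / perimeter**2.

    A perfect circle scores 1.0; more contorted shapes approach 0.
    """
    if perimeter <= 0:
        raise ValueError("perimeter must be positive")
    return 4 * math.pi * area / perimeter ** 2

# Three hypothetical shapes, all with the same area (100 square units).
circle_radius = math.sqrt(100 / math.pi)
circle = polsby_popper(100, 2 * math.pi * circle_radius)  # a circle
square = polsby_popper(100, 4 * 10)                       # a 10 x 10 square
sliver = polsby_popper(100, 2 * (1 + 100))                # a 1 x 100 strip

for name, score in [("circle", circle), ("square", square), ("sliver", sliver)]:
    print(f"{name}: {score:.3f}")
```

Note that this score knows nothing about travel time: a compact-looking district split by a river with no bridge would still score well.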

As for competition, theoretically a 50/50 district would lead to less polarization in Congress, which would be good. In practice, political parties will try to get about 55/45 or 60/40 for themselves so that they’re assured a win without wasting any votes. If a district is gerrymandered to go 80/20 for one party, that’s actually bad for that party because that’s 30 percentage points more of their voters in that district than necessary, and those voters could be helping elect someone from that party in a different district. Party identification is a less reliable way to measure political competition than actual election result data.
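The arithmetic behind that 80/20 example can be sketched in a few lines. This assumes a simplified two-party race where a bare majority (50%) is all a party needs; the function name `surplus_share` is my own, not a standard term.

```python
def surplus_share(party_share, needed_share=0.5):
    """Vote share beyond what the party needed to win (0 if it did not win).

    Assumes a two-party race where `needed_share` of the vote is enough.
    """
    return max(0.0, party_share - needed_share)

# The district splits mentioned above: 55/45, 60/40, and 80/20.
for share in (0.55, 0.60, 0.80):
    extra = surplus_share(share)
    print(f"{share:.0%} district: {extra:.0%} of the electorate is surplus for the winner")
```

A party drawing the map wants enough surplus to feel safe, but every point beyond that is a voter who could have tipped a neighboring district.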

There are trade-offs in these criteria too. If you focus on competitiveness, you will have fewer majority minority districts. Because it’s very much like gerrymandering, focusing on majority minority districts will decrease compactness. Focusing on preserving city and county boundaries will decrease competitiveness and compactness. And respecting communities of interest is also likely to reduce competitiveness and compactness.

It would be great to have all of these factors determined algorithmically (even though computers are only as objective as the people who program them, and only as accurate as the data that they use) and then combined into composite redistricting maps that weight the factors according to their importance. You would have to set the relative importance of the different criteria as variables that the user can adjust. Needless to say, it is more complicated than one algorithm designed to maximize compactness and equal population “solving” gerrymandering.
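Here is a rough sketch of what “adjustable weights” could look like: a weighted average of per-criterion scores for each candidate map. Everything in it is hypothetical, including the criterion names, the assumption that each score is already normalized to a 0-to-1 scale, and the two made-up plans.

```python
def composite_score(plan_scores, weights):
    """Weighted average of a plan's criterion scores.

    `plan_scores` maps criterion name -> score in [0, 1] (assumed normalized).
    `weights` maps criterion name -> user-chosen importance (any positive scale).
    A criterion missing from `plan_scores` counts as 0.
    """
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    weighted = sum(w * plan_scores.get(name, 0.0) for name, w in weights.items())
    return weighted / total_weight

# Two made-up candidate maps: one compact, one competitive.
plan_a = {"equal_population": 1.0, "compactness": 0.9, "competition": 0.3}
plan_b = {"equal_population": 1.0, "compactness": 0.5, "competition": 0.8}

# One user cares more about compactness, another more about competition.
favor_compactness = {"equal_population": 3.0, "compactness": 2.0, "competition": 1.0}
favor_competition = {"equal_population": 3.0, "compactness": 1.0, "competition": 3.0}

for label, weights in [("compactness", favor_compactness),
                       ("competition", favor_competition)]:
    a = composite_score(plan_a, weights)
    b = composite_score(plan_b, weights)
    winner = "plan_a" if a > b else "plan_b"
    print(f"favoring {label}: plan_a={a:.3f}, plan_b={b:.3f} -> {winner}")
```

The same two maps trade places depending on the weights, which is exactly why the weights are a political choice and not something the algorithm can settle on its own.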

I used this great slideshow from the Redistricting Institute for a lot of the details contained here.

More graphs: North American Sports Teams by Google Search Popularity

FiveThirtyEight has been running a great series comparing teams in different professional sports leagues across North America and the world based on the teams’ popularity in Google Search.

This morning they ran a piece indexing together the popularity of all teams in seven of the most popular leagues: MLB, NFL, NBA, NHL, LIGA MX, CFL, and MLS.

Side note: they mentioned that WNBA didn’t come close to these, but I wonder how WNBA compares to MLU?

Anyway, they had some nice charts, but none that showed visually the popularity distributions of the different leagues. In setting out to make that, I also made one showing all the teams with popularity twice the average of the whole set of teams (that average was indexed to 1 by FiveThirtyEight).
[Chart: sports teams’ Google Search popularity, distribution across all teams]
It seems that NBA, NHL, and LIGA MX all have comparable popularity as leagues, at least on the same order of magnitude. Admittedly, this visual doesn’t show well how popularity is distributed within the main group of teams in a league: they’re all jumbled together.

It also seems that football is just slightly more popular than baseball in aggregate, but the Yankees and the Red Sox, outliers in their league, blow even the most popular NFL teams out of the water. That is illustrated further in the following chart:
[Chart: sports teams’ Google Search popularity, top teams]
Just a few notes about this one. First, the gap between the Yankees/Sox and the top teams of the other two leagues is striking, as FiveThirtyEight noticed. Second, there are only three leagues whose teams reach twice the average popularity (though I suppose that makes sense with seven leagues in the set). Third, we get into the main group of both MLB and NFL in this chart, and it really shows the high popularity of NFL teams, which make up more than half the teams on this chart.

I will also note that there are two New England teams in the top ten. Go Sox and Pats!

Graph: How Cold Has It Been This Winter?

Last week FiveThirtyEight posted an article looking at weather data nationally to assess the notion that this winter was particularly cold. I am quick to caveat such claims as applications of the availability heuristic, so I loved FiveThirtyEight’s analysis to settle with real measurements whether the claim is true.

But even the shiny graphs and rigorous analysis of FiveThirtyEight left me unsatisfied on this issue. That’s because what I really cared about was my local weather. So I dug around and found some data of my own: average and 2013/14 high and low temperatures for my home of Amherst, Massachusetts.

Highs and Lows: Average vs Actual

[Chart: Amherst temperature actuals]

First a note: I like using highs and lows rather than daily average temperatures because they feel more real to me. The temperature oscillated between these bounds on that day, but how long was it actually at that specific average temperature? That said, my results are going to resemble those I would have gotten had I used averages, so it’s not super important in this case.

The above graph is pretty messy, so we can’t really answer the question with it too well. So I made another one:

Moving Weekly Average of Deviation from Temperature Norm

[Chart: Amherst temperature averages]

This one can answer the question. Yes, it has been cold, especially since late January, but also at multiple points earlier in the winter.

To make this, first I subtracted the average highs and lows for each day from the actuals. For each day, this showed me how many degrees warmer or colder it got than the norm. I then averaged the high and low deviations together to get a composite number (see, doesn’t that look like average temperature?) measuring the mean deviation from the normal high and low temperatures. Finally, for each day I took the moving average of that day and the three days before and after it (a seven-day window). I did this because our perceptions of what the weather is like are influenced not only by what’s going on in the moment, but also by what’s happened recently and what’s in the forecast.
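The steps above can be sketched roughly as follows. This is a reconstruction, not the original calculation: the function names are mine, the numbers are placeholders and not the Amherst data, and the shortened window at the start and end of the series is an assumption, since the post doesn't cover the edges.

```python
def daily_deviation(actual_high, actual_low, normal_high, normal_low):
    """Average of the high and low deviations from the norm for one day."""
    return ((actual_high - normal_high) + (actual_low - normal_low)) / 2

def centered_moving_average(values, half_window=3):
    """Average each value with up to `half_window` neighbors on each side.

    Near the ends of the series the window is simply shortened
    (an assumption; other edge treatments are possible).
    """
    smoothed = []
    for i in range(len(values)):
        start = max(0, i - half_window)
        end = min(len(values), i + half_window + 1)
        window = values[start:end]
        smoothed.append(sum(window) / len(window))
    return smoothed

# Placeholder data for seven days (degrees F), NOT real measurements.
actual_highs = [30, 28, 25, 20, 22, 35, 40]
actual_lows = [10, 8, 5, 0, 2, 15, 20]
normal_highs = [35, 35, 35, 35, 35, 35, 35]
normal_lows = [15, 15, 15, 15, 15, 15, 15]

deviations = [
    daily_deviation(ah, al, nh, nl)
    for ah, al, nh, nl in zip(actual_highs, actual_lows, normal_highs, normal_lows)
]
weekly = centered_moving_average(deviations, half_window=3)

for day, (raw, smooth) in enumerate(zip(deviations, weekly), start=1):
    print(f"day {day}: deviation {raw:+.1f}, weekly moving average {smooth:+.1f}")
```

Negative numbers mean colder than normal. The smoothing trades day-to-day detail for a line that tracks how the week felt.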

The result, as you can see, shows us bizarrely cyclical trends in temperature this winter. Every 2.5 to 3 weeks, we see this measure of temperature deviation cycle back to another peak or valley. I can’t think of any methodological error that would distort the results in this way (except maybe the moving average, but a seven-day window shouldn’t shape stretches that long), and I have no reason to believe the source data are wrong, so my best guess is that it’s coincidental.

Presuming my methodology is sound, this is just the sort of graph I was looking for to explain what the temperature was like this winter. I hope you find it interesting as well.

Facebook Gaydar: Your friends’ demographics predict your own

This study is cool from a statistics geek perspective. But it’s disturbing from a couple of other perspectives. They frame it as disturbing from a privacy perspective, which is obviously true. But I want to highlight another aspect of it that makes me a little uncomfortable: that it focuses on sexual orientation.

The tendency to be curious about others’ sexual orientation is to some degree very human and natural (see gossip), but it is also connected to a culture in which any sexual orientation (not to mention gender identity and gender expression) outside of the mainstream is considered scandalous. This study makes me feel like more control over their own privacy is being stripped from oppressed people, and that makes me a little uncomfortable. Of course the study is just drawing attention to the fact that this is an existing privacy risk, not creating the risk itself. But as FlowingData blogger Nathan Yau speculated in his post on the study, it’s likely that similar results would be possible for other demographics such as age and race. The focus on sexual orientation is too evocative of past (and current) cultures which sensationalize coming out and even “outing” people.

That said, it is super cool from a statistics perspective. As previously mentioned, I love using data from real life. In that respect, keep it coming. But be careful of the cultural implications of your work.

American Communities Project

This is the most fascinating mapping project I’ve seen in a while, and I really like mapping projects. The American Communities Project starts from the premise that “changes in technology and economics are redefining the social, political and cultural fault lines that make the country what it is.” They take this assumption, add boatloads of demographic data, and come out with this excellent categorization of each county into 15 types. The types range from “Big Cities” to “African American South” to “College Towns” to “Military Posts”. It’s really quite fascinating.

I believe that state boundaries don’t do a good job of communicating the gradation of commonality across the country. I love projects that use real data to illustrate some of the more natural regions and boundaries. Some projects like this include Facebook’s study of NFL fandom, Dirk Brockmann’s map of dollar bill circulation (which could use some web mapping help), MIT Senseable City Lab’s “Connected States of America” by phone and text networks, and NC State statistician Joshua Katz’s dialect maps (although that uses surveys, which I don’t find quite as sexy as the data the others use). I would love to see some sort of aggregation of all these projects.

Harvard longitudinal study: Happiness is love

Harvard studied 268 men beginning in 1938 and followed them for most of their lives to study correlations between various life factors. Some of the interesting results, written about in study director George Vaillant’s book Triumphs of Experience and summarized in this article:

  • Alcoholism messes up your life more than just about anything else. It’s correlated with divorce, neurosis, and depression, and it is second only to smoking as a cause of death.
  • Regarding IQ, once you’re above about 110, it doesn’t make a huge difference to income achievement.
  • Old liberal men have way more sex than anybody: conservatives tend to shut down around 68, liberals keep up their game into their 80s.
  • “Warmth of relationships” is very important to happiness and health later in life. Warm relationships also correlated with higher income achievement and professional success.
  • Strong childhood relationships with one’s parents (especially one’s mother) correlate highly to all sorts of positive things later in life.

Y’hear that? Spread the love, and don’t drink too much!

Boston Mayoral Election Maps

Boston Magazine has some excellent maps about the recent mayoral election. They’re all dot maps showing distribution by color. It starts simply with Walsh vs. Connolly, and follows with voter turnout, votes gained by Walsh over the preliminary election, and votes gained by Connolly. I was pointed to this by the excellent blog Bostonography.