How the Guardian uses GitHub to audit GitHub

How the Guardian wrote gu:who, a tool to help manage GitHub organisation membership

Roberto Tyley

Published on Friday, 11 April 2014


Guess Who? Demystifying GitHub organisation membership Photograph: Bethany Khan (https://www.flickr.com/photos/bethanykhan/4466733616)/Flickr

We’ve nearly 200 people in our four-year-old GitHub organisation: employees, contractors, third-party suppliers, continuous integration bots, and… and… and people are starting to wonder:

More than most, the Guardian is aware of what happens when you allow a large number of people access to your private data – only the people who legitimately need access should have it, but managing that in an organisation which is large, busy and distributed is a major hassle. Many accounts are completely unidentifiable – just a short, cryptic username with no avatar or other distinguishing features – so establishing who’s even responsible for clearing them up is just impossible.

We’d like to enforce some order. But we’re devs. So we’d rather not have spreadsheets-of-crap, Windows Active Directory, LDAP or, if truth be told, anything we have to actively remember to think about, because our jobs are actually about delivering business value on much more interesting projects.

Consequently, let’s build a bot to audit the membership of our GitHub organisation. But let’s do away with a backing store, because with a little thought GitHub can be our backing store - and let’s build only the most minimal interface, because GitHub can be our interface too. This means people using our bot need to learn absolutely nothing to use it – they’re devs, and they already know how to use these tools – they already know how to use GitHub.
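One way to read “GitHub can be our backing store” is that the set of open issues is itself the record of who still has a problem, so each run of the bot only has to compare that set with what it sees now. The sketch below is a minimal illustration of that idea and not gu:who’s actual code; the function name and both inputs are assumptions made for the example.

```python
from typing import Set, Tuple


def plan_issue_changes(
    open_issue_users: Set[str], failing_users: Set[str]
) -> Tuple[Set[str], Set[str]]:
    """Decide which issues to open and which to close.

    open_issue_users: users who already have an open issue (hypothetical input,
    read back from the issue tracker rather than from a database).
    failing_users: users who currently fail at least one requirement.
    """
    to_open = failing_users - open_issue_users   # newly failing, no issue yet
    to_close = open_issue_users - failing_users  # fixed since the last run
    return to_open, to_close


to_open, to_close = plan_issue_changes({"alice", "bob"}, {"bob", "carol"})
```

Because nothing is stored outside GitHub, deleting and redeploying the bot loses no state: the next run rebuilds the same picture from the issues that are already there.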

gu:who?

gu:who is the bot we built, open source under the Apache V2 licence, and this is how it works:

Using GitHub issues makes it easy for everyone to understand what’s going on, and for people to see who amongst their colleagues might need to be given a nudge to move them along.

Issues produced by gu:who. Photograph: Roberto Tyley/The Guardian

These are the simple security requirements gu:who enforces on each account in order to help make your code more secure:

That last one is interesting because of the way it’s expressed. The senior member of staff adds the user to a users.txt file in a dedicated GitHub repo, taking responsibility via git-blame for the user being in the organisation. This ensures there’s always someone to go to when membership for a dodgy account is in doubt.
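To make the git-blame idea concrete, here is a rough sketch that turns the output of `git blame --line-porcelain users.txt` into a username-to-sponsor map. It assumes users.txt holds one username per line, which the article does not spell out, and the function name and sample data are invented for illustration. It is not how gu:who itself reads the file.

```python
from typing import Dict


def sponsors_from_blame(porcelain: str) -> Dict[str, str]:
    """Map each username to the author git blame reports for its line.

    Expects --line-porcelain output, where every content line (prefixed by a
    tab) is preceded by a full header block including an "author <name>" line.
    """
    sponsors: Dict[str, str] = {}
    author = None
    for line in porcelain.splitlines():
        if line.startswith("author "):
            author = line[len("author "):]
        elif line.startswith("\t"):
            username = line[1:].strip()
            if username and author is not None:
                sponsors[username] = author
    return sponsors


# Hand-written sample in the shape of --line-porcelain output (not real data).
SAMPLE_BLAME = "\n".join([
    "1111111111111111111111111111111111111111 1 1 1",
    "author Alice Example",
    "author-mail <alice@example.com>",
    "summary Sponsor new contractor",
    "filename users.txt",
    "\tcontractor-bob",
    "2222222222222222222222222222222222222222 2 2 1",
    "author Carol Example",
    "author-mail <carol@example.com>",
    "summary Add CI bot",
    "filename users.txt",
    "\tci-bot",
])

sponsors = sponsors_from_blame(SAMPLE_BLAME)
```

Note that git blame names whoever last changed a line, so reformatting users.txt would quietly transfer sponsorship to the person who did the reformatting.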

The dedicated GitHub repo is called, by convention, ‘people’, and is also the repository where gu:who raises issues against users. Every member of the organisation can see it, but we’d recommend you make sure it’s a private repo, as it will contain a full list of the accounts in your organisation that currently fall short on security.

How does this compare to what GitHub’s already got?

For “Owners” of GitHub organisations, GitHub’s team-based tools are pretty good, and discreetly display the 2FA status of users. They don’t have any way of enforcing that users must maintain their profile to a certain standard (eg, a full name or a defined avatar), and as for ‘where-did-that-user-come-from’, the Organisation “Security History” page gives only a one-month history, which probably won’t extend far enough back to show where problematic users came from:

GitHub’s security page for organisations. Photograph: Roberto Tyley/Guardian

These tools don’t really do anything to encourage people to solve their own problems, which gu:who is quite good at doing.

Clearly, it could be difficult for GitHub to implement Organisation rules that embody exactly what every different organisation in the world needs, but it looks like there’s probably room for improvement.

Remaining questions for gu:who

Do you bother with two-factor-auth for CI bots? Who would hold responsibility for the phone required to authenticate these accounts over a long time period? In the end, we decided that for members of the “bots” team, we would just waive that requirement.
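The waiver for the “bots” team can be pictured as one extra condition in the per-account check. The sketch below is illustrative only: the `Member` fields and the three requirements (full name, sponsor, two-factor auth) are assumptions pieced together from what this article mentions, not a copy of gu:who’s rules or data model.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass
class Member:
    # Hypothetical shape for an organisation member; field names are assumed.
    login: str
    full_name: str
    has_two_factor: bool
    has_sponsor: bool
    teams: FrozenSet[str] = frozenset()


def outstanding_requirements(member: Member) -> List[str]:
    """List the requirements this account still fails."""
    problems = []
    if not member.full_name.strip():
        problems.append("full name")
    if not member.has_sponsor:
        problems.append("sponsor")
    # Members of the "bots" team are excused from two-factor auth only.
    if not member.has_two_factor and "bots" not in member.teams:
        problems.append("two-factor auth")
    return problems


ci_bot = Member("ci-bot", "CI Bot", False, True, frozenset({"bots"}))
mystery = Member("xk7", "", False, True)
```

Here `ci_bot` passes despite having no two-factor auth, while `mystery` is flagged for both its missing name and its missing second factor. The waiver is deliberately narrow: a bot still needs a sponsor.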

There’s also the problem of attempting to automatically detect when someone has left the company. Could we use periodic sponsor review? Perhaps whitelist people who’ve recently had a pull request accepted?
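The pull-request idea could look something like the sketch below: anyone with a recently merged pull request is left alone, and everyone else is queued for a sponsor review. This is speculation made runnable, not a feature of gu:who; the function name, the input mapping and the 90-day window are all assumptions.

```python
from datetime import date, timedelta
from typing import Dict, List, Optional


def needs_review(
    last_merged_pr: Dict[str, Optional[date]],
    today: date,
    window_days: int = 90,  # arbitrary window chosen for the example
) -> List[str]:
    """Return logins with no merged pull request inside the window."""
    cutoff = today - timedelta(days=window_days)
    return sorted(
        login
        for login, merged in last_merged_pr.items()
        if merged is None or merged < cutoff
    )


flagged = needs_review(
    {
        "alice": date(2014, 3, 30),  # merged recently, so left alone
        "bob": date(2013, 11, 1),    # last merge is older than the window
        "carol": None,               # never had a pull request merged
    },
    today=date(2014, 4, 11),
)
```

A quiet account is not necessarily a departed one, which is why the output is a list for a human sponsor to review and not a list of accounts to remove.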

Try it yourself

You can try gu:who out on your own GitHub organisation: grab the source code, install sbt and run gu:who yourself:

$ git clone https://github.com/guardian/gu-who.git
$ cd gu-who
$ sbt start
Loading /usr/share/sbt/bin/sbt-launch-lib.bash
[info] Loading project definition from /home/roberto/development/gu-who/project
[info] Set current project to gu-who (in build file:/home/roberto/development/gu-who/)
(Starting server. Type Ctrl+D to exit logs, the server will remain in background)
Play server process ID is 17861
[info] play - Application started (Prod)
[info] play - Listening for HTTP on /0:0:0:0:0:0:0:0:9000

It’s still a little rough around the edges, and given that the code is covered by the Apache V2 licence this statement very much applies:

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an “AS IS” BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.

…but we’d love you to try it out and give us feedback. In any case, for us:

The results

The gu:who bot was created by Roberto Tyley and Lindsey Dew at The Guardian. If you’re interested in Git and Security you may also be interested in The BFG, a simpler, faster alternative to git-filter-branch for cleansing bad data out of your Git repository - ie passwords, credentials & other private or unwanted data.
