Thursday, May 13, 2010

com.atlassian.bamboo.repository.RepositoryException : Failed to checkout source code :(

While needing to do some stuck-build-cleanup the other week, I did an across-the-board upgrade of our core build & source control servers and clients.
We were scattered across a few versions of Atlassian Bamboo, TortoiseSVN, the CollabNet Subversion Command Line Client (for Windows), VisualSVN Server and the VisualSVN Visual Studio Client, though all were Svn 1.6.x compatible. Since I was going to be in the guts of a few of our servers anyway, I thought it was a good time to bring us up to a common, consistent & recent release level.

With the Subversion project still working on the major 1.7 release, 1.6.x has pretty much ended at 1.6.11, and all our tools (for a change) have released 'final' versions for 1.6.11. Since I was doing maintenance anyway, it seemed like a good stopping point before 1.7 comes out. We'll need to let 1.7 mature for a few break-fix cycles before we move to it, so this should tide us over for roughly six months at most. (I had the same thought six months ago, thinking there was no way the Subversion project would take six more months to get 1.7 out.)
I also threw on a point release upgrade of Atlassian Bamboo to catch us up on the last 3 months of minor break fixes, an upgrade which wasn't technically tied to Svn 1.6.11.

So I grabbed new versions of all the tools we use, all compatible with Subversion 1.6.11 -

  • TortoiseSVN 1.6.8 x86 & 64-bit Windows clients (for the XP & Windows 7 PCs, respectively)
  • VisualSVN 2.0.1 Client for Visual Studio 2008
  • CollabNet Subversion Windows Client 1.6.11
  • VisualSVN Server 2.1.2 (for Svn 1.6.11)
  • Atlassian Bamboo 2.5.5


I upgraded everything server-side, sent out emails to all the content creators and developers to upgrade their client tools, caught all the server boxes up on Windows Updates and everything got fresh reboots across the board. I disabled Basic Authentication on the Svn server and enabled NTLM Authentication only. Amazingly, it looked like everything still worked, and best of all nobody was getting authentication popups to retype their credentials anymore in TortoiseSVN  ;)

Fast-forward a week, until this morning: our secondary build server, while still showing totally green, is no longer building anything. I only noticed when a developer pointed out that one of the products set to build only on that server hadn't built in the 48 hours since his commit :(
I picked at it and eventually worked out that it was failing to authenticate with the Svn server, once I found this error:

Mainline - Product : Error occurred while executing the build for MAINLINE-PRODUCT-184
(com.atlassian.bamboo.repository.RepositoryException : Failed to checkout source code to revision '47527' for https://visualsvn/svn/Company/Mainline/Product)

I've seen similar errors in the past and it's normally a quick fix - somewhere in the maze of credentials that keep us compliant with the auditors, something expired or was cleared. Normally I just log on to the server in question, manually connect to the Svn repo using TortoiseSVN or CollabNet's client, re-type the credentials or re-accept an SSL cert that somehow got cleared, and done.
This time, everything was already working fine when I connected manually. I cleared all the cached credentials anyway, re-pulled a small project to get the ball rolling, re-typed the credentials by hand, re-accepted the SSL certs, and re-kicked off a small build in Bamboo that only runs on the secondary server, to no avail. When I do it manually, everything works. When Bamboo does it, nothing works; it fails somewhere around the svn auth.

I decide that the Bamboo remote agent must need upgrading (even though a dusty note on Atlassian's site claims agents upgrade themselves, the local .jar & .exe files for the agent are still dated from 9 months ago, from the original ).
I try to upgrade the agent in place, give up on that when both (?) agents start fighting (!), remove the Bamboo agent completely from the secondary build server, remove it from the Bamboo administration console as well, and reinstall from scratch on the secondary server. I have to set up the builders (msbuild & script) again, and go into each of the build projects that solely use the secondary server and re-select the build agent.

I tried running one of the failing builds (still green!), and still no joy: same errors as before around svn auth. I verified the remote agent was upgraded to Bamboo 2.5.5, reinstalled it as a Windows service, and played with the service account for a bit. Still not working.

I was able to pull a full stack trace out of the Bamboo agent logs, and started googling bits of it. I gleaned enough to go straight to the Atlassian wiki/forum and search there on bits and pieces of the error message.

Finally, I found this:
Authentication Failure With NTLM Subversion Authentication
http://confluence.atlassian.com/display/BAMKB/Authentication+Failure+With+NTLM+Subversion+Authentication

Grr, grr, grrr. It turns out the Java SVNKit library that Bamboo uses is flaky as heck at doing NTLM authentication. And if you remember, I had turned off Basic Authentication and turned on NTLM Authentication 7 days prior, during the upgrade of VisualSVN Server :(
My real irritation is that it's only partially failing - our primary build server is apparently fine doing NTLM - it's just our secondary build server that is flaking out.

My steps to rectify this closely follow the article:

  1. Set Basic Authentication to be the primary method of authentication in the Bamboo remote agent by adding this to the agent's config file and restarting the service:
    • wrapper.java.additional.3=-Dsvnkit.http.methods=Basic,Digest,Negotiate,NTLM
  2. Turn on Basic Authentication again in the VisualSVN Server setup and recycle the VisualSVN services.
  3. Clear cached Svn credentials on the secondary build server, and manually do a check-out and commit (a whitespace change) of a simple project using the CollabNet command line client, specifying the username and password of the service account the Bamboo remote agent runs as, to establish Basic Authentication credentials instead of an NTLM token. 
  4. Manually kick off the builds tied to the secondary build server in the Bamboo console. 
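
One gotcha with step 1 is the number in wrapper.java.additional.3: it has to be the next unused index in the agent's wrapper.conf, and as far as I know the Java Service Wrapper stops reading at the first gap in the sequence by default. Below is a minimal Python sketch of picking that index automatically. The helper name add_jvm_property and the sample config contents are made up for illustration; only the svnkit.http.methods argument comes from the steps above.

```python
import re

# The JVM argument from step 1 (taken from the Atlassian article).
SVNKIT_ARG = "-Dsvnkit.http.methods=Basic,Digest,Negotiate,NTLM"

# Matches uncommented lines such as "wrapper.java.additional.2=-Dfoo=bar".
_ADDITIONAL = re.compile(r"^\s*wrapper\.java\.additional\.(\d+)\s*=\s*(.*)$")


def add_jvm_property(conf_text: str, jvm_arg: str = SVNKIT_ARG) -> str:
    """Return conf_text with jvm_arg added as the next free
    wrapper.java.additional.N entry (hypothetical helper).

    If the argument is already present, the text is returned unchanged.
    """
    lines = conf_text.splitlines()
    used = []       # indexes already taken
    last_pos = -1   # position of the last wrapper.java.additional.* line
    for pos, line in enumerate(lines):
        match = _ADDITIONAL.match(line)
        if match:
            if match.group(2).strip() == jvm_arg:
                return conf_text
            used.append(int(match.group(1)))
            last_pos = pos
    next_index = max(used, default=0) + 1
    new_line = f"wrapper.java.additional.{next_index}={jvm_arg}"
    # Keep the new entry next to the existing ones, or append at the end.
    insert_at = last_pos + 1 if last_pos >= 0 else len(lines)
    lines.insert(insert_at, new_line)
    return "\n".join(lines) + "\n"


# Made-up config contents, only to show the behaviour.
sample_conf = (
    "wrapper.java.command=java\n"
    "wrapper.java.additional.1=-Dexample.one=a\n"
    "wrapper.java.additional.2=-Dexample.two=b\n"
    "wrapper.java.maxmemory=512\n"
)
updated_conf = add_jvm_property(sample_conf)
```

With the sample above, the new entry ends up as wrapper.java.additional.3, right after the existing two. Writing the result back to the real wrapper.conf and restarting the agent service is left out on purpose, since the file location depends on how the agent was installed.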

Fixed...

Initial upgrade cost: 2 hours.
Later troubleshooting: 4+ hours on my part, + developer time wasted waiting for their build.

Sigh :(

2 comments:

  1. You should have gone with an upgrade to Git.
    Not really...
    No really.

  2. Well, the main reason to upgrade to Git would be you can use Gource natively without converting your repo history!

    http://code.google.com/p/gource/
