End of Google Summer of Code 2011

The coding period of Google Summer of Code officially closed last Monday (August 22nd), so this is a good time to sum up the last three months of my work on the EGit project.

During that time I managed to contribute 32 commits (29 of them are already merged into master, the rest are pending review). I also have two unfinished changes in my local git repository. The first adds support for non-workspace files in the Workspace presentation model in the sync-view; the second is a refactoring of the current Git Change Set implementation.

Here is a detailed list of the things that were done during GSoC and are available in the current nightly build:

  • improved synchronize wizard
  • the Workspace presentation model is refreshed after a repository change
  • pushing from the sync view to multiple repositories is now possible
  • the Git Change Set presentation model is refreshed after a workspace change
  • the context menu in the synchronize view was cleaned up and git actions are shown for multiple selections
  • synchronization at folder level is now possible and narrows the results to the selected folder (when synchronization is launched for a project, the results show changes in the whole repository)
  • local changes can be dragged and dropped between the <working tree> and <staged changes> nodes in the Git Change Set presentation model

Performance improvements for the Workspace presentation model are still awaiting review in Gerrit. Part of the performance and memory usage improvements for Git Change Set is also pending in Gerrit, and I’m currently working on the rest of the required refactoring in the Git Change Set presentation model.

I want to thank my mentor Matthias Sohn for his commitment to GSoC, for reviewing my patches and for his feedback. It was a great pleasure working with him! 😉

Huge performance boost for EGit sync-view

For the last few days I have been working on EGit Synchronize View performance, especially on the Workspace presentation model (the Git Change Set is next on my list ;)). My starting point was 1m 40s to compare two Linux kernel versions, v2.6.36 versus v2.6.38-rc2. Such a result seems to be very good, but you need to know that it was achieved on an SSD drive; on a regular HDD it would be much worse (maybe more than 3 or 4 minutes).

What can be improved? First of all, whenever the current implementation is asked for the members of a particular folder, it opens the repository* and reads the data directly from it. Of course there is a caching mechanism, but it works only at the level of a single resource. Therefore, when you launch synchronization on a repository that has about 300 folders, the current implementation will create and configure 300 connections* to the repository to read the data, and only then cache it.

So, my idea was to create only one connection*, read all the data at once and put it into a global cache. This cache is then used whenever the list of members for a given folder is required. This approach gives about a 2.5x performance boost to synchronization (from 1m 40s down to 40s). This result looks much better, and maybe on an HDD this action will take less than 2 minutes … but this isn’t over 😉
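The idea of one pass that fills a global cache can be sketched roughly like this. It is only an illustration in plain Java collections, not the code from the EGit change: the class name FolderMembersCache, the members method and the use of "/"-separated path strings are all assumptions made for the example, and the list of paths stands in for whatever a single walk through the repository would return.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch, not EGit's actual implementation.
final class FolderMembersCache {
    // Folder path -> direct members that contain changes ("" is the root).
    private final Map<String, Set<String>> membersByFolder = new HashMap<>();

    // Fills the whole cache in one pass over the changed paths, which here
    // stand in for the result of a single walk through the repository.
    FolderMembersCache(List<String> changedPaths) {
        for (String path : changedPaths) {
            String child = path;
            while (true) {
                int slash = child.lastIndexOf('/');
                String parent = slash < 0 ? "" : child.substring(0, slash);
                boolean added = membersByFolder
                        .computeIfAbsent(parent, key -> new LinkedHashSet<>())
                        .add(child);
                // Stop at the root, or when this ancestor chain is already known.
                if (!added || slash < 0) {
                    break;
                }
                child = parent;
            }
        }
    }

    // Answers "members of this folder" from memory, without touching the repository.
    List<String> members(String folder) {
        Set<String> members = membersByFolder.get(folder);
        return members == null ? new ArrayList<>() : new ArrayList<>(members);
    }
}
```

With a cache like this, asking for the members of 300 folders means 300 map lookups instead of 300 separate reads from the repository.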

Reading the members of a folder is one thing, but getting information about a particular file (whether it was changed, added or removed, and whether this change is incoming, outgoing or conflicting) is another. Currently we are reusing the default implementation of the SyncInfo class from the Team Framework. This is a really good implementation … when you cannot obtain such information from the version control system. In Git we have a SHA-1 for each file and folder version, so we don’t have to compare file contents to check whether they are the same; comparing SHA-1s is sufficient. This should save lots of CPU time, disk IO and developer time spent waiting for synchronization to finish ;).
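To show what “comparing SHA-1 is sufficient” could look like, here is a minimal sketch of a three-way comparison on object ids only. The class Sha1SyncSketch, its two enums and the convention that null means “the file does not exist in that version” are assumptions for this example; it is not the SyncInfo subclass from the actual change.

```java
import java.util.Objects;

// Hypothetical sketch: classifies a change from three SHA-1 strings only.
final class Sha1SyncSketch {
    enum Direction { IN_SYNC, INCOMING, OUTGOING, CONFLICTING }
    enum Kind { NONE, ADDED, REMOVED, CHANGED }

    // base = common ancestor, local = our version, remote = their version.
    // A null id means the file is absent in that version (an assumption of this sketch).
    static Direction direction(String base, String local, String remote) {
        if (Objects.equals(local, remote)) {
            return Direction.IN_SYNC;      // both sides hold the same content
        }
        if (Objects.equals(local, base)) {
            return Direction.INCOMING;     // only the remote side changed
        }
        if (Objects.equals(remote, base)) {
            return Direction.OUTGOING;     // only the local side changed
        }
        return Direction.CONFLICTING;      // both sides changed, differently
    }

    static Kind kind(String base, String local, String remote) {
        Direction direction = direction(base, local, remote);
        if (direction == Direction.IN_SYNC) {
            return Kind.NONE;
        }
        // Look at the side that changed; for conflicts this sketch reports the local side.
        String changed = direction == Direction.INCOMING ? remote : local;
        if (base == null) {
            return Kind.ADDED;
        }
        if (changed == null) {
            return Kind.REMOVED;
        }
        return Kind.CHANGED;
    }
}
```

No file content is read anywhere here; every decision is a string comparison of ids that Git already stores.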

Now that I already had a cache containing the list of all changed resources, it was a natural thing to add information about the change type to it. Then, whenever the Team Framework needs to know the change type, it can be easily obtained from this cache … no IO is needed, no comparison, just a read from the in-memory cache and the proper value is returned.
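Storing the change type next to each cached resource could look roughly like the sketch below. The names ChangeTypeCache, record and changeFor are made up for this example, and treating every path that is not in the cache as in sync is an assumption of the sketch.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: change type kept in memory next to each changed path.
final class ChangeTypeCache {
    enum Change { IN_SYNC, INCOMING, OUTGOING, CONFLICTING }

    private final Map<String, Change> changeByPath = new HashMap<>();

    // Called while the cache is being filled, once per changed resource.
    void record(String path, Change change) {
        changeByPath.put(path, change);
    }

    // Called whenever the change type is needed; a plain map lookup, no IO.
    // Paths that were never recorded are treated as unchanged (sketch assumption).
    Change changeFor(String path) {
        return changeByPath.getOrDefault(path, Change.IN_SYNC);
    }
}
```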

I’m sure that you are wondering how fast synchronization can be now … I can only say that it is REALLY fast … as you may remember, my starting point was 1m 40s; now the same comparison finishes in less than 7s!! This means that synchronization is now about 14 times faster than before! What does this mean for a regular user? Well, it means that you will get the results of the ‘Synchronize Workspace‘ action almost instantly.

Unfortunately, the changes mentioned above are still awaiting review in Gerrit; you can grab them from change #3891 and build them locally. I hope this will be included in the 1.1 release …

* JGit uses the concept of walks (with filters) through the repository, but I’ve used more commonly recognized terminology here

Google announces list of accepted projects in GSoC11

Google Summer of Code 2011

Almost 24 hours ago Google announced the list of accepted projects in this year’s edition of the Google Summer of Code program. In the current edition the Eclipse Foundation got 17 slots (as you may know from Wayne’s post). One of these slots was allocated to me (as a student) and Matthias Sohn; the project that we’ll be working on is a continuation of my work for the EGit project last year. The project name is “EGit Synchronize View support part 2”; if somebody is curious what this project is about, you can check it on Melange (I’ve made it public). In this proposal I want to address the most important missing features in the current implementation of EGit Synchronization support. If you think that something is missing there, please let me know via mail or comments!

Apart from the fact that this is my third time as a GSoC student (and that Eclipse got 17 slots!), there is more good news for me … on the list of accepted proposals in the Eclipse Foundation there are 6 (six!) students’ names that look like they are from Poland! Yeah! About 1/3 of Eclipse GSoC students seem to be from Poland, this is really great news! The next great thing is that four projects are mentored by Polish mentors! Great work guys! And good luck 😉

And last but not least, as far as I know there is one other student from my university who was accepted in GSoC11. This is a bit of a regression compared to the 2010 edition, where there were three accepted students. Maybe this will improve in the next edition, we’ll see.

After the announcement the “community bonding” time begins … but not for me, since I’m quite well integrated with the EGit community 😉 Therefore for me the “coding period” starts today ;>