Over the last few days I have been working on the performance of the EGit Synchronize View, especially the Workspace presentation model (the Git Change Set is next on my list ;)). My starting point was 1m 40s to compare two Linux kernel versions, v2.6.36 versus v2.6.38-rc2. That result seems quite good, but you need to know that it was achieved on an SSD; on a regular HDD it would be much worse (maybe more than 3 or 4 minutes).
What can be improved? First of all, whenever the current implementation is asked for the members of a particular folder, it opens the repository* and reads the data directly from it. Of course there is a caching mechanism, but it works only at the level of a single resource. Therefore, when you launch synchronization on a repository that has about 300 folders, the current implementation will create and configure 300 connections* to the repository to read the data, and only then cache it.
So, my idea was to create only one connection*, read all the data at once and put it into a global cache. This cache is then used whenever the list of members of a given folder is required. This approach gives about a 2.5x performance boost to synchronization (from 1m 40s down to 40s). This result looks much better, and maybe on an HDD this action will take less than 2 minutes … but this isn’t over.
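To make the idea more concrete, here is a minimal sketch of such a global cache in plain Java. It is not the code from the actual change: the class name `FolderMembersCache` and its methods are hypothetical, and it assumes that the single pass over the repository has already produced a flat list of repository-relative paths.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch: one pass over all changed paths fills a folder -> members cache. */
final class FolderMembersCache {
    // Key: folder path ("" is the repository root). Value: direct members of that folder.
    private final Map<String, Set<String>> membersByFolder = new HashMap<>();

    /** Builds the whole cache once from paths such as "kernel/sched.c". */
    FolderMembersCache(List<String> changedPaths) {
        for (String path : changedPaths) {
            String child = path;
            while (true) {
                int slash = child.lastIndexOf('/');
                String parent = slash < 0 ? "" : child.substring(0, slash);
                boolean added = membersByFolder
                        .computeIfAbsent(parent, k -> new LinkedHashSet<>())
                        .add(child);
                // Stop at the root, or as soon as this ancestor chain is already known.
                if (slash < 0 || !added) {
                    break;
                }
                child = parent;
            }
        }
    }

    /** Answers a "members of folder" request from memory, without touching the repository. */
    List<String> members(String folder) {
        Set<String> members = membersByFolder.get(folder);
        return members == null ? Collections.emptyList() : new ArrayList<>(members);
    }
}
```

With something like this in place, each of the roughly 300 folder requests becomes a map lookup, e.g. `cache.members("kernel")`, instead of a new walk through the repository.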
Reading the members of a folder is one thing, but getting information about a particular file (whether it was changed, added or removed, and whether that change is incoming, outgoing or conflicting) is another. Currently we reuse the default implementation of the SyncInfo class from the Team Framework. This is a really good implementation … when you cannot obtain such information from the version control system. In Git we have a SHA-1 for each file and folder version, so we don’t have to compare file contents to check whether they are the same; comparing SHA-1s is sufficient. This should save lots of CPU time, disk I/O and developer time spent waiting for synchronization to finish ;).
Now that I already have a cache containing the list of all changed resources, it was natural to add information about the change type to it. Then, whenever the Team Framework needs to know the change type, it can easily be obtained from this cache … no I/O is needed, no comparison; just read from the in-memory cache and return the proper value.
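A rough sketch of how the change type could be derived from object ids alone is shown below. This is an illustration, not the code from the actual change: `ChangeClassifier`, its enums and the use of a common-ancestor ("base") id are all my assumptions here, and ids are represented as plain hex strings with `null` meaning "the file does not exist on that side".

```java
import java.util.Objects;

/** Hypothetical sketch: classify a change by comparing SHA-1 ids, never file contents. */
final class ChangeClassifier {
    enum Direction { IN_SYNC, INCOMING, OUTGOING, CONFLICTING }
    enum Kind { NONE, ADDITION, DELETION, CHANGE }

    /** Immutable result, cheap enough to keep in an in-memory cache keyed by path. */
    static final class State {
        final Direction direction;
        final Kind kind;

        State(Direction direction, Kind kind) {
            this.direction = direction;
            this.kind = kind;
        }
    }

    /** base = common ancestor id, local = workspace side, remote = compared version. */
    static State classify(String base, String local, String remote) {
        if (Objects.equals(local, remote)) {
            return new State(Direction.IN_SYNC, Kind.NONE);
        }
        if (Objects.equals(base, local)) {
            // Local side is untouched, so only the remote side changed.
            return new State(Direction.INCOMING, kind(base, remote));
        }
        if (Objects.equals(base, remote)) {
            // Remote side is untouched, so only the local side changed.
            return new State(Direction.OUTGOING, kind(base, local));
        }
        // Both sides differ from the base and from each other.
        Kind kind = base == null ? Kind.ADDITION
                : (local == null || remote == null) ? Kind.DELETION : Kind.CHANGE;
        return new State(Direction.CONFLICTING, kind);
    }

    private static Kind kind(String before, String after) {
        if (before == null) {
            return Kind.ADDITION;
        }
        return after == null ? Kind.DELETION : Kind.CHANGE;
    }
}
```

The result for each path can be computed once while the cache is filled and stored next to the folder members, so a later request for the change type is a plain lookup.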
I’m sure you are wondering how fast synchronization can be now … I can only say that it is REALLY fast … as you may remember, my starting point was 1m 40s; now the same comparison finishes in less than 7s!! This means that synchronization is now about 14 times faster than before! What does this mean for a regular user? Well, it means that you will get the results of the ‘Synchronize Workspace’ action almost instantly.
Unfortunately, the changes mentioned above are still awaiting review in Gerrit; you can grab them from change #3891 and build them locally. I hope this will be included in the 1.1 release …
* JGit uses the concept of walks (with filters) through the repository, but I’ve used more commonly recognized terminology here.
That’s great news, this should make a lot of people happy
Hi Darek,
Can you let me know how long this operation takes in pure Git?
@Chris Aniszczyk
Yes, indeed; especially the guys from CDT should be very happy ;)
@Krzysztof
To be honest, I don’t see any command in native Git that is equivalent to the sync-view operation, because sync-view doesn’t just produce raw diff data like `git diff`; it also splits those diffs into a nice tree structure. Of course we can compare sync-view with `git diff`, and on my machine `git diff HEAD..v2.6.38-rc2` (where the current HEAD points at v2.6.36) opens vim in less than 1s, but it needs an additional 1s to scroll down to the last line (I’ve used the ctl+G shortcut) of a pseudo file that has exactly 3314381 lines.
If you consider that in EGit we do all of this in a pure Java implementation, have to struggle with a synchronization framework that was designed for CVS, and present all the results in a nice graphical UI, the additional 5 or 6s doesn’t make a huge difference … compared to native C Git compiled with specific CFLAGS on my Gentoo Linux …
So, to sum it all up, comparing a pure C command-line implementation with a pure Java UI doesn’t make sense … IMHO