Since its introduction a decade ago the h-index has rapidly become the most frequently used measure of research productivity and citation impact amongst scientists. It’s far from perfect and has been criticised from a number of perspectives, particularly when used as a blunt tool for assessing a scientist’s “quality”. Nonetheless it’s a useful measure that allows some comparison within research fields and (I think more importantly) gives individuals one method, amongst any number, of assessing the influence their work is having on their discipline.
Put simply, an individual’s h-index is calculated by ranking their publications by number of citations, from most cited to least; the h-index is the last rank position at which the publication still has at least as many citations as its rank. For example, if a scientist has 18 papers all with at least 18 citations, their h-index is 18. Once they have 19 publications each with at least 19 citations, their h-index will go up to 19, and so forth.
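The calculation can be sketched in a few lines of Python. This is only an illustration of the definition; the function name `h_index` and the citation counts are made up for the example.

```python
def h_index(citations):
    """Return the largest h such that h papers have at least h citations each."""
    # Rank publications from most cited to least cited.
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank   # this paper still has at least as many citations as its rank
        else:
            break      # every later paper has fewer citations than its rank
    return h


# Hypothetical citation counts, one entry per paper.
eighteen_papers = [18] * 18
mixed_record = [10, 8, 5, 4, 3]

print(h_index(eighteen_papers))  # 18
print(h_index(mixed_record))     # 4
```

In the second example the fifth paper has only 3 citations, which is fewer than its rank of 5, so the h-index stops at 4.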
That’s an important point about the h-index (and indeed all other measures of success/impact/whatever) – they are not static and they change over time. As the Wikipedia entry that I linked to above notes, the originator of the index, Jorge Hirsch, suggested that 20 years after their first publication the h-index of a “successful scientist” would be 20; that of an “outstanding scientist” would be 40; and a “truly unique” scientist would have an h-index of 60. However, this will vary between different fields, so any comparisons are best done within a discipline.
One question that I’ve not seen widely discussed is how an individual’s h-index changes over time (though see Alex Bateman’s old blog post about “Why I love the h-index“, where he refers to the “h-trajectory”). Does the “successful scientist” typically accrue those 20 h-index points regularly, 1 point per year, over the 20 years? Or are there years when the h-index remains static and others when it increases by more than the average of 1 point per year? If the latter, what’s the largest annual leap in an h-index that one could reasonably expect? Finally, if we were to plot up the h-index over time, what shape curve can we expect from the graph?
On one level these are purely academic questions, the result of some musing and window gazing during a bus ride between campuses a couple of weeks ago. But there’s also a practical aspect to it, if scientists wish to track this measure of their career progression. For an early career scientist starting out with their first few publications, it’s easy to record their h-index as it changes over time from this point forward. But what about a mid- or late-career scientist who started publishing long before the h-index was even thought of? How do they reconstruct the way in which their h-index has evolved over time, should they be so inclined?
As far as I know there’s no simple, automatic way to do it (but please correct me if I’m wrong). Indexing and citation systems such as Web of Science and Google Scholar give the current h-index but no indication of past history; you have to work it out for yourself. Which is what I’ve done, and the procedure below is (I think) the most straightforward* way of reconstructing the evolution of an h-index.
So, pour yourself a cup of coffee** and settle in for a bit of academic archaeology.
I’m going to demonstrate the process using Web of Science (WoS)***, but it should be identical in overall procedure, if not in detail, in Google Scholar, Scopus, etc. However be aware that Google Scholar is much less conservative in what it counts as a citation, hence h-indexes from that source are typically significantly higher than from others.
The first thing to do after you’ve logged on to WoS is to perform a Basic Search by author name, across all years; I’ve done this for All Databases as some of my**** publications (specifically peer-reviewed book chapters) are not listed in the WoS Core Collection database (the default selection):
Perform the search then select Create Citation Report. This will return a pair of graphs showing number of publications per year and number of citations per year, plus a table with some metrics about average citations per year, etc., and a value for the current h-index of that author:
Below that is a list of publications for Ollerton, J ranked by number of times cited:
As you can see, WoS indicates that the h-index of Ollerton, J is 23. That’s incorrect: it’s actually 22 (i.e. a not-quite-successful scientist) because, despite my having a relatively uncommon name, there are other people called Ollerton, J who publish (including my cousin Janice). However it’s a simple matter to remove any publications that are not your own using the check boxes against each publication and the “Go” button. Ignore any publications that are ranked lower than your h-index.
Once you have a clean list, use the drop-down menu underneath the page to save your list as either a text or Excel file; again, just save the publications that are contributing to your h-index by choosing the number of records that corresponds to your h-index [UPDATE: however see Vera van Noort’s comment below about the possible influence of early publications that were only cited once or twice on your early h-index. UPDATE x 2: see also the later comments by Alex Bateman and Vera – later publications can drop out of the h-index list too. This wasn’t an issue for my set of publications, but it’s worth checking if you’re following this procedure].
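To see why the caveat in the first update matters, here is a small Python sketch with entirely hypothetical numbers (three invented papers, not taken from any real record). It compares the early h-index calculated from the full list with the one calculated from only the papers that count towards today's h-index.

```python
def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    return sum(1 for rank, cites in enumerate(ranked, start=1) if cites >= rank)


# Hypothetical cumulative citations at an early date and today.
papers = {
    "old_note": {"early": 1, "now": 1},   # cited once, long ago
    "paper_a": {"early": 0, "now": 50},   # published later, now highly cited
    "paper_b": {"early": 0, "now": 40},
}

current_h = h_index([p["now"] for p in papers.values()])

# Keep only the top `current_h` papers by today's citations, as in the export step.
top_now = sorted(papers, key=lambda name: papers[name]["now"], reverse=True)[:current_h]

early_h_full = h_index([p["early"] for p in papers.values()])
early_h_truncated = h_index([papers[name]["early"] for name in top_now])

print(current_h, early_h_full, early_h_truncated)  # 2 1 0
```

With these made-up figures the truncated list gives an early h-index of 0, while the full list gives 1, so saving every publication is the safer option if there is any doubt.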
The Excel***** file is easiest to work with: it provides you with the two graphs shown on the WoS citation report plus details of the publications, average citations and so forth, and all the raw data on number of citations per year back to 1950 (click on each image for a larger view):
To make the spreadsheet easier to work with I advise deleting all the stuff you don’t need, including the figures and the columns from 1950 up to the date of your first publication.
You now have to calculate cumulative number of citations over time for each publication using the Sum function (I’ll not go into details, should be straightforward if you know your way around Excel).
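For anyone who prefers code to spreadsheets, the same running total can be sketched in Python with `itertools.accumulate` from the standard library. The years, paper names and counts below are hypothetical, and the layout (one list of per-year citations for each paper) is an assumption about how you might hold the exported data.

```python
from itertools import accumulate

# Hypothetical export: citations received by each paper in each year.
years = [2010, 2011, 2012, 2013]
citations_per_year = {
    "paper_a": [0, 2, 3, 5],
    "paper_b": [0, 0, 1, 4],
}

# Running total per paper: the equivalent of summing along each row in the spreadsheet.
cumulative = {
    paper: list(accumulate(counts))
    for paper, counts in citations_per_year.items()
}

for paper, totals in cumulative.items():
    print(paper, dict(zip(years, totals)))
```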
Next, copy all of the data and paste-special onto another sheet, selecting “values” (to just paste the data, not the formulae) and “transpose” (to turn the data 90 degrees) from the paste-special options. Remove the original data to just leave the cumulative citations and then select all of the data and use the Custom Sort function to order the rows by date of publication:
Now it’s a matter of going along the columns and recording the number of publications that exceed the h-index for the previous column; I’ve colour-coded this below to make it easier to see:
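The counting step can also be expressed in code. The sketch below is one possible way to do it in Python, not a record of the spreadsheet itself: for each year it takes every paper's cumulative citations up to that year and applies the usual h-index definition. The function names and the three-paper data set are hypothetical.

```python
from itertools import accumulate


def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    return sum(1 for rank, cites in enumerate(ranked, start=1) if cites >= rank)


def h_index_by_year(years, citations_per_year):
    """Map each year to the h-index based on cumulative citations up to that year."""
    cumulative = [list(accumulate(counts)) for counts in citations_per_year.values()]
    return {
        year: h_index([totals[i] for totals in cumulative])
        for i, year in enumerate(years)
    }


# Hypothetical data: citations received per paper per year.
years = [2010, 2011, 2012, 2013, 2014]
citations_per_year = {
    "paper_1": [1, 2, 3, 4, 5],
    "paper_2": [0, 1, 2, 2, 3],
    "paper_3": [0, 0, 1, 2, 4],
}

history = h_index_by_year(years, citations_per_year)
print(history)  # {2010: 1, 2011: 1, 2012: 2, 2013: 3, 2014: 3}
```

A paper that has not yet been published in a given year simply has zero cumulative citations for that year, so it cannot contribute to the count.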
Finally, graph up the data:
The results are interesting (or at least I think so). In relation to the questions I posed above it’s clear that there are periods when the h-index doesn’t increase for a couple of years; more periods when the h-index increases by one each year; and a couple of years when the h-index increases by 2 points. But that’s the maximum and I suspect that increasing by 3 or more index points in a year would be very unusual indeed (though see my second point below).
Although there’s a clear “lag phase” in the first five years when the h-index hardly changes, there are also periods when there’s no increase in h-index much later, e.g. 2013/14, so this stasis is not restricted to the beginning of my career.
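If you have a year-by-year h-index series, the size of each annual step and the years of stasis are easy to pull out. The following Python sketch assumes the series is held as a dictionary of year to h-index; the values shown are invented for illustration and are not my own figures.

```python
def annual_changes(h_by_year):
    """Return the change in h-index for each year relative to the previous year."""
    years = sorted(h_by_year)
    return {
        later: h_by_year[later] - h_by_year[earlier]
        for earlier, later in zip(years, years[1:])
    }


# Hypothetical h-index history.
h_by_year = {2010: 1, 2011: 1, 2012: 2, 2013: 4, 2014: 4}

changes = annual_changes(h_by_year)
largest_jump = max(changes.values())
static_years = [year for year, step in changes.items() if step == 0]

print(changes)       # {2011: 0, 2012: 1, 2013: 2, 2014: 0}
print(largest_jump)  # 2
print(static_years)  # [2011, 2014]
```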
Some final points:
1. Make sure your citation data on Web of Science is accurate. I have found LOTS of mis-citations of my publications over the years, by authors who include incorrect dates, volume numbers, page numbers, even authors, in the references they cite. WoS has a facility for correcting these mis-citations, but you have to let them know; it’s not automatic.
2. How representative are my results for the population of ecologists or scientists more generally? I have no idea but I hope others go through the same procedure so that we can begin to build up a picture of how the h-index evolves.
*No doubt this could be automated in some way and perhaps this will stimulate some competent programmer or app developer to do so, but doing it by hand is so straightforward that I’m not sure it’s worth the effort of constructing a working system. Certainly the Excel part of the procedure could be done more elegantly in R.
**Other beverages are available.
***Other indexing and citation systems are available.
****Other scientists are available 🙂 But it doesn’t seem fair to use someone else as an example. In any case, consider this another post reflecting on my life and career in my 50th year on this planet!
*****Other spreadsheets are available. That’s the last one, promise.