The Krippendorff Module

Currently, there are two computational backends. If the number of possible responses is bounded, one can construct a coincidence matrix of all observed pairs. Computing the observed disagreement is straightforward in this case, and the expected disagreement can easily be computed from the marginals of the coincidence matrix. For all other cases, a generic computation strategy is implemented that avoids constructing temporary tables entirely but typically scales much worse.
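For orientation, the two quantities mentioned above combine into α as follows. This is the standard textbook definition, not something quoted from the package; the symbols (o_ck for coincidence matrix entries, n_c for its marginals, n for their total, δ² for the squared-distance metric) are notation assumed here.

```latex
\alpha = 1 - \frac{D_o}{D_e}, \qquad
D_o = \frac{1}{n} \sum_{c} \sum_{k} o_{ck}\, \delta^2(c, k), \qquad
D_e = \frac{1}{n(n-1)} \sum_{c} \sum_{k} n_c\, n_k\, \delta^2(c, k),
\qquad n_c = \sum_{k} o_{ck}, \quad n = \sum_{c} n_c
```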

Detailed API

Krippendorff.krippendorffs_alpha - Function
Krippendorff.alpha(...)
krippendorffs_alpha(input; units = rows, metric = :nominal, R = :discrete, silent = false)

Compute Krippendorff's α, an inter-rater reliability measure, from the supplied input. The input is checked to determine how to iterate over it, missing values are handled automatically if present, and the most efficient algorithm for computing α is chosen heuristically. By default, Tables.jl tables and table-like inputs are assumed to have columns representing raters and rows representing units. See prepare_iterator for more information about the input requirements.

Arguments

  • units::Union{Symbol,AbstractString}: rows or col(umn)s, see prepare_iterator for explanation
  • metric: a metric computing the squared distance between any pair of responses. Any of [:nominal, :interval] or a custom function. Should satisfy f(x,y) = [0 if x==y], [>0 otherwise] but this is not enforced. See README for explanation of the default metrics.
  • R: The space of possible responses. Either :discrete (relatively few possible responses, uses fast computation via coincidence matrix), :continuous (many possible responses up to continuous range, uses a slower but generically applicable algorithm with minimal allocation) or a precomputed value (implies discrete, but avoids searching for all unique values). If precomputed, it should be supplied as a tuple or vector of possible values (e.g. the output of unique(yourdata)), or as an appropriate range object where possible (slightly more efficient).
  • silent::Bool: set to disable all optional output (@info and @warn, doesn't affect error messages)
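As a rough illustration of what the metric argument expects, the following sketch writes out the usual nominal and interval squared distances, plus one custom function. These are re-implementations for illustration, not the package's internals, and the names nominal_metric, interval_metric, capped_metric and looks_like_valid_metric are hypothetical.

```julia
# Illustrative re-implementations; the names below are hypothetical
# and not part of the Krippendorff module.

# Nominal: any two distinct responses are equally far apart.
nominal_metric(x, y) = x == y ? 0.0 : 1.0

# Interval: squared difference between numeric responses.
interval_metric(x, y) = float(x - y)^2

# A custom metric: squared difference, capped at 4.
capped_metric(x, y) = min(abs2(x - y), 4)

# Check f(x, y) == 0 for x == y and f(x, y) > 0 otherwise
# on a finite set of responses.
function looks_like_valid_metric(f, responses)
    all(x == y ? f(x, y) == 0 : f(x, y) > 0 for x in responses, y in responses)
end

looks_like_valid_metric(capped_metric, 1:5)
```

Since the documentation states that the property is not enforced, a check of this kind on a handful of representative responses may be worth running before passing a custom function as metric.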

See also: compute_alpha_generical, compute_alpha_with_coincidences

Krippendorff.compute_alpha_with_coincidences - Function
compute_alpha_with_coincidences(units, metric, possible_responses)

The default fast computation backend for Krippendorff's alpha. It iterates over all units only once and thus scales favorably if the number of distinct possible responses is bounded. A coincidence matrix is generated to keep track of the observed disagreement, while the expected disagreement is computed from the marginals of the coincidence matrix. Be careful: if the number of possible responses is large, this backend may allocate a lot of memory!

Arguments

  • units_iterator: An iterable object which is assumed to yield units with no missings. If this is not the case, you can call prepare_iterator on it.
  • squared_distance_metric: An object callable on any pair of possible responses in the supplied iterator. Should satisfy f(x,y) = [0 if x==y], [>0 otherwise] but this is not checked explicitly.
  • R: Space of possible responses. This is necessary to generate the coincidence matrix efficiently. It should be supplied as a tuple or vector of possible values (e.g. the output of unique(yourdata)), or as an appropriate range object where possible.
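The coincidence-matrix approach described above can be sketched roughly as follows. This is a minimal illustration under the standard definition of α, not the package's implementation; the function name alpha_from_coincidences and the example data are hypothetical, and every unit is assumed to be free of missing values.

```julia
# Illustrative sketch only; not the package's implementation.
function alpha_from_coincidences(units, metric, responses)
    # Map every possible response to a row/column of the coincidence matrix.
    index = Dict(r => i for (i, r) in enumerate(responses))
    k = length(responses)
    coincidences = zeros(Float64, k, k)

    # Single pass over the units: every ordered pair of responses within
    # a unit of size m contributes 1/(m - 1) to the matrix.
    for unit in units
        vals = collect(unit)
        m = length(vals)
        m < 2 && continue              # a lone response cannot be paired
        for i in 1:m, j in 1:m
            i == j && continue
            coincidences[index[vals[i]], index[vals[j]]] += 1 / (m - 1)
        end
    end

    marginals = vec(sum(coincidences; dims = 2))
    n = sum(marginals)

    observed = 0.0
    expected = 0.0
    for (a, ra) in enumerate(responses), (b, rb) in enumerate(responses)
        d = metric(ra, rb)
        observed += coincidences[a, b] * d
        expected += marginals[a] * marginals[b] * d
    end
    return 1 - (observed / n) / (expected / (n * (n - 1)))
end

example_units = [[1, 1], [1, 2], [2, 2], [1, 1]]
nominal(x, y) = x == y ? 0.0 : 1.0
alpha_from_coincidences(example_units, nominal, 1:2)   # ≈ 0.533 (8/15)
```

The matrix has one row and column per possible response, which is why memory use grows quadratically with the size of the response space.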
Krippendorff.compute_alpha_generical - Function
compute_alpha_generical(units_iterator, squared_distance_metric)

Generic computation backend for Krippendorff's alpha, bypassing the creation of coincidence tables entirely. The tradeoff in this case is the need to iterate over all pairs of units. Thus, scaling is typically much worse (O(U²*R) for (U)nits and (R)aters) than when using coincidence matrices. Nonetheless, this backend is used by default if the number of possible responses is large compared to the size of the input, since a coincidence matrix can get huge in this case. It is also preferable when the possible responses are not all known beforehand or span a continuous spectrum of values.

Arguments

  • units_iterator: An iterable object which is assumed to yield units with no missings. If this is not the case, you can call prepare_iterator on it.
  • squared_distance_metric: An object callable on any pair of possible responses in the supplied iterator. Should satisfy f(x,y) = [0 if x==y], [>0 otherwise] but this is not checked explicitly.
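A table-free computation of this kind could look roughly like the sketch below. It is a naive illustration, not the package's implementation: the name alpha_pairwise is hypothetical, each unit is assumed to support length and repeated iteration (for example, a Vector), and the metric is assumed to return 0 for identical responses.

```julia
# Illustrative sketch only; not the package's implementation.
function alpha_pairwise(units, metric)
    observed = 0.0   # within-unit disagreement, weighted by 1/(m - 1)
    expected = 0.0   # disagreement over all ordered pairs of pairable responses
    n = 0            # number of pairable responses

    for u in units
        length(u) < 2 && continue      # a lone response cannot be paired
        n += length(u)

        # Ordered pairs of distinct positions within the same unit.
        within = 0.0
        for (i, x) in enumerate(u), (j, y) in enumerate(u)
            i == j || (within += metric(x, y))
        end
        observed += within / (length(u) - 1)

        # Pairs across all units; pairing a response with itself adds
        # nothing because metric(x, x) is assumed to be 0.
        for v in units
            length(v) < 2 && continue
            for x in u, y in v
                expected += metric(x, y)
            end
        end
    end
    return 1 - (observed / n) / (expected / (n * (n - 1)))
end

alpha_pairwise([[1.0, 2.0], [3.0, 3.0]], (x, y) -> (x - y)^2)   # ≈ 0.727 (8/11)
```

Only a few scalar accumulators are kept, so memory use is independent of the number of distinct responses, at the price of a nested loop over units.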
Krippendorff.prepare_iterator - Function
prepare_iterator(input; units = rows)

Prepare an object for iteration by one of the compute_alpha_... functions. This involves determining how to iterate over units in the object and probing for the element type. Where applicable, the units argument is used to determine the direction of iteration. This is, however, not always possible. If no sense of direction is found, the heuristic assumes the input is already a suitable iterator over units. Furthermore, if the input is found to contain missing values (or allows them in its eltype), all units are wrapped in skipmissing automatically.

Since some seemingly unstructured iterables can satisfy the Tables.jl interface somewhat surprisingly (Dict{Symbol,Vector} does, but not Dict{String,Vector} for example) and this may change the order of iteration implied, you can call the helper function Krippendorff.istable to see whether your input looks like a table and if yes, how many rows and columns it appears to have.

Arguments

  • input: The input to be prepared. Should support generic iteration via iterate, eachrow or eachcol or satisfy the Tables.jl interface as determined by Tables.istable.
  • units::Union{Symbol,AbstractString}: either rows or col(umn)s. This is used to determine how to iterate units in the input. For example, if the input iterator was a Matrix, :rows would make the function call eachrow(input) (and a little bit more).
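For a plain Matrix, the effect described above can be approximated with Base functions alone. The sketch below only illustrates the idea; the actual prepare_iterator does more (such as handling Tables.jl inputs and probing the element type), and the variable names are hypothetical.

```julia
# Illustrative sketch only; not the package's implementation.
ratings = [1 2 missing;
           3 3 3;
           missing 2 2]

# Treat rows as units and skip missing responses within each unit.
units_by_row = (skipmissing(row) for row in eachrow(ratings))

# Treat columns as units instead.
units_by_col = (skipmissing(col) for col in eachcol(ratings))

[collect(u) for u in units_by_row]   # [[1, 2], [3, 3, 3], [2, 2]]
```

Both generators are lazy, so no copy of the data is made until a unit is actually collected.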
Krippendorff.istable - Function
istable(input; IO = stdout)

A thin wrapper around Tables.istable that additionally prints how many rows and columns the input appears to have when iterated through the Tables.jl interface (Tables.columns specifically). IO can be used to redirect the written output. Pass IO=devnull to suppress output (making it equivalent to calling Tables.istable).

Examples

The Tables.jl interface assumes named columns and unnamed rows, which may lead to confusion if one wants to pass a dictionary of rows, for example:

julia> testmatrix = reshape(1:15, (3,5))
3×5 reshape(::UnitRange{Int64}, 3, 5) with eltype Int64:
 1  4  7  10  13
 2  5  8  11  14
 3  6  9  12  15

julia> Krippendorff.istable(testmatrix)
false

julia> Krippendorff.istable(Tables.table(testmatrix));
Input satisfies the Tables.jl table interface and appears to have 3 rows and 5 columns.

julia> testvectordict = Dict([k=>v for (k,v) in zip([:row1, :row2, :row3], eachrow(testmatrix))]); [println(entry) for entry in testvectordict];
:row1 => [1, 4, 7, 10, 13]
:row2 => [2, 5, 8, 11, 14]
:row3 => [3, 6, 9, 12, 15]

julia> Krippendorff.istable(testvectordict)
Input satisfies the Tables.jl table interface and appears to have 5 rows and 3 columns.
true