The Krippendorff Module
There are currently two computational backends. If the number of possible responses is bounded, a coincidence matrix can be constructed from all observed pairs. Computing the observed disagreement is straightforward in this case, and the expected disagreement can easily be computed from the marginals of the coincidence matrix. For all other cases, a generic computation strategy is implemented which avoids constructing temporary tables entirely, but typically scales much worse.
Detailed API
Krippendorff.krippendorffs_alpha — Function
Krippendorff.alpha(...)
krippendorffs_alpha(input; units = rows, metric = :nominal, R = :discrete, silent = false)
Compute the Krippendorff's-α inter-rater reliability measure from the supplied input. The input is checked to determine how to iterate over it, missing values are handled automatically if present, and the most efficient algorithm to compute α is chosen heuristically. By default, Tables.jl tables and table-like inputs are assumed to have columns representing raters and rows representing units. See prepare_iterator for more information about the input requirements.
Arguments
units::Union{Symbol,AbstractString}: :rows or col(umn)s, see prepare_iterator for explanation
metric: a metric computing the squared distance between any pair of responses. Any of [:nominal, :interval] or a custom function. Should satisfy f(x,y) = [0 if x==y], [>0 otherwise], but this is not enforced. See README for explanation of the default metrics.
R: The space of possible responses. Either :discrete (relatively few possible responses, uses fast computation via coincidence matrix), :continuous (many possible responses up to continuous range, uses a slower but generically applicable algorithm with minimal allocation) or a precomputed value (implies discrete, but avoids searching for all unique values). If precomputed, it should be supplied as a tuple or vector of possible values (e.g. the output of unique(yourdata)), or as an appropriate range object where possible (slightly more efficient).
silent::Bool: set to disable all optional output (@info and @warn, doesn't affect error messages)
See also: compute_alpha_generical, compute_alpha_with_coincidences
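To illustrate the contract described for the metric argument, here is a minimal sketch in plain Julia. The definitions below are the textbook squared-distance metrics for nominal and interval data; whether they match the package's :nominal and :interval defaults exactly is an assumption (the README is the authority). The names nominal_sketch, interval_sketch, capped_sketch and satisfies_contract are hypothetical and not part of the package.

```julia
# Textbook squared-distance metrics (assumed, not confirmed, to correspond
# to the :nominal and :interval options described above).
nominal_sketch(x, y) = x == y ? 0 : 1
interval_sketch(x, y) = (x - y)^2

# A hypothetical custom metric: squared difference, capped at `cap`.
capped_sketch(x, y; cap = 4) = min((x - y)^2, cap)

# Check on a sample of responses that f(x, y) == 0 when x == y
# and f(x, y) > 0 otherwise. The package itself does not enforce this.
function satisfies_contract(metric, samples)
    return all((x == y ? metric(x, y) == 0 : metric(x, y) > 0)
               for x in samples, y in samples)
end

samples = 1:5
results = [satisfies_contract(f, samples)
           for f in (nominal_sketch, interval_sketch, capped_sketch)]
```

A custom function of this shape could presumably be passed via the metric keyword; since the contract is not enforced, a quick check like satisfies_contract on a few representative responses can catch mistakes early.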
Krippendorff.compute_alpha_with_coincidences — Function
compute_alpha_with_coincidences(units, metric, possible_responses)
The default fast computation backend for Krippendorff's alpha. This iterates over all units only once and thus scales favorably if the number of distinct possible responses is bounded. A coincidence matrix is generated to keep track of the observed disagreement, while the expected disagreement is computed from the marginals of the coincidence matrix. Be careful: if the number of possible responses is large, this backend may allocate a lot!
Arguments
units_iterator: An iterable object which is assumed to yield units with no missings. If this is not the case, you can call prepare_iterator on it.
squared_distance_metric: An object callable on any pair of possible responses in the supplied iterator. Should satisfy f(x,y) = [0 if x==y], [>0 otherwise], but this is not checked explicitly.
R: Space of possible responses. This is necessary to generate the coincidence matrix efficiently. It should be supplied as a tuple or vector of possible values (e.g. the output of unique(yourdata)), or as an appropriate range object where possible.
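The coincidence-matrix approach described above can be sketched in plain Julia as follows. This is a from-scratch illustration of the standard formula α = 1 − D_o/D_e, not the package's implementation; the function name alpha_with_coincidences_sketch and the helper nominal are hypothetical, and units are assumed to be free of missing values.

```julia
# Hypothetical squared-distance metric for nominal data.
nominal(x, y) = x == y ? 0.0 : 1.0

function alpha_with_coincidences_sketch(units, metric, responses)
    # Map every possible response to a row/column of the coincidence matrix.
    index = Dict(r => i for (i, r) in enumerate(responses))
    k = length(responses)
    coincidences = zeros(Float64, k, k)

    # Single pass over the units: every ordered pair of responses within
    # a unit contributes 1 / (m - 1), where m is the unit's response count.
    for unit in units
        vals = collect(unit)
        m = length(vals)
        m < 2 && continue  # a single response cannot be paired
        weight = 1 / (m - 1)
        for i in 1:m, j in 1:m
            i == j && continue
            coincidences[index[vals[i]], index[vals[j]]] += weight
        end
    end

    # Observed disagreement from the matrix, expected from its marginals.
    marginals = vec(sum(coincidences; dims = 2))
    n = sum(marginals)
    observed = 0.0
    expected = 0.0
    for (c, rc) in enumerate(responses), (d, rd) in enumerate(responses)
        d2 = metric(rc, rd)
        observed += coincidences[c, d] * d2
        expected += marginals[c] * marginals[d] * d2
    end
    observed /= n
    expected /= n * (n - 1)
    # Note: yields NaN if all responses are identical (expected == 0).
    return 1 - observed / expected
end

# Two raters, four units, responses drawn from 1:2.
units = [(1, 1), (2, 2), (1, 2), (1, 1)]
alpha = alpha_with_coincidences_sketch(units, nominal, 1:2)  # 8/15 ≈ 0.533
```

The matrix has one row and one column per possible response, which is why memory use grows quadratically with the size of the response space.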
Krippendorff.compute_alpha_generical — Function
compute_alpha_generical(units_iterator, squared_distance_metric)
Generic computation backend for Krippendorff's alpha, bypassing the creation of coincidence tables entirely. The tradeoff in this case is the necessity to iterate over all pairs of units. Thus, scaling is typically much worse (in O(U²*R) for (U)nits and (R)aters) than when using coincidence matrices. Nonetheless, this backend will be used by default if the number of possible responses is large compared to the size of the input, since a coincidence matrix can get huge in this case. It is also preferable when the possible responses are not all known beforehand or span a continuous spectrum of values.
Arguments
units_iterator: An iterable object which is assumed to yield units with no missings. If this is not the case, you can call prepare_iterator on it.
squared_distance_metric: An object callable on any pair of possible responses in the supplied iterator. Should satisfy f(x,y) = [0 if x==y], [>0 otherwise], but this is not checked explicitly.
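As a rough illustration of the pairwise strategy, the following plain-Julia sketch computes α = 1 − D_o/D_e without building any table, by looping over all pairs of units. It is not the package's code: the names alpha_generic_sketch and interval are hypothetical, units are assumed to support length and to contain no missing values, and the metric is assumed to return 0 for identical responses.

```julia
# Hypothetical squared-distance metric for interval data.
interval(x, y) = float(x - y)^2

function alpha_generic_sketch(units, metric)
    n = 0           # number of pairable responses
    observed = 0.0  # accumulates within-unit disagreement
    expected = 0.0  # accumulates disagreement over all pairs of responses
    for unit in units
        m = length(unit)
        m < 2 && continue  # a single response cannot be paired
        n += m
        # Within-unit pairs; pairing a response with itself adds 0
        # because the metric is assumed to satisfy f(x, x) == 0.
        within = 0.0
        for a in unit, b in unit
            within += metric(a, b)
        end
        observed += within / (m - 1)
        # All pairs of responses across all pairable units.
        for other in units
            length(other) < 2 && continue
            for a in unit, b in other
                expected += metric(a, b)
            end
        end
    end
    observed /= n
    expected /= n * (n - 1)
    return 1 - observed / expected
end

# Continuous responses: no finite response space needs to be known.
units = [(1.0, 1.1), (2.0, 2.1), (3.0, 2.9)]
alpha = alpha_generic_sketch(units, interval)
```

Only a few scalar accumulators are kept, which matches the minimal-allocation motivation, at the price of the nested loop over unit pairs.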
Krippendorff.prepare_iterator — Function
prepare_iterator(input; units = rows)
Prepare an object for iteration by one of the compute_alpha_... functions. This involves determining how to iterate over units in the object and probing for the element type. If appropriate, the units argument is used to determine the direction of iteration. This is, however, not always possible. If no sense of direction is found, the heuristic will assume the input is already a suitable iterator over units. Furthermore, if the input is found to contain missing values (or has them in its eltype), all units will be wrapped in skipmissing automatically.
Since some seemingly unstructured iterables can, somewhat surprisingly, satisfy the Tables.jl interface (Dict{Symbol,Vector} does, but not Dict{String,Vector}, for example) and this may change the implied order of iteration, you can call the helper function Krippendorff.istable to see whether your input looks like a table and, if so, how many rows and columns it appears to have.
Arguments
input: The input to be prepared. Should support generic iteration via iterate, eachrow or eachcol, or satisfy the Tables.jl interface as determined by Tables.istable.
units::Union{Symbol,AbstractString}: either rows or col(umn)s. This is used to determine how to iterate units in the input. For example, if the input iterator was a Matrix, :rows would make the function call eachrow(input) (and a little bit more).
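For a Matrix input, the behaviour described above can be approximated with Base functions alone. The sketch below is a hypothetical stand-in (prepare_units_sketch is not part of the package) and covers only the matrix case: it picks eachrow or eachcol from the units argument and wraps each unit in skipmissing when the element type admits missing.

```julia
data = [1       2 missing;
        2       2 2;
        missing 3 3]

# Hypothetical stand-in for the described behaviour, matrix inputs only.
function prepare_units_sketch(input::AbstractMatrix; units = :rows)
    # Choose the direction of iteration from the `units` argument.
    slices = Symbol(units) == :rows ? eachrow(input) : eachcol(input)
    # Wrap units in skipmissing only if missing values can occur at all.
    if Missing <: eltype(input)
        return (skipmissing(slice) for slice in slices)
    end
    return slices
end

by_rows = [collect(u) for u in prepare_units_sketch(data)]
by_cols = [collect(u) for u in prepare_units_sketch(data; units = "cols")]
```

With units = :rows the three units are [1, 2], [2, 2, 2] and [3, 3]; iterating by columns instead yields [1, 2], [2, 2, 3] and [2, 3].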
Krippendorff.istable — Function
istable(input; IO = stdout)
A thin wrapper around Tables.istable that additionally prints how many rows and columns the input appears to have when iterated through the Tables.jl interface (Tables.columns specifically). IO can be used to redirect the written output. Pass IO=devnull to suppress output (making it equivalent to calling Tables.istable).
Examples
The Tables.jl interface assumes named columns and unnamed rows, which may lead to confusion if one wants to pass a dictionary of rows, for example:
julia> testmatrix = reshape(1:15, (3,5))
3×5 reshape(::UnitRange{Int64}, 3, 5) with eltype Int64:
1 4 7 10 13
2 5 8 11 14
3 6 9 12 15
julia> Krippendorff.istable(testmatrix)
false
julia> Krippendorff.istable(Tables.table(testmatrix));
Input satisfies the Tables.jl table interface and appears to have 3 rows and 5 columns.
julia> testvectordict = Dict([k=>v for (k,v) in zip([:row1, :row2, :row3], eachrow(testmatrix))]); [println(entry) for entry in testvectordict];
:row1 => [1, 4, 7, 10, 13]
:row2 => [2, 5, 8, 11, 14]
:row3 => [3, 6, 9, 12, 15]
julia> Krippendorff.istable(testvectordict)
Input satisfies the Tables.jl table interface and appears to have 5 rows and 3 columns.
true