Quick tutorial

This example uses sample text adapted from Aristotle's History of Animals, a public domain text (http://www.gutenberg.org/files/59058/59058-0.txt), modified to include a few typos.

julia> using KeywordSearch, Random

julia> text_with_typos = Document("""
           Some animals have fet, others have noone; of the former some have
           two feet, as mankind and birdsonly; others have four, as the lizard
           and the dog; others, as the scolopendra and bee, have many feet; but
           all have their feet in pairs.
           """)
Document starting with " Some animals have…". Metadata: NamedTuple()

julia> fuzzy_query = FuzzyQuery("birds only")
FuzzyQuery{DamerauLevenshtein{Nothing},Int64}("birds only", DamerauLevenshtein{Nothing}(nothing), 2)

julia> m = match(fuzzy_query, text_with_typos)
QueryMatch with distance 2 at indices 92:101.

julia> explain(m)
The query "birds only" matched the text "…former some have two feet, as mankind and birdsonly; others have four, as the lizard and the…" with distance 2.

The fuzzy query above matches despite the typo. An exact query, by contrast, does not match, because the words "birds" and "only" have been run together in the text:

julia> exact_query = Query("birds only")
Query("birds only")

julia> match(exact_query, text_with_typos) # nothing, no exact match

KeywordSearch offers the augment function specifically to address mis-conjoined words:

julia> augmented_query = augment(exact_query)
Or
├─ Query("birds only")
└─ Query("birdsonly")

julia> m2 = match(augmented_query, text_with_typos) # now it matches
QueryMatch with distance 0 at indices 93:101.

julia> m2.query # which of the two queries in the `Or` matched?
Query("birdsonly")
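
To make the idea behind augment concrete, here is a minimal sketch in plain Julia that does not use KeywordSearch at all. The helper conjoined_variants is hypothetical and is not the package's implementation. The output above only confirms that a two-word query gains one variant with the space removed. How the sketch treats longer phrases (deleting one space at a time) is an assumption.

```julia
# Hypothetical helper, not part of KeywordSearch: build every variant of a
# phrase obtained by deleting exactly one of its spaces.
function conjoined_variants(phrase::AbstractString)
    words = String.(split(phrase, ' '))
    variants = String[]
    for i in 1:(length(words) - 1)
        # Fuse words i and i+1; keep single spaces between all other words.
        merged = vcat(words[1:(i - 1)], [words[i] * words[i + 1]], words[(i + 2):end])
        push!(variants, join(merged, ' '))
    end
    return variants
end

# The original phrase plus its conjoined variants, mirroring the `Or` above.
candidates = vcat(["birds only"], conjoined_variants("birds only"))

text = "two feet, as mankind and birdsonly; others have four"

# Keep only the candidates that occur verbatim in the text.
hits = filter(c -> occursin(c, text), candidates)
```

Only the conjoined variant occurs in the text, which matches what m2.query reports above.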

Here, augment generated an Or query, but we can also build one ourselves with the | operator:

julia> dog_or_cat = Query("dog") | Query("cat")
Or
├─ Query("dog")
└─ Query("cat")

julia> m3 = match(dog_or_cat, text_with_typos)
QueryMatch with distance 0 at indices 144:146.

julia> explain(m3)
The query "dog" exactly matched the text "…others have four, as the lizard and the dog; others, as the scolopendra and bee, have…".

By default, FuzzyQuery uses the DamerauLevenshtein() distance from StringDistances.jl and accepts a match whose distance is at most 2. You can pass a different distance or a different cutoff:

julia> fuzzy_query_2 = FuzzyQuery("brid nly", DamerauLevenshtein(), 4)
FuzzyQuery{DamerauLevenshtein{Nothing},Int64}("brid nly", DamerauLevenshtein{Nothing}(nothing), 4)

julia> m4 = match(fuzzy_query_2, text_with_typos)
QueryMatch with distance 4 at indices 93:100.

julia> explain(m4)
The query "brid nly" matched the text "…former some have two feet, as mankind and birdsonly; others have four, as the lizard and the…" with distance 4.
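
The following plain-Julia sketch illustrates how a distance cutoff can drive a fuzzy search. It is not KeywordSearch's or StringDistances.jl's code, and both helpers, osa_distance and best_window, are hypothetical. It makes two assumptions: the distance is the restricted Damerau-Levenshtein variant (optimal string alignment), and candidates are windows of the text with the same length as the query. The package may differ on either point.

```julia
# Restricted Damerau-Levenshtein (optimal string alignment) distance:
# counts insertions, deletions, substitutions, and adjacent transpositions.
function osa_distance(a::AbstractString, b::AbstractString)
    s, t = collect(a), collect(b)
    m, n = length(s), length(t)
    # d[i + 1, j + 1] is the distance between the first i chars of s
    # and the first j chars of t.
    d = zeros(Int, m + 1, n + 1)
    d[:, 1] .= 0:m
    d[1, :] .= 0:n
    for i in 1:m, j in 1:n
        cost = s[i] == t[j] ? 0 : 1
        d[i + 1, j + 1] = min(d[i, j + 1] + 1,   # deletion
                              d[i + 1, j] + 1,   # insertion
                              d[i, j] + cost)    # substitution
        if i > 1 && j > 1 && s[i] == t[j - 1] && s[i - 1] == t[j]
            # adjacent transposition
            d[i + 1, j + 1] = min(d[i + 1, j + 1], d[i - 1, j - 1] + 1)
        end
    end
    return d[m + 1, n + 1]
end

# Slide a window of the query's length over the text. Return the first window
# with the smallest distance as (distance, index range), or `nothing` if no
# window is within `cutoff`.
function best_window(query::AbstractString, text::AbstractString, cutoff::Int)
    q, chars = length(query), collect(text)
    best = nothing
    for start in 1:(length(chars) - q + 1)
        window = String(chars[start:(start + q - 1)])
        dist = osa_distance(query, window)
        if dist <= cutoff && (best === nothing || dist < best[1])
            best = (dist, start:(start + q - 1))
        end
    end
    return best
end

snippet = "as mankind and birdsonly; others"
result = best_window("birds only", snippet, 2)   # (2, 15:24)
```

On this snippet the best window is " birdsonly", including the leading space, at distance 2. Raising the cutoff admits looser matches, and lowering it to 1 would reject this one.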