Search instructions

The search operates on word-level segments. You can search by the orthographic word form (e.g. "midagi"), by the pronunciation transcribed in SAMPA (e.g. "mit_vAk_vi"), or by the word structure written as a pattern of consonants and vowels (e.g. "CVCVCV").
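The structure search appears to use the usual notation of C for a consonant and V for a vowel, so "midagi" has the structure CVCVCV. The Python sketch below derives such a string from a word's spelling. It is only an illustration under stated assumptions: the hypothetical `cv_structure` helper treats the Estonian vowel letters a, e, i, o, u, õ, ä, ö, ü as V and every other letter as C, whereas the corpus may compute structure from the pronunciation rather than the spelling.

```python
# Hypothetical helper, NOT the corpus's own routine.
# Assumption: structure is derived from spelling, vowels = Estonian vowel letters.
VOWELS = set("aeiouõäöü")

def cv_structure(word: str) -> str:
    """Map each letter of the word to V (vowel) or C (consonant)."""
    return "".join("V" if ch in VOWELS else "C" for ch in word.lower() if ch.isalpha())

print(cv_structure("midagi"))  # → CVCVCV
print(cv_structure("kana"))    # → CVCV
```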

The text in the search field (Otsisõna, "search word") is treated as a regular expression that must match the whole field, not just part of it. For example, to find all words that end in -lik, search for ".*lik"; a plain "lik" matches only segments consisting of exactly "lik". Some elements of the syntax are listed below:

  • . (period) matches any single symbol. E.g. the search ".a" matches "ma", "sa", "ta", "ka", etc.
  • ? (question mark) matches zero or one occurrence of the preceding symbol. E.g. "tal?" matches "ta" and "tal", but not "tall".
  • + (plus sign) matches one or more occurrences of the preceding symbol. E.g. "ma+" matches "ma" and "maa", but not "m".
  • * (asterisk) matches zero or more occurrences of the preceding symbol. E.g. "sam*" matches "sa", "sam" and "samm".
  • {x} matches exactly x occurrences of the preceding symbol. E.g. "ma{2}" matches "maa".
  • {x,y} matches from x to y occurrences of the preceding symbol. E.g. "ma{1,2}" matches "ma" and "maa".
  • [ ] (square brackets) matches exactly one of the symbols listed inside the brackets. E.g. "[vk]ana" matches "vana" and "kana".
  • \ (backslash) makes the following symbol match literally when that symbol would otherwise be read as syntax. This is needed when searching for compound words, whose parts are joined by "+" (e.g. "keele\+teadus"), or for the SAMPA symbol {, which stands for /ä/ (e.g. "t\{nt_vAp").
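The matching rules above can be tried out locally. The Python sketch below assumes that the corpus tool's regular expressions behave like Python's `re` module for these constructs and that a pattern has to match the whole field, which corresponds to `re.fullmatch`. The `search` helper and the word list are hypothetical; the real search runs on the corpus server.

```python
import re

# Minimal sketch, ASSUMING the tool's regex dialect behaves like Python's re
# module and that a pattern must match the whole field (re.fullmatch).

def search(pattern, items):
    """Return the items whose entire text matches the pattern."""
    compiled = re.compile(pattern)
    return [w for w in items if compiled.fullmatch(w)]

# Made-up word list, for illustration only.
words = ["inimlik", "lik", "likvideerima", "ametlikud", "midagi"]
print(search("lik", words))    # → ['lik']
print(search(".*lik", words))  # → ['inimlik', 'lik']

# (pattern, should match, should not match), built from the examples in the list.
EXAMPLES = [
    (".a", ["ma", "sa", "ta", "ka"], ["a", "maa"]),
    ("tal?", ["ta", "tal"], ["tall"]),
    ("ma+", ["ma", "maa"], ["m"]),
    ("sam*", ["sa", "sam", "samm"], ["sama"]),
    ("ma{2}", ["maa"], ["ma", "maaa"]),
    ("ma{1,2}", ["ma", "maa"], ["m", "maaa"]),
    ("[vk]ana", ["vana", "kana"], ["sana"]),
    (r"keele\+teadus", ["keele+teadus"], ["keeleteadus"]),
    (r"t\{nt_vAp", ["t{nt_vAp"], ["tnt_vAp"]),
]

# Every listed example behaves as described (raises AssertionError otherwise).
for pattern, hits, misses in EXAMPLES:
    assert search(pattern, hits) == hits, pattern
    assert search(pattern, misses) == [], pattern

# Without the backslash, "+" is a quantifier ("one or more e"),
# so the pattern no longer matches the compound marker itself.
print(search("keele+teadus", ["keele+teadus", "keeleteadus"]))  # → ['keeleteadus']
```

With `re.search` instead of `re.fullmatch`, the pattern "lik" would also accept "inimlik", "likvideerima" and "ametlikud", which is why the whole-field rule matters when writing patterns.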

Search results are returned as 2-second excerpts from the corpus. Each excerpt is displayed in a table that mimics a Praat TextGrid: each segmentation tier is shown on its own row, and the width of each segment corresponds to its duration. If the content of a segment does not fit in its cell, hover the mouse over that interval to see it. The search results can also be downloaded as WAV and TextGrid files.

Last edited: 2010-11-23 13:32:35