

Boolean logic defines logical relationships between terms in a search. The Boolean search operators are AND, OR and NOT. You can use these operators to create a very broad or very narrow search.

AND combines search terms so that each search result contains all of the terms. This won’t guarantee that the words will appear next to each other, only that both words will be present in results, eg ‘black T-shirts and purple shoes for sale’.

Use OR to request an alternative, for example ‘black OR white shoes’. Most search engines would interpret this as ‘black OR white AND shoes’, because the gap before ‘shoes’ is treated as an implicit AND.

NOT tells a search engine what to ignore. The query ‘shoes NOT brown’ will return results that contain the word ‘shoes’ but not the word ‘brown’. Some search engines use a minus sign in front of the word instead of NOT, eg -brown.

Quotation marks, although not strictly a Boolean operator, can be used to get more accurate results. For example, use “black shoes” to get results where the words ‘black’ and ‘shoes’ appear together, eg ‘black shoes for sale’.
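The operators above can be sketched with ordinary Python sets. This is a minimal illustration, not how any particular search engine is implemented: the toy corpus `DOCS`, the helper names `build_index` and `phrase_match`, and the whitespace tokenisation are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical toy corpus; ids and texts are illustrative only.
DOCS = {
    1: "black shoes for sale",
    2: "black t-shirts and purple shoes for sale",
    3: "white shoes for sale",
    4: "brown shoes for sale",
}


def build_index(docs):
    """Map each lowercase word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def phrase_match(docs, phrase):
    """Return ids of documents where the words of `phrase` appear consecutively."""
    words = phrase.lower().split()
    hits = set()
    for doc_id, text in docs.items():
        tokens = text.lower().split()
        for i in range(len(tokens) - len(words) + 1):
            if tokens[i:i + len(words)] == words:
                hits.add(doc_id)
                break
    return hits


INDEX = build_index(DOCS)

black_and_shoes = INDEX["black"] & INDEX["shoes"]   # AND: both words present
black_or_white = INDEX["black"] | INDEX["white"]    # OR: either word present
shoes_not_brown = INDEX["shoes"] - INDEX["brown"]   # NOT: exclude a word
quoted = phrase_match(DOCS, "black shoes")          # "black shoes": adjacent words
```

Document 2 matches ‘black AND shoes’ even though the two words are not adjacent, while the quoted phrase matches only document 1.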

The most common Boolean operators are AND, OR and NOT (always in capitals). They can be used to get more accurate search results. Without an operator, a search for the words 'black' and 'shoes' may return results that contain 'black' and/or 'shoes', so you may get results that contain only one of the two words, eg ‘purple shoes for sale’ or ‘black T-shirts for sale’. In general, though, search engines treat the query ‘black shoes’ as ‘black AND shoes’, which means results must contain both words, eg ‘black shiny shoes for sale’. Sometimes you have to add AND to get results that contain both words.

Boolean search is basically set terminology: each term is associated with a set that contains all the documents that have this term in them. You then simply do set operations on a bunch of sets. AND is intersection, OR is union.

In the vector-space model you do not operate on sets; you compare the similarity of vectors, which is an entirely different model. In addition, this method has an important advantage: it returns a score associated with each document, and not only a Boolean answer of "relevant" or "not relevant". As-is, vector-space does not allow AND and OR operations, but this is easily solved by doing a two-phase search: the first phase uses a Boolean model to get candidates, and the second uses vector-space to get a score for each candidate.

Other models build a language model out of a document. A language model is described as P(word|M), the probability of the model M generating the word. A common language model is

P(word|document) = #occurrences(word,document)/|document|

To avoid a probability of zero, we usually add a smoothing technique. A common one is

P(word|document) = alpha*#occurrences(word,document)/|document| + (1-alpha)*#occurrences(word,corpus)/|corpus|

Now, when we have a query of more than one term, q = t1 t2, the usual unigram assumption scores the document by the product of the per-term probabilities: P(q|document) = P(t1|document)*P(t2|document). Note that this model actually allows AND semantics by setting alpha=1, since a document missing any query term then gets probability zero, and OR semantics by alpha!=1, since such a document still gets a non-zero score.
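The smoothed language model can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the toy corpus `DOCS`, the function names `p_word` and `p_query`, and the use of a plain product over query terms (the unigram assumption) are choices made for the example, not taken from any particular library.

```python
from collections import Counter
from math import prod

# Hypothetical toy corpus, already tokenised; contents are illustrative only.
DOCS = {
    "d1": "black shoes for sale".split(),
    "d2": "purple shoes for sale".split(),
    "d3": "black t-shirts for sale".split(),
}
CORPUS = [token for tokens in DOCS.values() for token in tokens]
CORPUS_COUNTS = Counter(CORPUS)


def p_word(word, doc_tokens, alpha):
    """Smoothed P(word|document): mix document and corpus frequencies."""
    p_doc = doc_tokens.count(word) / len(doc_tokens)
    p_corpus = CORPUS_COUNTS[word] / len(CORPUS)
    return alpha * p_doc + (1 - alpha) * p_corpus


def p_query(query_tokens, doc_tokens, alpha):
    """Score a multi-term query as the product of per-term probabilities."""
    return prod(p_word(term, doc_tokens, alpha) for term in query_tokens)


QUERY = ["black", "shoes"]

# alpha = 1: no smoothing, so a missing term zeroes the score (AND-like).
strict = {d: p_query(QUERY, tokens, 1.0) for d, tokens in DOCS.items()}

# alpha = 0.5: smoothing keeps partial matches above zero (OR-like).
smooth = {d: p_query(QUERY, tokens, 0.5) for d, tokens in DOCS.items()}
```

With `alpha=1.0` only `d1`, which contains both query terms, keeps a non-zero score. With `alpha=0.5` every document gets a positive score, and `d1` still ranks highest.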

Non-Boolean search includes approaches that are not purely Boolean model techniques. Probably the most common example of such a method is the vector-space model. In this model, each document is a vector, represented by the words (or bi-grams) it contains. The dimension of each document vector is the number of terms in the vocabulary. Similarity in this model is computed by creating a 'fake' document, which is the query, and comparing this fake document to every other document in the corpus. The more similar a document is to the query, the better the result. A common similarity measure is cosine similarity. This model goes well with the tf-idf model (tf-idf determines the value in each entry of each vector).

You can combine your search terms using the Boolean operators AND, OR and NOT. You can use a maximum of nine operators per input field, and a total of nineteen.
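The vector-space ranking described above, with tf-idf weights and cosine similarity, can be sketched as follows. This is a minimal illustration under stated assumptions: the toy corpus `DOCS` and the helper names are hypothetical, vectors are stored as sparse dictionaries, and the smoothed idf formula used here is only one of several common tf-idf variants.

```python
import math
from collections import Counter

# Hypothetical toy corpus; ids and texts are illustrative only.
DOCS = {
    "d1": "black shoes for sale",
    "d2": "purple shoes for sale",
    "d3": "black t-shirts for sale",
    "d4": "garden chairs",
}
TOKENS = {doc_id: text.lower().split() for doc_id, text in DOCS.items()}

# Number of documents each term appears in.
DOC_FREQ = Counter(term for tokens in TOKENS.values() for term in set(tokens))


def tfidf_vector(tokens):
    """Sparse tf-idf vector (term -> weight), using a smoothed idf variant."""
    n_docs = len(TOKENS)
    vector = {}
    for term, count in Counter(tokens).items():
        tf = count / len(tokens)
        idf = math.log((1 + n_docs) / (1 + DOC_FREQ[term])) + 1
        vector[term] = tf * idf
    return vector


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query):
    """Treat the query as a 'fake' document and score every real document."""
    query_vector = tfidf_vector(query.lower().split())
    scores = {
        doc_id: cosine(query_vector, tfidf_vector(tokens))
        for doc_id, tokens in TOKENS.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


RANKED = rank("black shoes")
```

Unlike a Boolean query, every document receives a score: `d1` (both words) ranks first, `d2` and `d3` (one word each) score lower, and `d4` (no shared words) scores zero.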
