Natural Language Searches

In addition to Boolean searching, some electronic research services allow you to simply enter research questions in “plain English.” The literature characterizes this alternative search technique in various ways including “natural language,” “associative language,” “probabilistic,” “relevance,” and “statistical” searching. Some tax research services have special names for their systems of natural language searching. For instance, LEXIS calls its natural language algorithm Freestyle, while Westlaw dubs its program WIN (Westlaw Is Natural).

Researchers without “Boolean prowess” may formulate ineffective or inefficient Boolean search requests and be dissatisfied at times with their results. A godsend to those never fully embracing Boolean logic, natural language searching sometimes produces surprisingly good results. Thus, the natural language approach can increase the effectiveness and efficiency of researchers who have not mastered Boolean logic.

But researchers long accustomed to “Boolean speak” can benefit from this alternative approach too. Well-constructed Boolean searches occasionally miss relevant documents or pull too many irrelevant documents. Though natural language searches often do not outperform well-constructed Boolean searches, they do in some cases. So, even Boolean diehards find that natural language searching sometimes increases effectiveness and efficiency.


Understanding the Natural Approach

To know when natural language searching might be useful or appropriate, you must understand the way the approach works. First, researchers enter a research question or a string of keywords, ignoring syntax and Boolean connectors. They focus instead on expressing research issues completely (i.e., including all the keywords and concepts). In fact, they need not express the research issue as a question at all; entering a string of keywords that would appear in a well-written research issue works just as well. Second, the system asks researchers to select synonyms and, if they wish, to designate some keywords as mandatory. (In effect, designating a keyword as mandatory instructs the system to precede it with AND rather than OR.) Researchers also can restrict searches, for instance, by date or court.
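The effect of designating keywords as mandatory can be illustrated with a minimal Python sketch. It assumes that each document is reduced to a set of terms and that ordinary keywords are joined with OR while mandatory keywords are joined with AND. The function name, the document labels, and the sample terms are hypothetical; no research service is known to work exactly this way.

```python
def is_candidate(doc_terms, keywords, mandatory=()):
    """Return True if a document qualifies for retrieval.

    Ordinary keywords behave like OR (any one is enough);
    mandatory keywords behave like AND (every one must appear).
    """
    terms = set(doc_terms)
    if not all(word in terms for word in mandatory):
        return False
    return any(word in terms for word in set(keywords) | set(mandatory))


# Hypothetical mini-database: document label -> terms found in the document.
docs = {
    "A": {"entertainment", "deduction"},
    "B": {"accountable plan", "reimbursement"},
    "C": {"entertainment", "accountable plan"},
    "D": {"depreciation"},
}
keywords = ["entertainment", "accountable plan"]

# No mandatory terms: any document with at least one keyword qualifies.
no_mandatory = sorted(name for name, terms in docs.items()
                      if is_candidate(terms, keywords))

# Both terms mandatory: only documents containing both qualify.
both_mandatory = sorted(name for name, terms in docs.items()
                        if is_candidate(terms, keywords, mandatory=keywords))
```

In this sketch, the unrestricted request admits documents A, B, and C, while marking both keywords as mandatory narrows the candidates to document C alone.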

Strategy 1: As appropriate, identify synonyms and specify mandatory keywords when conducting natural searches.


Next, the underlying mathematical program takes over. Electronic research services closely guard the details of their proprietary algorithms, and each differs in some respects (e.g., Freestyle differs from WIN). However, natural language algorithms often follow procedures similar to the ones listed here:

  • Identify and discard unimportant words and words appearing too frequently to be helpful
  • Truncate keywords
  • Rank documents based on the frequency of their keywords relative to the frequency of keywords in the entire database and similar mathematical ratios (e.g., documents containing many of the keywords when the keywords do not appear frequently in the entire database receive a heavy weighting or relevance score from the mathematical algorithm)
  • Display top-ranked documents where the default ranking depends on statistical ratios such as the one mentioned immediately above

Strategy 2: To increase the mathematical weighting or relevance score of a keyword, consider typing it twice. (This strategy works in LEXIS’s Freestyle but may not work with all natural search engines.)
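Because the services guard their algorithms, the following Python sketch is only an assumption-laden illustration of the general steps listed above, not Freestyle or WIN. It discards a small hypothetical list of common words, truncates each remaining word to six characters, and scores documents with a simple frequency ratio in which scarcer keywords count more. It also assumes that a keyword typed twice is counted twice, which is one way a repeated keyword could receive more weight. All names and sample documents are invented for illustration.

```python
import math
from collections import Counter

# Hypothetical list of words too common to be helpful.
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "is", "for", "to"}


def stem(word, length=6):
    """Crude truncation: keep only the first `length` characters."""
    return word[:length]


def tokenize(text):
    """Lowercase the text, drop stop words, and truncate what remains."""
    words = (w.strip(".,;:?!()\"'") for w in text.lower().split())
    return [stem(w) for w in words if w and w not in STOP_WORDS]


def rank(query, documents):
    """Return document names ordered from highest to lowest score."""
    doc_tokens = {name: Counter(tokenize(text)) for name, text in documents.items()}
    n_docs = len(documents)
    query_counts = Counter(tokenize(query))  # a keyword typed twice counts twice
    scores = {}
    for name, counts in doc_tokens.items():
        score = 0.0
        for term, times_typed in query_counts.items():
            docs_with_term = sum(1 for c in doc_tokens.values() if term in c)
            if docs_with_term == 0:
                continue  # keyword appears nowhere in the database
            # Keywords found in fewer documents receive a larger weight.
            rarity = math.log(1 + n_docs / docs_with_term)
            score += times_typed * counts[term] * rarity
        scores[name] = score
    return sorted(scores, key=lambda name: scores[name], reverse=True)


documents = {
    "case_a": "Entertainment expenses of the taxpayer.",
    "case_b": "Reimbursement under an accountable plan.",
    "case_c": "The retirement plan is qualified.",
    "case_d": "Depreciation of business property.",
}

single = rank("entertainment plan", documents)
doubled = rank("entertainment plan plan", documents)
```

With these sample documents, "entertainment" is the scarcer keyword, so case_a ranks first for the plain request. Typing "plan" twice doubles its contribution and moves the two documents containing "plan" ahead of case_a.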


Deciding to Go Natural

How effective and efficient is natural language versus Boolean searching? The jury is still out on this question. To a large extent, the verdict depends on the researcher’s familiarity with Boolean syntax and strategies (see the “Boolean Searches” lesson). Boolean neophytes often perform better “going natural.” But researchers, whether Boolean novices or gurus, can more appropriately evaluate natural search results if they recognize and appreciate the approach’s limitations and strengths.

Placing too much reliance on natural language techniques can provide misleading search results. Since the mathematical algorithm (rather than the researcher) makes many choices, less precise results occur in some cases. Consider the following:

  • Natural language algorithms determine relevancy based on ratios such as the number of keywords in a document versus the number of keywords in the entire database. This approach ignores criteria tax professionals use to determine relevancy such as the age of a decision or the court rendering the opinion. Thus, “relevant” documents based on a natural language search may not be the most relevant in resolving the tax research issue.
  • Algorithms replace much of the richness of Boolean searches with mathematical rules that may not be appropriate for every research issue. For instance, Boolean searchers can use proximity and prohibitory connectors to perform very precise searches. Natural language searches focus more on keywords and how frequently they appear in the database rather than the relationships among keywords.
  • Some individuals may conclude that “relevant” documents retrieved through natural language searches necessarily represent good tax law. However, as with any other search technique, researchers must compare retrieved cases and rulings with the related statutory and regulatory law to assure consistency and, if consistent, “shepardize” the cases and rulings to further evaluate their legal significance in light of later decisions.
  • Natural language searches often retrieve documents not including all specified keywords. For instance, a recent Freestyle search in LEXIS’s CASES file using keywords “entertainment” and “accountable plan” retrieved 50 documents—27 containing “entertainment” and 28 containing “accountable plan.” Only five documents (ranked 1, 2, 3, 13, and 20) contained both keywords. Thus, some documents containing only one keyword ranked higher than other documents containing both keywords. Of course, designating both “entertainment” and “accountable plan” as mandatory terms is one remedy. But routinely specifying all terms as mandatory, particularly in search requests containing three or more keywords, can cause the algorithm to overlook some relevant documents.
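The counts in the Freestyle example above can be checked with a few lines of arithmetic. The Python sketch below uses only the figures reported in that example; the variable names are illustrative.

```python
# Figures reported for the Freestyle search described above.
total_retrieved = 50
with_entertainment = 27
with_accountable_plan = 28
both_ranks = [1, 2, 3, 13, 20]  # ranks of documents containing both keywords
with_both = len(both_ranks)

# Documents containing exactly one of the two keywords.
only_entertainment = with_entertainment - with_both
only_accountable_plan = with_accountable_plan - with_both

# Inclusion-exclusion: documents containing at least one keyword.
at_least_one = with_entertainment + with_accountable_plan - with_both

# Share of retrieved documents that contain both keywords.
share_with_both = with_both / total_retrieved

# Ranks above 13 occupied by documents with only one keyword.
single_keyword_ranks = [r for r in range(1, 13) if r not in both_ranks]
```

The arithmetic gives 22 documents with only “entertainment” and 23 with only “accountable plan.” Together with the five containing both, these account for all 50 retrieved documents, so every retrieved document contains at least one keyword but only 10 percent contain both. The nine documents ranked 4 through 12 each contain a single keyword yet outrank the two-keyword document ranked 13.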

Notwithstanding these objections, natural language searches sometimes return better results than Boolean requests. “Going natural” tends to work reasonably well (perhaps even better than Boolean logic) in the following situations:

  • The researcher only wishes to retrieve a few “relevant” documents (based on the statistical ranking); retrieving the most relevant is not so important. However, missing the most relevant rulings or cases when conducting tax research can lead to inappropriate conclusions and sub-optimal recommendations.
  • The research project is complex and involves several dimensions or interrelated transactions, all of which might be difficult to capture in a Boolean search (e.g., corporate restructurings requiring multiple related steps).
  • Some keywords are more important than others, and the more important words tend to be scarcer. In this case, natural language algorithms weight the more important keywords more heavily.
  • The researcher finds the research issue difficult to specify, perhaps because specific keywords do not come to mind.

Strategy 3: As the “Basic Search Approaches” lesson indicates, difficulty identifying specific keywords often suggests that the researcher use a topical index or table of contents approach to locating tax authority. A third option is to conduct a natural language search.

Following up a Boolean search with a natural language search (or vice versa) often retrieves relevant documents the initial search did not. The reason is simple. Natural language searches emphasize the relative importance of keywords, while Boolean searches place more emphasis on relationships between keywords. Especially when you are unfamiliar with the database, conducting a search using both Boolean and natural approaches can yield better results than relying only on one technique. A similar approach involves locating a highly relevant document through a Boolean search and then requesting other similar documents. (In LEXIS, this latter function is called “More Like This” or simply “More.”) In effect, requesting similar documents begins a natural language search based on keywords appearing in the Boolean-retrieved model document.
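Both ideas in the preceding paragraph can be sketched in Python under simplifying assumptions. The first function merges the results of two searches and drops duplicates. The second builds a keyword request from the most frequent words in a model document, which is one plausible reading of a “similar documents” request and not a description of how LEXIS implements “More Like This.” The function names, stop-word list, and sample text are hypothetical.

```python
from collections import Counter

# Hypothetical list of words too common to be helpful.
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "is", "was", "for", "to"}


def merge_results(boolean_hits, natural_hits):
    """Combine two result lists, keeping first-seen order without duplicates."""
    return list(dict.fromkeys([*boolean_hits, *natural_hits]))


def similar_query(model_text, top_n=2):
    """Build a keyword request from the most frequent words in a model document."""
    words = (w.strip(".,;:?!()\"'") for w in model_text.lower().split())
    counts = Counter(w for w in words if w and w not in STOP_WORDS)
    return [word for word, _ in counts.most_common(top_n)]


combined = merge_results(["doc_1", "doc_2"], ["doc_2", "doc_3"])

model_document = (
    "The accountable plan reimbursed meal expenses. "
    "The plan required substantiation of expenses. "
    "The plan was written."
)
follow_up_keywords = similar_query(model_document)
```

Here the merged list contains doc_1, doc_2, and doc_3, and the follow-up request consists of “plan” and “expenses,” the two most frequent remaining words in the model document. These keywords could then be submitted as a natural language search.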

Strategy 4: Conduct searches using both Boolean and natural language approaches. Also, after identifying a very relevant document through a Boolean search, request the natural language algorithm to retrieve similar documents.

Individuals can begin their electronic research within traditional tax services (e.g., CCH’s Standard Federal Tax Reporter and RIA’s Federal Tax Coordinator 2d) and then follow hyperlinks to primary sources of the tax law such as rulings and cases. Some electronic research services also allow individuals to follow a reverse process. That is, they can begin their Boolean or natural language search in primary source materials. After locating a particularly relevant ruling or case, they follow a hyperlink back to one of the traditional tax services to discover similar and often relevant information.

Strategy 5: To find information similar to a highly relevant document already located, follow the hyperlink back to a traditional tax service.