The art and science of web-based literature search: the MEDLINE
DR Sahu, RR Ramakantan, SB Bavdekar
Seth G. S. Medical College and K. E. M. Hospital, Parel, Mumbai - 400 012, India.
The utility of a thorough literature search is not limited to a student preparing for a seminar or dissertation; it is equally necessary for those engaged in patient care. It has been shown that use of on-line literature searches early in the course of hospitalisation significantly reduces the duration of hospital stay and that traditional paper-based references play only a limited role in clinical problem solving. However, with limited time and money and an ever-expanding volume of literature, one would like to perform a search that is not only comprehensive but is also completed in the shortest possible time. If we do not search properly, we could well fall victim to the most fundamental computer adage: garbage in, garbage out. This happens because computer-based literature databases make it possible for us to just punch in a few words and get some relevant results. We get the satisfaction of obtaining an instantaneous result. However, it is worth pondering whether this is the best result obtainable.
In this series, we will try to provide information about the medical databases and learn to conduct a comprehensive search in an efficient manner. We begin the series with the most widely used database, the MEDLINE. We will discuss the search features at Entrez PubMed (http://www.ncbi.nlm.nih.gov/entrez/query) based on PubMed Help. A simple tutorial is also available at http://www.library.health.ufl.edu/pubmed/pubmed2/. However, only search-related issues will be discussed; topics related to displaying, copying and printing will not be dealt with.
The National Library of Medicine (NLM), USA and its predecessor, the Surgeon-General’s office of the U.S. Army, have been indexing medical literature since 1879, when the predecessor to today’s Index Medicus was first produced. MEDLINE (Medical Literature Analysis and Retrieval System Online) is the electronic version of the Index Medicus. It consists of over 10 million articles published in 4300 journals in 40 different languages from 75 countries. Approximately 8000 completed references are added to this database every Saturday from January through October, amounting to an annual increment of over 400,000 articles.
To sift through this massive load for what one needs at a particular moment could be worse than looking for a needle in a haystack. We will, therefore, briefly review the basics of indexing and retrieval systems.
In a text word search, the query words are matched word for word with those in the database. Such a search has some limitations. The words must match exactly for the article to be retrieved. Thus, a difference in spelling may lead to exclusion of an article. Secondly, as MEDLINE includes only the title and abstract of an article, incomplete or inadequate abstracts may preclude retrieval of the article. However, the most fundamental limitation is that the “text word” search does not pay any attention to the idea or concept behind an article. For example, while searching for drugs used in parkinsonism, use of the key words ‘drug’ and ‘parkinsonism’ will retrieve articles related to drugs for treatment of parkinsonism as well as articles on drug-induced parkinsonism. In addition, this kind of search may lead to inappropriate results because the searcher and the author may differ in the way they represent or understand a concept. For example, one can represent coagulopathy as bleeding disorder, bleeding diathesis or by individual factor deficiencies. “Concept-based” search overcomes a few of these limitations.
NLM has designed a “concept-based” search methodology using Medical Subject Heading (MeSH). Over 19,000 standardised medical terms constitute the thesaurus of MeSH. The arrangement of the MeSH is in the form of a tree where subject headings are arranged under one another with increasing specificity [Table - 1]. It also contains a group of 82 subheadings [Table - 2]. Trained indexers scan published articles, interpret the findings, identify the thrust or themes of these articles, and assign 10-12 MeSH terms and subheadings to each article. In an example shown in [Table - 3] ‘MH’ denotes the allotted MeSH terms for the given article and the terms after the forward slash (/) are the subheadings. Etiology and complications (marked using asterisk*) are the MeSH major topics for the given article. Major topics relate to the main concept conveyed by the article, as judged by the indexing experts.
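The heading/subheading notation described above can be illustrated with a small sketch. It assumes an MH value written as `Heading/subheading/subheading`, with an asterisk marking a major topic. The sample value is invented for illustration (the record in Table 3 is not reproduced here), and the names `MeshEntry` and `parse_mh` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class MeshEntry:
    heading: str
    heading_is_major: bool = False
    # (subheading, is_major) pairs in the order they appear
    subheadings: list = field(default_factory=list)


def parse_mh(value: str) -> MeshEntry:
    """Split 'Heading/sub1/*sub2' into parts; a leading '*' marks a major topic."""
    head, *subs = [part.strip() for part in value.split("/")]
    entry = MeshEntry(heading=head.lstrip("*"), heading_is_major=head.startswith("*"))
    for sub in subs:
        entry.subheadings.append((sub.lstrip("*"), sub.startswith("*")))
    return entry


# Invented MH value, not copied from a real citation.
example = parse_mh("Duodenal Diseases/*etiology/*complications")
major = [name for name, is_major in example.subheadings if is_major]
print(example.heading, major)
```

Here the asterisked subheadings are collected as the major topics of the hypothetical record.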
The most basic form of search would be to type a word or two in the query box without any specification. PubMed handles the query word(s) in the following sequential manner (called automatic term mapping):
a. A query term without any specification (e.g. trichobezoar) is first mapped to the ‘MeSH Translation Table’, which contains MeSH Terms, and other systems to check for equivalent synonyms or lexical variants in English. If a match is found in this translation table, the term will be searched as MeSH and as a text word. For example, trichobezoar will be translated to bezoars and the resulting search would be ("bezoars"[MeSH Terms] OR trichobezoar[Text Word]). Similarly, ischemia is also searched as ischaemia and ANA as antinuclear antibody.
b. The query term is then mapped to the ‘Journals Translation Table’, which contains the full title of the journals, the MEDLINE abbreviation, and the ISSN number. It tries to map a search term to the journal abbreviation. For example, journal of postgraduate medicine will be translated to: "J Postgrad Med"[Journal Name].
c. The ‘Phrase List’ is consulted next, whereby PubMed translates separate query words into a single phrase. For example, giant cell tumour is matched to the phrase “giant cell tumour.”
d. If the query word is not found in these three mappings, and is a word with one or two letters after it, PubMed then checks the Author Index. For example roy ak will be searched in the list of authors.
The results generated thus from this automatic mapping are displayed in a chronological order of entry of the citations in the MEDLINE.
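The sequence of look-ups in steps a to d can be sketched as a chain of table checks. This is only a toy model: the three tables hold just the examples quoted above, the `[Author]` label and the final text-word fallback are assumptions, and `map_term` is a hypothetical name rather than anything PubMed exposes.

```python
import re

# Toy stand-ins for PubMed's look-up tables, filled with the examples from the text.
MESH_TRANSLATION = {"trichobezoar": "bezoars"}
JOURNAL_TRANSLATION = {"journal of postgraduate medicine": "J Postgrad Med"}
PHRASE_LIST = {"giant cell tumour"}
# A word followed by one or two letters is treated as a possible author name.
AUTHOR_PATTERN = re.compile(r"^[a-z'-]+ [a-z]{1,2}$")


def map_term(query: str) -> str:
    """Apply the four checks in order and return the translated query."""
    term = query.strip().lower()
    if term in MESH_TRANSLATION:  # step a
        return f'("{MESH_TRANSLATION[term]}"[MeSH Terms] OR {term}[Text Word])'
    if term in JOURNAL_TRANSLATION:  # step b
        return f'"{JOURNAL_TRANSLATION[term]}"[Journal Name]'
    if term in PHRASE_LIST:  # step c
        return f'"{term}"'
    if AUTHOR_PATTERN.match(term):  # step d
        return f"{term}[Author]"  # assumed label for an author search
    return f"{term}[Text Word]"  # assumed fallback when nothing matches


for q in ("trichobezoar", "Journal of Postgraduate Medicine", "giant cell tumour", "roy ak"):
    print(q, "->", map_term(q))
```

The order of the checks matters: a term found in an earlier table never reaches the later ones, which is why a phrase such as hospital infection control can end up mapped to a journal title.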
One may be tempted to question the need for an advanced search if the basic search itself involves automatic mapping and also looks for variations in language. The answer is simple: to enhance the sensitivity and specificity of the search. For example, the text words HIV therapy would generate more than 30,000 articles. The text words hospital infection control will get mapped onto the journal title Hospital Infection Control. Hence, we need to learn more than just typing text words as keywords.
We will now review a few of the advanced search facilities available at Entrez PubMed.
Each search field tag is abbreviated in two characters and can be typed in brackets, e.g. [au] for author’s name or [ti] for title words. Use of these tags directs the search in the specified field alone e.g. roy ak [au] will retrieve articles authored by roy ak; hernia [ti] will get the articles containing hernia in the article titles alone. A list of some common tags is provided in [Table - 4].
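As a minimal sketch, a tagged query can be assembled from a term and a field name. The tags below are those mentioned in this article; the descriptive field names and the helper `tagged` are hypothetical conveniences, not part of PubMed.

```python
# Tags quoted in this article; Table 4 lists more.
FIELD_TAGS = {
    "author": "au",
    "title": "ti",
    "mesh": "mh",
    "publication_type": "pt",
    "journal": "ta",
}


def tagged(term: str, field: str) -> str:
    """Return 'term [tag]' for a known field, or raise for an unknown one."""
    if field not in FIELD_TAGS:
        raise ValueError(f"unknown search field: {field!r}")
    return f"{term.strip()} [{FIELD_TAGS[field]}]"


print(tagged("roy ak", "author"))  # roy ak [au]
print(tagged("hernia", "title"))   # hernia [ti]
```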
To search a term as a MeSH term, one may use the search field tag [mh]. For example, if a search is requested for therapeutics with the field tag [mh], the MeSH search, being concept-based, would not retrieve articles that contain the word therapeutics in the title or abstract but do not have therapeutics as the main theme. In this manner, the advanced search focuses on articles based on an understanding of the main concept underlying them. This is almost akin to a search made on the basis of thinking or cognitive ability.
The search can be further fine-tuned by using major topics [majr] and subheadings [sh]. While searching for articles related to the aetiology of duodenal diseases, use of aetiology as one of the key words with the tag [majr] would retrieve the article cited in [Table - 3] and other articles focusing specifically on the aetiology of duodenal diseases. Subheadings can be used with MeSH terms to describe a particular aspect of the subject in a more comprehensive manner. The tag [sh] is used for such a search, e.g. hernia [mh] AND surgery [sh]. The two-character abbreviation of a subheading can also be used, e.g. su for surgery [Table - 2].
Subheadings can be directly attached to the MeSH term in the following format: MeSH Term/Subheading, e.g. hernia/surgery or hernia/su, which would give even better results than hernia [mh] AND surgery [sh]. Only one appropriate subheading can be attached to a MeSH term. Of course, not all MeSH term/subheading combinations are valid (e.g. hernia/toxicity).
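The MeSH Term/Subheading format can be sketched as a small helper. The abbreviation table holds only pairs quoted in this article (Table 2 has the full list), the invalid pair is the one example given above, and `mesh_with_subheading` is a hypothetical name.

```python
# Subheading abbreviations quoted in this article; Table 2 has the full list.
SUBHEADING_ABBREVIATIONS = {"surgery": "su", "drug therapy": "dt", "therapeutic use": "tu"}
# Only the invalid combination cited in the text is listed here.
INVALID_PAIRS = {("hernia", "toxicity")}


def mesh_with_subheading(heading: str, subheading: str, abbreviate: bool = False) -> str:
    """Attach exactly one subheading to a MeSH term as 'heading/subheading'."""
    key = (heading.lower(), subheading.lower())
    if key in INVALID_PAIRS:
        raise ValueError(f"{heading}/{subheading} is not a valid combination")
    if abbreviate:
        subheading = SUBHEADING_ABBREVIATIONS.get(subheading.lower(), subheading)
    return f"{heading}/{subheading}"


print(mesh_with_subheading("hernia", "surgery"))                   # hernia/surgery
print(mesh_with_subheading("hernia", "surgery", abbreviate=True))  # hernia/su
```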
We have seen the importance of MeSH terms and subheadings. The question is how one knows whether a particular word or term is indeed a MeSH term. The MeSH Browser (http://www.ncbi.nlm.nih.gov:80/entrez/meshbrowser.cgi) can be used for this, or one can download the entire list for personal use from http://www.nlm.nih.gov/mesh/filelist.html. Another way to find the MeSH terms is to check them in the MEDLINE citation of a known article.
‘AND’, ‘OR’, and ‘NOT’ are the Boolean operators, which can be used to get more precision in a search. Please note that the Boolean operators should be typed in capital letters (upper case). More than one Boolean operator can be used at a time. For example, to find articles that deal with the effects of cough or constipation on inguinal hernia, one can use the syntax: (cough OR constipation) AND inguinal hernia. If a search is required for review articles that discuss the treatment of hernia in all patients except children, the syntax (hernia/therapy [mh] AND review [pt]) NOT child [mh] will give the appropriate results.
Please note that use of NOT can lead to unsatisfactory results. For example, if one is interested in non-surgical treatment options for hernia and uses hernia/therapy NOT surgery as the search, one will miss articles that discuss both surgical and non-surgical treatment.
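The behaviour of the three operators, including the pitfall with NOT, can be mimicked with ordinary set operations. The article numbers below are made up for illustration and do not correspond to real citations.

```python
# Toy result sets keyed by made-up article numbers.
cough = {1, 2}
constipation = {2, 3}
inguinal_hernia = {1, 2, 3, 4}

# (cough OR constipation) AND inguinal hernia
effects_on_hernia = (cough | constipation) & inguinal_hernia

# Article 10 covers non-surgical treatment only, 11 covers both
# surgical and non-surgical treatment, 12 covers surgical treatment only.
hernia_therapy = {10, 11, 12}
surgery = {11, 12, 13}

# hernia/therapy NOT surgery
non_surgical = hernia_therapy - surgery

print(sorted(effects_on_hernia))  # [1, 2, 3]
print(sorted(non_surgical))       # [10]
```

Article 11 is lost from the second result even though it discusses non-surgical treatment, because NOT removes every citation that mentions surgery at all.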
PubMed search has a feature called “Limits.” By using this feature, one can limit the search to a specific age group, gender, human or animal studies, a specific language, type of article (review, studies, etc.), or publication or Entrez date. One can even limit the search to a specific subset of citations within PubMed, e.g. AIDS-related citations or in-process citations, i.e. Pre-MEDLINE citations. These limits can be set from the Features Bar and can also be used in the form of tags as shown in [Table - 5]. It should be borne in mind that the use of “limits” for publication type, age, gender, or human or animal studies will restrict the retrieval to MEDLINE citations. The Pre-MEDLINE citations get excluded, depriving the searcher of some of the most recently entered citations, which are still undergoing the indexing process.
With the use of asterisk at end of a term one can search all the words (up to 150) beginning with that term. For example, search query with polyp* will include additional terms such as polyps, polyposis, polypus, polypectomy, etc. However, this kind of search will not allow the searcher to use the automatic term mapping. In addition, if there is a phrase involving term with a space after the specified term (e.g. polyp surgery), it will not be included in the result.
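A rough sketch of truncation over a toy word index follows. The index contents are invented, and keeping the first 150 matches in alphabetical order is an assumption, since the text only says that up to 150 variants are searched; `expand_truncation` is a hypothetical helper.

```python
def expand_truncation(query: str, index_terms, limit: int = 150):
    """Expand 'stem*' into single-word index terms that start with the stem."""
    if not query.endswith("*"):
        return [query]
    stem = query[:-1].lower()
    matches = sorted(
        term for term in index_terms
        if term.lower().startswith(stem) and " " not in term  # phrases are skipped
    )
    return matches[:limit]  # assumed: the first `limit` matches in alphabetical order


# Invented index entries for illustration.
INDEX = ["polyp", "polyps", "polyposis", "polypus", "polypectomy", "polyp surgery", "polymer"]
print(expand_truncation("polyp*", INDEX))
# ['polyp', 'polypectomy', 'polyposis', 'polyps', 'polypus']
```

The phrase polyp surgery is left out because it contains a space after the stem, in line with the limitation noted above.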
We have seen that PubMed uses a Phrase Index and maps two or more text words into a logical phrase. However, for a phrase that PubMed treats as separate words, enclosing it in double quotes, e.g. "medical literature", will force PubMed to search it as a single phrase. Please note that using this function switches off the automatic term mapping.
The following journal subsets (jsubset) are available for a search: jsubseta - Abridged Index Medicus, jsubsetd - Dental, jsubsetn - Nursing. For example: hernia/su NOT jsubsetn will not retrieve articles from nursing journals.
The Preview feature allows one to have a peek at the number of citations that will be retrieved. One can then add or delete terms to refine the search. For example, a search for HIV/therapy may result in about 30,000 retrievals; if the word child is added it will give about 6,000 citations, and adding review [pt] will result in a manageable 800 articles being retrieved. Preview can thus be used to know the result of the search prior to actual display of the citations, thus saving on-line time.
The History feature allows one to combine searches or add more terms to an existing search. Up to 100 searches are kept in ‘memory’ by PubMed for one hour and are numbered in the order they were performed. By using the sign (#) before the search number, e.g. #1 OR #2, or #1 AND (etiology OR pathology), one can improve the results.
The Details feature shows the strategy on which the results were based. From Details, one can save a search strategy, or edit the search strategy and resubmit it. Studying this can help to improve one's search abilities.
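How numbered searches can be recombined may be sketched with a small class that stores earlier queries and expands #n references. The class `SearchHistory` is hypothetical, and the 100-search and one-hour limits mentioned above are not modelled.

```python
import re


class SearchHistory:
    """Keep earlier queries and expand '#n' references to them."""

    def __init__(self):
        self._queries = []

    def expand(self, query: str) -> str:
        """Replace each '#n' with the n-th stored query in parentheses."""
        def substitute(match):
            number = int(match.group(1))
            if not 1 <= number <= len(self._queries):
                raise KeyError(f"no search numbered #{number}")
            return f"({self._queries[number - 1]})"

        return re.sub(r"#(\d+)", substitute, query)

    def add(self, query: str) -> int:
        """Store a query (with references expanded) and return its number."""
        self._queries.append(self.expand(query))
        return len(self._queries)


history = SearchHistory()
history.add("hernia [mh]")            # becomes search #1
history.add("cough OR constipation")  # becomes search #2
print(history.expand("#1 AND #2"))
# (hernia [mh]) AND (cough OR constipation)
```

Wrapping each expanded search in parentheses keeps the Boolean operators of the earlier searches from mixing with those of the new one.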
Each citation in PubMed has a link that will retrieve a pre-calculated set of PubMed citations that are closely related to the selected article. So if a ‘good’ article is found use this facility to get more ‘good’ articles.
Saving a search strategy
In the Details window, clicking URL embeds the search strategy in the URL of the page. One can then save it as a bookmark using the web browser's bookmark function.
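The idea of embedding a strategy in a URL can be sketched with Python's standard library. The base address is the Entrez address given earlier in this article; the parameter names `db` and `term` are assumptions made for illustration, and the URL that PubMed itself generates may look different.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

BASE_URL = "http://www.ncbi.nlm.nih.gov/entrez/query"  # address quoted in this article


def search_url(strategy: str, base: str = BASE_URL) -> str:
    """Encode a search strategy into a URL that can be bookmarked."""
    # 'db' and 'term' are assumed parameter names, used here for illustration only.
    return f"{base}?{urlencode({'db': 'PubMed', 'term': strategy})}"


def strategy_from_url(url: str) -> str:
    """Recover the strategy from a URL built by search_url."""
    return parse_qs(urlsplit(url).query)["term"][0]


strategy = "(hernia/therapy [mh] AND review [pt]) NOT child [mh]"
url = search_url(strategy)
print(url)
print(strategy_from_url(url) == strategy)
```

Percent-encoding is needed because the strategy contains spaces, brackets and slashes, which cannot appear unescaped in a query string.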
One may search by the full journal title, e.g. journal of postgraduate medicine; the MEDLINE abbreviation, e.g. j postgrad med; or the ISSN number, e.g. 0022-3859. However, searching with ISSN number alone may fail to retrieve older citations. If a journal name is also the same as one of MeSH headings or is a single word, (e.g., heart) PubMed will search the unqualified term as a MeSH heading or a text word, hence it is better to qualify the journal title with the journal title search field tag, e.g., heart [ta].
To search for a person, use the last name followed by the initials, in small letters and without any punctuation (e.g. roy ak). To find only Roy A, use “roy a”. In contrast, roy a* or roy a will also retrieve roy aa, roy ab, etc.
PubMed's Single Citation Matcher allows one to locate a specific single article using any or all of the following bibliographic elements: journal title, date, volume, issue, page, or author.
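A small sketch of how a name might be normalised into this form is given below. The helper `author_query` is hypothetical; the `exact` option simply adds the double quotes described above.

```python
import re


def author_query(last_name: str, initials: str, exact: bool = False) -> str:
    """Build 'lastname initials' in lower case with punctuation removed."""
    cleaned = re.sub(r"[^a-z]", "", initials.lower())  # drop dots, spaces, commas
    name = f"{last_name.strip().lower()} {cleaned}"
    # Double quotes ask for this exact name rather than every longer set of initials.
    return f'"{name}"' if exact else name


print(author_query("Roy", "A.K."))           # roy ak
print(author_query("Roy", "A", exact=True))  # "roy a"
```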
Indexing is done by humans who, in spite of their excellence and training, have different views of what is most important in an article, and not every researcher would view a paper in exactly the same way as the indexer has. It is therefore difficult to guarantee that all articles will always be indexed with every appropriate MeSH heading. Hence, MeSH alone is not sufficient for a powerful and comprehensive search and it is not advisable to rely solely on MeSH.
So neither the text-based search nor the MeSH-based search is complete if performed in isolation. These two methodologies complement each other. Many researchers agree that a combination of free text searching and concept-based searching is the only solution to adequate searching of MEDLINE. The new feature of PubMed whereby the search query is converted automatically to a combination of text word and MeSH term should help to solve this problem. The matter does not end here. Federiuk found that use of abbreviations in titles and abstracts hampers the search results. She showed that three different search strategies retrieved pools of unique articles: concept-based search with the abbreviation, text word search with the abbreviation, and text word search with the definition of the abbreviation. Hence, to make sure that one gets the best results, one may have to use combinations of search methodologies.
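One way to keep all three strategies in play is to generate them together and join them with OR. The sketch below is only a loose reading of the three strategies: writing the concept-based form with the [mh] tag is an assumption, [tw] is taken as the text word tag, and the helper names are hypothetical. ANA and antinuclear antibody are the abbreviation and definition mentioned earlier in this article.

```python
def abbreviation_strategies(abbreviation: str, definition: str):
    """Return three query variants for a term that has a common abbreviation."""
    return [
        f"{abbreviation} [mh]",   # concept-based search with the abbreviation (assumed form)
        f"{abbreviation} [tw]",   # text word search with the abbreviation
        f'"{definition}" [tw]',   # text word search with the definition
    ]


def combined_query(abbreviation: str, definition: str) -> str:
    """Join the three variants with OR so that each pool of articles is retrieved."""
    return " OR ".join(f"({q})" for q in abbreviation_strategies(abbreviation, definition))


print(combined_query("ANA", "antinuclear antibody"))
# (ANA [mh]) OR (ANA [tw]) OR ("antinuclear antibody" [tw])
```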
One must also remember that MEDLINE is a date-limited, manually indexed database containing selected portions of selected journals. The journals listed in MEDLINE are among ‘the best’, but they are not all of the best. Hence any claim of a ‘thorough’ literature search based solely on MEDLINE is a tall claim. For a comprehensive search one needs to review other databases, a few of which will be covered in the forthcoming articles.
One of the most important steps is to formulate, off-line, a flexible search strategy that combines text words and MeSH terms and takes care of abbreviations. Preview the results, add or delete terms, and then retrieve the results. Use the ‘related to’ feature. Use the history feature to combine various search results.
One will have to vary his methodology depending on the need. In a clinical setting, a specific search using MeSH terms and major concepts should suffice. Randomised controlled trial [pt] or drug therapy or dt [sh] or therapeutic use or tu [sh] or all random [tw] should be used for treatment related studies. If one is writing a review or claiming to be the ‘first’ for something, use of all the possible combination of search methodologies is recommended. Text word searches can be used if the search is related to a very new subject and has not yet been given an indexing term, is a brand name, or is obscure.
It is a good strategy to check the MeSH terms of one’s own articles once they are in MEDLINE. This will help one to understand how indexing is done. Maybe the authors will be able to suggest better MeSH terms for their articles to NLM indexers!
There are numerous sites providing free access to MEDLINE. However, the number of relevant citations varies from site to site even though a similar query approach is used. There are differences in the ease of use, the interface, the level of detail and the additional facilities at various sites. Hence, depending on the need, one should be aware of the strengths and limitations of a particular site. One can find a list and comparison of these sites at http://www.docnet.org.uk/drfelix/ and http://www.muhealth.org/~library/docs/mla97.html.
A few researchers grade their work by the number of times it is cited by other workers. For others to cite your work, it should first be retrievable by and accessible to the interested individuals. Hence, it is important to pay special attention to the following:
Abstract: The content of the article is very important. But it is not accessible fully to the searchers on the MEDLINE. What is available to them is the abstract of the article. So, it is essential to take utmost care that the abstract includes all the key elements, and is adequate to make the article easily and frequently retrievable.
Abbreviations: The results of both subject and text word searches may be affected by the use of abbreviations, hence abbreviations should be avoided. If an abbreviation is used but not defined in the title or abstract, the article will not be retrieved by text word search using the definition.
Medical educators have recognised the importance of teaching information retrieval skills to the medical students (Steering Committee on Medical Evaluation of Medical Information Sciences in Medical Education 1986). Proud et al have shown that when students were taught the skills of accessing MEDLINE, they could formulate a question, retrieve current information, critically review relevant articles, communicate effectively, and use these skills to contribute to patient care. If a newly learned skill is taught at the point of need, there is a greater likelihood of it being retained. Hence, it is worth including literature search using databases as one of the teaching-learning activities.
[Table - 1], [Table - 2], [Table - 3], [Table - 4], [Table - 5]