If you have done any systematic review searching, you will have spent time removing duplicate references from your results. Faced with the prospect of deduplicating 26k results, I put out a plea/rant on Twitter.
If anyone knows of a system which is good at removing duplicates from >20 databases, let me know. 4/
— Jane Falconer (@falkie71) November 17, 2018
As often happens, lovely library colleagues came to the rescue. Naila Dracup (@nailadracup) sent me a link to a guide written by Judy Wright (@jmwleeds) and the AUHE Information Specialists at the University of Leeds.
If you don’t already use this method: https://t.co/bzxQ9jT2yy
— Naila Dracup (@nailadracup) November 17, 2018
Wichor Bramer (@wichor) has also written a paper on how to do this, which he pointed out on Twitter.
Mine works faster i think, and sensitivity is 99.5% and error margin 1 in 3000.
I’d try my method.
Page numbers should be adapted, as medline has abreviated page numbers. My article describes how that can be done.
I import medline first then export with a special style & reimport— Wichor Bramer (@wichor) November 17, 2018
Wichor’s paper is: Bramer WM, et al. De-duplication of database search results for systematic reviews in EndNote. J Med Libr Assoc. 2016;104(3):240-3. doi:10.3163/1536-5050.104.3.014.
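Wichor's tweet mentions that Medline abbreviates page numbers. The citation above, for example, gives its pages as 240-3 rather than 240-243, so a comparison on the Pages field can miss a copy of the same article from a database that spells out the full range. His article explains how to deal with this inside EndNote. The Python sketch below is not his method, only a hypothetical helper (`expand_page_range`) showing what expanding an abbreviated range involves.

```python
import re


def expand_page_range(pages: str) -> str:
    """Expand an abbreviated range such as '240-3' to '240-243'.

    Hypothetical helper: single pages, article numbers such as 'e1234' and
    ranges that are already complete are returned unchanged.
    """
    match = re.fullmatch(r"\s*(\d+)\s*-\s*(\d+)\s*", pages)
    if not match:
        return pages  # not a simple numeric range, leave as-is
    start, end = match.groups()
    if len(end) < len(start):
        # Borrow the leading digits of the start page: '240' + '3' -> '243'
        end = start[: len(start) - len(end)] + end
    return f"{start}-{end}"


print(expand_page_range("240-3"))     # 240-243
print(expand_page_range("240-243"))   # 240-243
print(expand_page_range("1195-201"))  # 1195-1201
```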
Below, I’ve rewritten the instructions provided by Leeds University Library, having tested them myself. I haven’t had a chance to try Wichor’s technique yet, so let me know in the comments if you have given it a try.
1. Importing your references into EndNote
Import your results in the correct order
Did you know that the order in which you import your references can affect the quality of the information in your EndNote library? When EndNote removes duplicates, it keeps the first copy added to your library and removes the later copies. So if, for example, you import results from a database that doesn’t include abstracts and then import results from one that does, the copy with the abstract is the one that gets deleted.
It is recommended that you import your references in the following order:
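To make the ordering point concrete, here is a small Python sketch. It assumes a simplified model in which a duplicate check keeps whichever copy was added first; the field names, database names and records are hypothetical, and this is not how EndNote is implemented internally.

```python
def keep_first_copy(records, key_fields):
    """Return records with later copies of the same key removed (first copy wins)."""
    seen = set()
    kept = []
    for rec in records:
        key = tuple(rec.get(field, "") for field in key_fields)
        if key in seen:
            continue  # a later copy of a reference already in the library
        seen.add(key)
        kept.append(rec)
    return kept


# Hypothetical copies of one article retrieved from two databases.
without_abstract = {"source": "Database A", "title": "An example trial",
                    "year": "2016", "abstract": ""}
with_abstract = {"source": "Database B", "title": "An example trial",
                 "year": "2016", "abstract": "Background: ..."}

# Importing the copy without an abstract first means the abstract is lost.
kept = keep_first_copy([without_abstract, with_abstract], ("title", "year"))
print(kept[0]["source"], repr(kept[0]["abstract"]))  # Database A ''

# Importing the richer copy first keeps the abstract.
kept = keep_first_copy([with_abstract, without_abstract], ("title", "year"))
print(kept[0]["source"], repr(kept[0]["abstract"]))  # Database B 'Background: ...'
```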
- Medline
- Embase
- Medline in process (if included)
- Other databases from OvidSP (PsycINFO, EconLit, etc.)
- PubMed
- CINAHL Plus
- Other databases from EBSCO
- Web of Science databases
- Scopus
- ProQuest databases
- Cochrane databases
- CRD databases
- Any other databases
- Clinical Trials websites
If you haven’t searched one or more of these databases, that’s fine; just move on to the next one on the list. There are instructions on the LAS Databases page on how to import results from most of these databases into EndNote.
Always import all results into EndNote.
Organizing your imported references
I also recommend organizing your results into groups and adding keywords so that you can keep track of where each reference came from. I create a group for each database and drag and drop my results into it as I’m importing. I also add a keyword to each reference recording the database it was retrieved from. ITS EndNote training can tell you more about creating groups and editing fields in EndNote.
2. Set up your EndNote library for accurate duplicates removal
Once you have imported all of your references and organised them into groups, display the following fields in EndNote so that you can spot duplicates accurately:
- Record number
- Author
- Year
- Title
- Journal/Secondary Title
- Pages
- Volume
Do this by going to Edit > Preferences then clicking the ‘Display Fields’ option.
3. Find duplicates
Finding duplicates is a multi-stage process because each database formats its information slightly differently, which makes it very difficult for software to spot duplicates accurately.
Step 1
Set the ‘find duplicates’ preferences to Author, Year, Title, Journal. Make sure ‘Ignore spacing and punctuation’ is checked.
Sort all references by Journal and highlight those with a journal title in the journal field (ignore those with a blank journal field). Run ‘Find Duplicates’ and click ‘Cancel’ in the resulting dialog box. You will see a new group has appeared called ‘Duplicates’ with duplicates highlighted. Click ‘Delete’ on the keyboard to move the highlighted items to the trash. These do not need to be checked.
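The matching in Step 1 can be approximated in Python. This is a sketch under assumptions, not EndNote's algorithm: ‘Ignore spacing and punctuation’ is modelled as keeping only letters and digits (and ignoring case), references with a blank compared field are skipped, and the lowest record number is treated as the first copy imported. The function names and records are hypothetical.

```python
from collections import defaultdict


def normalise(value: str) -> str:
    """Keep only letters and digits, lower-cased.

    A rough stand-in for 'Ignore spacing and punctuation'; ignoring case is an
    extra assumption of this sketch.
    """
    return "".join(ch for ch in value if ch.isalnum()).casefold()


def find_duplicates(records, fields):
    """Return record numbers of later copies that match on every field in `fields`.

    References with a blank value in any compared field are skipped, much like
    leaving references with an empty Journal field unselected in Step 1.
    The lowest record number (the first copy imported) is kept.
    """
    groups = defaultdict(list)
    for rec in records:
        key = tuple(normalise(rec.get(field, "")) for field in fields)
        if all(key):  # every compared field has a value
            groups[key].append(rec["record_number"])
    to_delete = []
    for numbers in groups.values():
        to_delete.extend(sorted(numbers)[1:])  # everything after the first copy
    return sorted(to_delete)


# Hypothetical references: record 7 is a second copy of record 1,
# and record 9 has no journal, so Step 1 leaves it alone.
library = [
    {"record_number": 1, "author": "Smith, J.", "year": "2017",
     "title": "A hypothetical trial", "journal": "Example Journal"},
    {"record_number": 7, "author": "Smith J", "year": "2017",
     "title": "A hypothetical trial.", "journal": "Example journal"},
    {"record_number": 9, "author": "Smith J", "year": "2017",
     "title": "A hypothetical trial", "journal": ""},
]
print(find_duplicates(library, ("author", "year", "title", "journal")))  # [7]
```

Passing a different tuple of fields, such as `("author", "year", "title", "pages")`, gives a rough equivalent of Step 2, and so on for the later automated searches.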
Step 2
Set the ‘find duplicates’ preferences to Author, Year, Title, Pages. Make sure ‘Ignore spacing and punctuation’ is checked.
Sort all references by Pages and highlight those with a page number in the pages field (ignore those with a blank pages field). Run ‘Find Duplicates’ and click ‘Cancel’ in the resulting dialog box. You will see the duplicates group has been updated with a new group of duplicates. Click ‘Delete’ on the keyboard to move the highlighted items to the trash. These do not need to be checked.
Step 3
Set the ‘find duplicates’ preferences to Title, Journal, Pages. Make sure ‘Ignore spacing and punctuation’ is checked.
Sort all references by Pages. Run ‘Find Duplicates’ and click ‘Cancel’ in the resulting dialog box. You will see the duplicates group has been updated with a new group of duplicates. Manually check the references with no page numbers or page numbers beginning with 1, and select/deselect duplicates by holding the Ctrl key while selecting or deselecting. Click ‘Delete’ on the keyboard to move the highlighted items to the trash.
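In Steps 3 to 6, some of the highlighted duplicates can go straight to the trash while others need checking by hand. The hypothetical helper below sorts Step 3 candidates into those two piles. It reads ‘page numbers beginning with 1’ literally, as a Pages value whose first character is 1; that is one interpretation of the instruction, so adjust it if you read it as "starting on page 1".

```python
def needs_manual_check(pages: str) -> bool:
    """Flag references whose Pages value is blank or begins with 1."""
    pages = pages.strip()
    return pages == "" or pages.startswith("1")


def split_for_review(duplicates):
    """Split candidate duplicates into (delete without checking, check by hand)."""
    auto, manual = [], []
    for rec in duplicates:
        if needs_manual_check(rec.get("pages", "")):
            manual.append(rec["record_number"])
        else:
            auto.append(rec["record_number"])
    return auto, manual


# Hypothetical candidates found by a Title, Journal, Pages search.
candidates = [
    {"record_number": 12, "pages": "240-243"},
    {"record_number": 15, "pages": ""},
    {"record_number": 18, "pages": "1-9"},
    {"record_number": 21, "pages": "112-118"},
]
print(split_for_review(candidates))  # ([12], [15, 18, 21])
```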
Step 4
Set the ‘find duplicates’ preferences to Year, Title, Pages. Make sure ‘Ignore spacing and punctuation’ is checked.
Sort all references by Pages. Run ‘Find Duplicates’ and click ‘Cancel’ in the resulting dialog box. You will see the duplicates group has been updated with a new group of duplicates. Manually check the references with no page numbers or page numbers beginning with 1, and select/deselect duplicates by holding the Ctrl key while selecting or deselecting. Click ‘Delete’ on the keyboard to move the highlighted items to the trash.
Step 5
Set the ‘find duplicates’ preferences to Title, Pages. Make sure ‘Ignore spacing and punctuation’ is checked.
Sort all references by Pages. Run ‘Find Duplicates’ and click ‘Cancel’ in the resulting dialog box. You will see the duplicates group has been updated with a new group of duplicates. Manually check the references with no page numbers or page numbers beginning with 1, and select/deselect duplicates by holding the Ctrl key while selecting or deselecting. Click ‘Delete’ on the keyboard to move the highlighted items to the trash.
Step 6
Set the ‘find duplicates’ preferences to Author, Year, Journal, Pages. Make sure ‘Ignore spacing and punctuation’ is checked.
Sort all references by Pages. Run ‘Find Duplicates’ and click ‘Cancel’ in the resulting dialog box. You will see the duplicates group has been updated with a new group of duplicates. Manually check the references with no page numbers or page numbers beginning with 1, and select/deselect duplicates by holding the Ctrl key while selecting or deselecting. Click ‘Delete’ on the keyboard to move the highlighted items to the trash.
Step 7
Set the ‘find duplicates’ preferences to Author, Year, Title. Make sure ‘Ignore spacing and punctuation’ is checked.
Sort all references by Title. Run ‘Find Duplicates’ and click ‘Cancel’ in the resulting dialog box. You will see the duplicates group has been updated with a new group of duplicates. Manually check the references with no title, and select/deselect duplicates by holding the Ctrl key while selecting or deselecting. Click ‘Delete’ on the keyboard to move the highlighted items to the trash.
Step 8
Set the ‘find duplicates’ preferences to Author, Year, Journal. Make sure ‘Ignore spacing and punctuation’ is checked.
Sort all references by Journal. Run ‘Find Duplicates’ and click ‘Cancel’ in the resulting dialog box. You will see the duplicates group has been updated with a new group of duplicates. Manually check all references by looking at the page numbers field, and select/deselect duplicates by holding the Ctrl key while selecting or deselecting. Click ‘Delete’ on the keyboard to move the highlighted items to the trash.
Step 9
Set the ‘find duplicates’ preferences to Year, Title, Journal. Make sure ‘Ignore spacing and punctuation’ is checked.
Sort all references by Journal. Run ‘Find Duplicates’ and click ‘Cancel’ in the resulting dialog box. You will see the duplicates group has been updated with a new group of duplicates. Manually check all references by looking at the page numbers field, and select/deselect duplicates by holding the Ctrl key while selecting or deselecting. Click ‘Delete’ on the keyboard to move the highlighted items to the trash.
Step 10
Set the ‘find duplicates’ preferences to Author, Year. Make sure ‘Ignore spacing and punctuation’ is checked.
Sort all references by Journal. Run ‘Find Duplicates’ and click ‘Cancel’ in the resulting dialog box. You will see the duplicates group has been updated with a new group of duplicates. Manually check all references by looking at the page numbers field, and select/deselect duplicates by holding the Ctrl key while selecting or deselecting. Click ‘Delete’ on the keyboard to move the highlighted items to the trash.
Step 11
Set the ‘find duplicates’ preferences to Year, Title. Make sure ‘Ignore spacing and punctuation’ is checked.
Sort all references by Title. Run ‘Find Duplicates’ and click ‘Cancel’ in the resulting dialog box. You will see the duplicates group has been updated with a new group of duplicates. Manually check all references by looking at the page numbers field, and select/deselect duplicates by holding the Ctrl key while selecting or deselecting. Click ‘Delete’ on the keyboard to move the highlighted items to the trash.
Step 12
I’ve found running a duplicate search on Year, Pages helps to find more duplicates. This is a particularly good step if you have non-English language articles in your results. Ignore any results which do not have a page number.
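A rough Python sketch of the Year, Pages search, with references that have no page number ignored. Translated and original-language titles differ, so the title-based steps can miss these pairs, which is presumably why this step helps with non-English articles; the matches are still only candidates to check by eye. The function name and records are hypothetical.

```python
from collections import defaultdict


def year_pages_candidates(records):
    """Group references that share Year and Pages, ignoring blank Pages.

    Spacing and punctuation in Pages are ignored. Returns lists of record
    numbers that still need checking by eye.
    """
    groups = defaultdict(list)
    for rec in records:
        pages = "".join(ch for ch in rec.get("pages", "") if ch.isalnum())
        if not pages:
            continue  # no page number: ignore, as the step says
        groups[(rec.get("year", ""), pages)].append(rec["record_number"])
    return [sorted(nums) for nums in groups.values() if len(nums) > 1]


library = [
    {"record_number": 3, "year": "2015", "pages": "45-52",
     "title": "Hypothetical study title"},
    {"record_number": 40, "year": "2015", "pages": "45-52",
     "title": "[Hypothetical study title, translated]"},
    {"record_number": 41, "year": "2015", "pages": "",
     "title": "Another hypothetical title"},
    {"record_number": 42, "year": "2015", "pages": "",
     "title": "Yet another hypothetical title"},
]
print(year_pages_candidates(library))  # [[3, 40]]
```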
Step 13
Set the ‘find duplicates’ preferences to Title. Make sure ‘Ignore spacing and punctuation’ is checked.
Sort all references by Title. Run ‘Find Duplicates’ and click ‘Cancel’ in the resulting dialog box. You will see the duplicates group has been updated with a new group of duplicates. Manually check all references by looking at the page numbers field, and select/deselect duplicates by holding the Ctrl key while selecting or deselecting. Click ‘Delete’ on the keyboard to move the highlighted items to the trash.
Step 14
Now you can catch the final few duplicates by picking them out manually. Sort your entire EndNote library by title and make the title column very wide so that you can see as much of each title as possible. Look carefully through the titles and, for each duplicate pair, remove the copy with the higher record number. Be aware that translated titles are sometimes displayed in [brackets].
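The manual scan can be partly assisted in Python by sorting on title and comparing each title with the next one. This sketch uses `difflib.SequenceMatcher` from the standard library to score similarity; the 0.9 threshold and the bracket stripping are assumptions, not part of the Leeds method, and every suggested pair still needs checking by eye. The records are hypothetical.

```python
from difflib import SequenceMatcher


def clean_title(title: str) -> str:
    """Drop the [brackets] some databases put around translated titles, then lower-case."""
    return title.strip().strip("[]").strip().lower()


def near_duplicate_pairs(records, threshold=0.9):
    """Sort by title and compare each title with the next, as in a manual scan.

    Returns (keep, remove) record-number pairs, removing the higher number.
    `threshold` is an arbitrary, hypothetical cut-off.
    """
    ordered = sorted(records, key=lambda r: clean_title(r["title"]))
    pairs = []
    for a, b in zip(ordered, ordered[1:]):
        score = SequenceMatcher(None, clean_title(a["title"]),
                                clean_title(b["title"])).ratio()
        if score >= threshold:
            pairs.append(tuple(sorted((a["record_number"], b["record_number"]))))
    return pairs


library = [
    {"record_number": 5, "title": "Effects of exercise on sleep quality: a randomised trial"},
    {"record_number": 30, "title": "Diet and sleep in older adults"},
    {"record_number": 88, "title": "Effects of exercise on sleep quality. A randomized trial"},
    {"record_number": 12, "title": "[Sleep hygiene in shift workers]"},
    {"record_number": 64, "title": "Sleep hygiene in shift workers"},
]
print(near_duplicate_pairs(library))  # [(5, 88), (12, 64)]
```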
Step 15
Repeat Step 14, but this time sort by page numbers, and remove any duplicates.
You should now have removed your duplicates. The number of references remaining in each group can be used to complete the PRISMA flow diagram, and your results are ready for the screening process.
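The PRISMA arithmetic is simple but easy to slip on. A minimal Python sketch, assuming you have noted the number of references in each database group before and after deduplication (the group counts and the function name below are made up for illustration):

```python
def prisma_counts(before, after):
    """Summarise deduplication for a PRISMA flow diagram.

    `before` and `after` map each database group name to its reference count
    before and after duplicates were removed.
    """
    identified = sum(before.values())
    remaining = sum(after.values())
    return {
        "records identified": identified,
        "duplicates removed": identified - remaining,
        "records after duplicates removed": remaining,
    }


# Hypothetical group counts.
before = {"Medline": 1200, "Embase": 1850, "CINAHL Plus": 400}
after = {"Medline": 1200, "Embase": 950, "CINAHL Plus": 120}
print(prisma_counts(before, after))
# {'records identified': 3450, 'duplicates removed': 1180, 'records after duplicates removed': 2270}
```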