Removing duplicates from an EndNote library

If you have done any systematic review searching, you will have spent time removing duplicate references from your results. Faced with the prospect of deduplicating 26k results, I put out a plea/rant on Twitter.

As often happens, lovely library colleagues came to the rescue. Naila Dracup (@nailadracup) sent me a link to a guide written by Judy Wright (@jmwleeds) and the AUHE Information Specialists at the University of Leeds.

Wichor Bramer (@wichor) has also written a paper about how to do this, which he pointed out on Twitter.

The full reference for Wichor’s paper is: Bramer WM, et al. De-duplication of database search results for systematic reviews in EndNote. J Med Libr Assoc. 2016;104(3):240-3. doi:10.3163/1536-5050.104.3.014.

Below I’ve rewritten the instructions provided by Leeds University Library, which I have tested myself. I haven’t yet had a chance to try Wichor’s technique.

1. Importing your references into EndNote

Import your results in the correct order

Did you know that the order in which you import your references can affect the quality of the information your EndNote library contains? This is because when EndNote removes duplicates, it automatically keeps the first copy added to your library and removes subsequent copies. So if you import your results from a database which doesn’t have abstracts (for example), then import results from one which does, the copy with the abstract will automatically be deleted.
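To see why the order matters, here is a minimal Python sketch of a "keep the first copy" deduplication. It is not EndNote's actual algorithm; the function name `dedupe_keep_first`, the record layout and the two example records are all made up for illustration.

```python
def dedupe_keep_first(records):
    """Keep the first record seen for each (title, year) pair.

    Assumption: two records are duplicates when title and year match.
    EndNote's real comparison is configurable and is not reproduced here.
    """
    seen = set()
    kept = []
    for rec in records:
        key = (rec["title"].lower(), rec["year"])
        if key not in seen:
            seen.add(key)
            kept.append(rec)
    return kept


# Two hypothetical copies of the same article from different databases.
no_abstract = {"title": "Example trial", "year": 2015,
               "abstract": "", "source": "Database A"}
with_abstract = {"title": "Example trial", "year": 2015,
                 "abstract": "Background: ...", "source": "Database B"}

# Importing the poorer record first means the richer one is discarded.
bad_order = dedupe_keep_first([no_abstract, with_abstract])
# Importing the richer record first keeps the abstract.
good_order = dedupe_keep_first([with_abstract, no_abstract])

print(bad_order[0]["source"], "| has abstract:", bool(bad_order[0]["abstract"]))
print(good_order[0]["source"], "| has abstract:", bool(good_order[0]["abstract"]))
```

In both runs only one record survives; which one depends entirely on the order the records arrived in.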

It is recommended you import your references in the following order:

  1. Medline
  2. Embase
  3. Medline in process (if included)
  4. Other databases from OvidSP (PsycInfo, EconLit etc)
  5. PubMed
  6. Cinahl Plus
  7. Other databases from Ebsco
  8. Web of Science databases
  9. Scopus
  10. ProQuest databases
  11. Cochrane databases
  12. CRD databases
  13. Any other databases
  14. Clinical Trials websites

If you haven’t searched one or more of these databases, that’s fine. Just go to the next on the list. There are instructions on the LAS Databases page on how to import results from most of these databases to EndNote.
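If you keep a list of the databases you searched, the recommended order can be applied with a few lines of Python. This is only a sketch: the short labels in `RECOMMENDED_ORDER` are my own abbreviations of the list above, `import_order` is a hypothetical helper, and treating any unlisted database as "Any other databases" is an assumption.

```python
# Short labels standing in for the 14 entries of the recommended list.
RECOMMENDED_ORDER = [
    "Medline", "Embase", "Medline in process", "Other OvidSP", "PubMed",
    "Cinahl Plus", "Other Ebsco", "Web of Science", "Scopus", "ProQuest",
    "Cochrane", "CRD", "Other", "Clinical Trials",
]


def import_order(searched):
    """Return the searched databases in the recommended import order.

    Databases not named in the list are ranked as 'Other' (an assumption).
    sorted() is stable, so unlisted databases keep their relative order.
    """
    def rank(name):
        if name in RECOMMENDED_ORDER:
            return RECOMMENDED_ORDER.index(name)
        return RECOMMENDED_ORDER.index("Other")

    return sorted(searched, key=rank)


# "LILACS" is just an example of a database that is not on the list.
order = import_order(["Scopus", "PubMed", "Medline", "LILACS", "Clinical Trials"])
print(order)
```

Databases you did not search simply never appear in the output, which matches the advice to skip to the next one on the list.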

Always import all results into EndNote.

Organizing your imported references

I also recommend you organize your results into groups and add keywords so that you can keep track of where each reference has come from. I create a group for each database and drag and drop my results into the group as I’m importing. I also add a keyword to each reference recording which database it was retrieved from. ITS EndNote training can tell you more about creating groups and editing fields in EndNote.

2. Set up your EndNote library for accurate duplicates removal

Once you have all of your references imported and organized in groups, display the following fields in EndNote so that you can accurately spot duplicates.

  • Record number
  • Author
  • Year
  • Title
  • Journal/Secondary Title
  • Pages
  • Volume

Do this by going to Edit > Preferences then clicking the ‘Display Fields’ option.

3. Find duplicates

Finding duplicates is a multi-stage process. This is because each database formats the information slightly differently, making accurate machine spotting of duplicates very difficult.
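As a rough illustration of the formatting problem, the Python sketch below compares two renderings of the same title. The `normalise` function is my own approximation of what an "ignore spacing and punctuation" option might do; it is not taken from EndNote, and it only handles ASCII punctuation.

```python
import string

# Characters to throw away before comparing: ASCII punctuation and whitespace.
_DROP = set(string.punctuation + string.whitespace)


def normalise(text):
    """Lower-case the text and strip spaces and punctuation."""
    return "".join(ch for ch in text.lower() if ch not in _DROP)


# The same title as two databases might format it (hypothetical variants).
from_db_a = "De-duplication of database search results for systematic reviews in EndNote."
from_db_b = "Deduplication of database search results for  systematic reviews in EndNote"

print(from_db_a == from_db_b)                        # exact comparison
print(normalise(from_db_a) == normalise(from_db_b))  # normalised comparison
```

The exact comparison fails on a hyphen, a double space and a full stop, while the normalised one succeeds. Differences such as abbreviated journal names or truncated author lists survive normalisation, which is why several passes with different field combinations are needed.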

Step 1

Set the ‘find duplicates’ preferences to Author, Year, Title, Journal. Make sure ‘Ignore spacing and punctuation’ is checked.

Sort all references by Journal and highlight those with a journal title in the journal field (ignore those with a blank journal field). Run ‘Find Duplicates’ and click ‘Cancel’ in the resulting dialog box. You will see a new group has appeared called ‘Duplicates’ with duplicates highlighted. Click ‘Delete’ on the keyboard to move the highlighted items to the trash. These do not need to be checked.
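The logic of Step 1 can be sketched in Python as follows. This is an assumption-laden illustration, not EndNote's implementation: `find_duplicates`, its `require` parameter and the three example records are hypothetical, and the normalisation is only a guess at what "Ignore spacing and punctuation" does.

```python
import string

_DROP = set(string.punctuation + string.whitespace)


def norm(value):
    """Lower-case a field and strip ASCII punctuation and whitespace."""
    return "".join(ch for ch in str(value).lower() if ch not in _DROP)


def find_duplicates(records, fields, require=None):
    """Return records that repeat an earlier record on every field in `fields`.

    Records whose `require` field is blank are left out of the comparison,
    mirroring the advice to ignore references with a blank journal field.
    """
    first_seen = {}
    duplicates = []
    for rec in records:
        if require and not str(rec.get(require, "")).strip():
            continue
        key = tuple(norm(rec.get(f, "")) for f in fields)
        if key in first_seen:
            duplicates.append(rec)  # a later copy: candidate for the trash
        else:
            first_seen[key] = rec   # first copy: this one is kept
    return duplicates


library = [
    {"id": 1, "author": "Smith, J.", "year": "2015", "title": "A trial of X", "journal": "BMJ"},
    {"id": 2, "author": "Smith J", "year": "2015", "title": "A trial of X.", "journal": "BMJ"},
    {"id": 3, "author": "Smith J", "year": "2015", "title": "A trial of X", "journal": ""},
]

step1 = find_duplicates(library, ["author", "year", "title", "journal"], require="journal")
print([rec["id"] for rec in step1])
```

Record 2 is flagged as a copy of record 1. Record 3 has no journal, so it is left for a later pass rather than being matched here.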

Step 2

Set the ‘find duplicates’ preferences to Author, Year, Title, Pages. Make sure ‘Ignore spacing and punctuation’ is checked.

Sort all references by Pages and highlight those with a page number in the pages field (ignore those with a blank pages field). Run ‘Find Duplicates’ and click ‘Cancel’ in the resulting dialog box. You will see the duplicates group has been updated with a new group of duplicates. Click ‘Delete’ on the keyboard to move the highlighted items to the trash. These do not need to be checked.

Step 3

Set the ‘find duplicates’ preferences to Title, Journal, Pages. Make sure ‘Ignore spacing and punctuation’ is checked.

Sort all references by Pages. Run ‘Find Duplicates’ and click ‘Cancel’ in the resulting dialog box. You will see the duplicates group has been updated with a new group of duplicates. Manually check the references with no page numbers or page numbers beginning with 1, and select/deselect duplicates by holding the Ctrl key while selecting or deselecting. Click ‘Delete’ on the keyboard to move the highlighted items to the trash.
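From Step 3 onwards, some of the flagged references need a manual look. A tiny Python sketch of that rule is below. `needs_manual_check` is a hypothetical helper, and I am assuming "page numbers beginning with 1" means the Pages field literally starts with the character 1.

```python
def needs_manual_check(record):
    """Flag records with no page numbers or pages starting with '1'.

    Assumption: the rule is applied to the raw text of the Pages field.
    """
    pages = str(record.get("pages", "")).strip()
    return pages == "" or pages.startswith("1")


# Hypothetical records from a Duplicates group.
flagged = [
    {"id": 10, "pages": ""},
    {"id": 11, "pages": "1-8"},
    {"id": 12, "pages": "240-3"},
    {"id": 13, "pages": "120-5"},
]

to_review = [rec["id"] for rec in flagged if needs_manual_check(rec)]
print(to_review)
```

Blank or very common page values such as "1-8" make a match on Pages weak evidence, which is presumably why these references get a manual check before deletion.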

Step 4

Set the ‘find duplicates’ preferences to Year, Title, Pages. Make sure ‘Ignore spacing and punctuation’ is checked.

Sort all references by Pages. Run ‘Find Duplicates’ and click ‘Cancel’ in the resulting dialog box. You will see the duplicates group has been updated with a new group of duplicates. Manually check the references with no page numbers or page numbers beginning with 1, and select/deselect duplicates by holding the Ctrl key while selecting or deselecting. Click ‘Delete’ on the keyboard to move the highlighted items to the trash.

Step 5

Set the ‘find duplicates’ preferences to Title, Pages. Make sure ‘Ignore spacing and punctuation’ is checked.

Sort all references by Pages. Run ‘Find Duplicates’ and click ‘Cancel’ in the resulting dialog box. You will see the duplicates group has been updated with a new group of duplicates. Manually check the references with no page numbers or page numbers beginning with 1, and select/deselect duplicates by holding the Ctrl key while selecting or deselecting. Click ‘Delete’ on the keyboard to move the highlighted items to the trash.

Step 6

Set the ‘find duplicates’ preferences to Author, Year, Journal, Pages. Make sure ‘Ignore spacing and punctuation’ is checked.

Sort all references by Pages. Run ‘Find Duplicates’ and click ‘Cancel’ in the resulting dialog box. You will see the duplicates group has been updated with a new group of duplicates. Manually check the references with no page numbers or page numbers beginning with 1, and select/deselect duplicates by holding the Ctrl key while selecting or deselecting. Click ‘Delete’ on the keyboard to move the highlighted items to the trash.

Step 7

Set the ‘find duplicates’ preferences to Author, Year, Title. Make sure ‘Ignore spacing and punctuation’ is checked.

Sort all references by Title. Run ‘Find Duplicates’ and click ‘Cancel’ in the resulting dialog box. You will see the duplicates group has been updated with a new group of duplicates. Manually check the references with no title, and select/deselect duplicates by holding the Ctrl key while selecting or deselecting. Click ‘Delete’ on the keyboard to move the highlighted items to the trash.

Step 8

Set the ‘find duplicates’ preferences to Author, Year, Journal. Make sure ‘Ignore spacing and punctuation’ is checked.

Sort all references by Journal. Run ‘Find Duplicates’ and click ‘Cancel’ in the resulting dialog box. You will see the duplicates group has been updated with a new group of duplicates. Manually check all references by looking at the page numbers field, and select/deselect duplicates by holding the Ctrl key while selecting or deselecting. Click ‘Delete’ on the keyboard to move the highlighted items to the trash.

Step 9

Set the ‘find duplicates’ preferences to Year, Title, Journal. Make sure ‘Ignore spacing and punctuation’ is checked.

Sort all references by Journal. Run ‘Find Duplicates’ and click ‘Cancel’ in the resulting dialog box. You will see the duplicates group has been updated with a new group of duplicates. Manually check all references by looking at the page numbers field, and select/deselect duplicates by holding the Ctrl key while selecting or deselecting. Click ‘Delete’ on the keyboard to move the highlighted items to the trash.

Step 10

Set the ‘find duplicates’ preferences to Author, Year. Make sure ‘Ignore spacing and punctuation’ is checked.

Sort all references by Journal. Run ‘Find Duplicates’ and click ‘Cancel’ in the resulting dialog box. You will see the duplicates group has been updated with a new group of duplicates. Manually check all references by looking at the page numbers field, and select/deselect duplicates by holding the Ctrl key while selecting or deselecting. Click ‘Delete’ on the keyboard to move the highlighted items to the trash.

Step 11

Set the ‘find duplicates’ preferences to Year, Title. Make sure ‘Ignore spacing and punctuation’ is checked.

Sort all references by Title. Run ‘Find Duplicates’ and click ‘Cancel’ in the resulting dialog box. You will see the duplicates group has been updated with a new group of duplicates. Manually check all references by looking at the page numbers field, and select/deselect duplicates by holding the Ctrl key while selecting or deselecting. Click ‘Delete’ on the keyboard to move the highlighted items to the trash.

Step 12

I’ve found that running an extra duplicate search on Year, Pages helps to find more duplicates. This is a particularly good step if you have non-English language articles in your results. Ignore any results which do not have a page number.

Step 13

Set the ‘find duplicates’ preferences to Title. Make sure ‘Ignore spacing and punctuation’ is checked.

Sort all references by Title. Run ‘Find Duplicates’ and click ‘Cancel’ in the resulting dialog box. You will see the duplicates group has been updated with a new group of duplicates. Manually check all references by looking at the page numbers field, and select/deselect duplicates by holding the Ctrl key while selecting or deselecting. Click ‘Delete’ on the keyboard to move the highlighted items to the trash.
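Steps 1 to 13 run the same search with progressively different field combinations. The Python sketch below lists those combinations and reports the first step at which two records would match. It is illustrative only: `PASSES` is transcribed from the steps above, while `first_matching_step`, the normalisation and the two example records are my own assumptions rather than anything EndNote exposes.

```python
import string

_DROP = set(string.punctuation + string.whitespace)


def norm(value):
    """Lower-case a field and strip ASCII punctuation and whitespace."""
    return "".join(ch for ch in str(value).lower() if ch not in _DROP)


# Field combinations for Steps 1 to 13, in order.
PASSES = [
    ("author", "year", "title", "journal"),  # Step 1
    ("author", "year", "title", "pages"),    # Step 2
    ("title", "journal", "pages"),           # Step 3
    ("year", "title", "pages"),              # Step 4
    ("title", "pages"),                      # Step 5
    ("author", "year", "journal", "pages"),  # Step 6
    ("author", "year", "title"),             # Step 7
    ("author", "year", "journal"),           # Step 8
    ("year", "title", "journal"),            # Step 9
    ("author", "year"),                      # Step 10
    ("year", "title"),                       # Step 11
    ("year", "pages"),                       # Step 12
    ("title",),                              # Step 13
]


def first_matching_step(a, b):
    """Return the first step number whose fields all match, or None.

    Note: two blank fields compare as equal here, which is one reason the
    later steps call for a manual check.
    """
    for step, fields in enumerate(PASSES, start=1):
        if all(norm(a.get(f, "")) == norm(b.get(f, "")) for f in fields):
            return step
    return None


# Hypothetical copies of one article with different journal and page formats.
rec_a = {"author": "Smith J", "year": "2015", "title": "A trial of X",
         "journal": "Br Med J", "pages": "240-3"}
rec_b = {"author": "Smith, J.", "year": "2015", "title": "A trial of X.",
         "journal": "British Medical Journal", "pages": "240-243"}

print(first_matching_step(rec_a, rec_b))
```

Here the abbreviated journal name and the shortened page range defeat Steps 1 to 6, and the pair is only caught at Step 7 (Author, Year, Title). That is the reason for running so many passes.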

Step 14

Now, you can catch the final few duplicates by manually picking them out. Sort your entire EndNote library by title and make the title column very wide so that you can see lots of the title words. Carefully look at your titles and remove the duplicate with the highest reference number. Be aware that sometimes translated titles are displayed in [brackets].
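If you export the Record Number and Title fields, the same check can be roughed out in Python. This is a sketch to produce a list for you to eyeball, not something to delete from blindly: `highest_numbered_duplicates` and the example records are hypothetical, and it only catches titles that are identical apart from letter case.

```python
from itertools import groupby


def highest_numbered_duplicates(records):
    """Sort by title and, within each run of identical titles, return the
    record numbers of every copy except the lowest-numbered one."""
    by_title = sorted(records, key=lambda r: (r["title"].lower(), r["record_number"]))
    to_remove = []
    for _, run in groupby(by_title, key=lambda r: r["title"].lower()):
        run = list(run)
        # run[0] has the lowest record number, so it is the copy to keep.
        to_remove.extend(r["record_number"] for r in run[1:])
    return to_remove


records = [
    {"record_number": 57, "title": "effects of x on y"},
    {"record_number": 12, "title": "Effects of X on Y"},
    {"record_number": 30, "title": "Another study"},
]

candidates = highest_numbered_duplicates(records)
print(candidates)
```

Keeping the lowest record number keeps the copy that was imported first, which fits the import order recommended earlier. Translated titles in [brackets] will not match their English versions here, so those still need to be spotted by eye.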

Step 15

Repeat step 14, but this time sort on page numbers, and remove any duplicates.

Now you should have removed your duplicates. Numbers of references remaining in the groups can be used to complete the PRISMA diagram. Your results can now be used in the screening process.
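If you added a database keyword to each reference, the numbers for the PRISMA diagram can be tallied with a short Python sketch like the one below. The `database` field, the `counts_by_database` helper and the example records are hypothetical stand-ins for whatever you export from your own library.

```python
from collections import Counter


def counts_by_database(records):
    """Count records per source database, using a 'database' keyword field."""
    return Counter(rec["database"] for rec in records)


# Hypothetical library before deduplication.
imported = [
    {"id": 1, "database": "Medline"},
    {"id": 2, "database": "Medline"},
    {"id": 3, "database": "Embase"},
    {"id": 4, "database": "Embase"},
    {"id": 5, "database": "Scopus"},
]

# Record numbers that ended up in the trash during the steps above.
removed_ids = {4, 5}
remaining = [rec for rec in imported if rec["id"] not in removed_ids]

before = counts_by_database(imported)
after = counts_by_database(remaining)
duplicates_removed = len(imported) - len(remaining)

print("Records identified:", dict(before))
print("Records after deduplication:", dict(after))
print("Duplicates removed:", duplicates_removed)
```

A database whose records were all removed as duplicates simply drops out of the "after" counts, so keep the "before" counts as well when filling in the diagram.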
