Citing Data
Citing data as you would cite other scholarly works such as articles and books increases the visibility and impact of your research. It also promotes greater transparency and rigor in research methods.
How do I know when to cite the data used in my research?
If you’re using any data that you did not personally collect or generate, you should always cite the original source. Data archives and government data portals often provide a recommended format to follow so that others can track the data back to the source.
Example from the National Addiction & HIV Data Archive Program:
Morris, Martina, and Richard Rothenberg. HIV Transmission Network Metastudy Project: An Archive of Data From Eight Network Studies, 1988--2001. ICPSR22140-v1. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor], 2011-08-09. http://doi.org/10.3886/ICPSR22140.v1
Example from Social Explorer:
U.S. Census Bureau. Table 8, Age, 2010. Prepared by Social Explorer. https://www.socialexplorer.com/tables/C2010/R11764616 (accessed Mar 14, 2024).
What format should I use to cite data?
First, look for the recommended citation format at the source. If there isn’t one, follow accepted best practice by including these key elements: creator/author, title, year, version, and a persistent identifier or permanent URL.
Example from specialized data source:
U.S. Department of Housing and Urban Development. (2014). Location Affordability Index, Version 2.0. https://www.hudexchange.info/programs/location-affordability-index/. Accessed November 1, 2018.
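When a source gives no recommended format, the key elements listed above can be assembled programmatically. The following is a minimal sketch in Python, not a standard tool: the `DataCitation` class and `format_citation` function are hypothetical names introduced for illustration, and the punctuation simply mirrors the example above rather than any official style guide.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DataCitation:
    """Key elements of a data citation (hypothetical helper class)."""
    creator: str                    # person or organization that produced the data
    title: str                      # title of the dataset
    year: str                       # year of publication or release
    identifier: str                 # DOI link or permanent URL
    version: Optional[str] = None   # version label, if the source provides one
    accessed: Optional[str] = None  # access date, useful when there is no DOI


def format_citation(c: DataCitation) -> str:
    """Join the elements in the order: creator, year, title and version, identifier."""
    title = c.title if c.version is None else f"{c.title}, {c.version}"
    text = f"{c.creator}. ({c.year}). {title}. {c.identifier}"
    if c.accessed:
        text += f". Accessed {c.accessed}."
    return text


# Usage: rebuild the example citation from its individual elements.
hud = DataCitation(
    creator="U.S. Department of Housing and Urban Development",
    title="Location Affordability Index",
    year="2014",
    version="Version 2.0",
    identifier="https://www.hudexchange.info/programs/location-affordability-index/",
    accessed="November 1, 2018",
)
print(format_citation(hud))
```

Keeping each element in its own field makes it easy to spot what is missing (for example, a version or a persistent identifier) before the citation is written out.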
If the data you are citing have a unique identifier such as a DOI (digital object identifier), you can use the DOI Citation Formatter tool to generate a properly formatted citation in the style you select (APA, Chicago, MLA, etc.).
Example of data from the Dryad repository, generated in APA style using a DOI:
Lynch, Z. R., Schlenke, T. A., & de Roode, J. C. (2016). Data from: Evolution of behavioral and cellular defenses against parasitoid wasps in the Drosophila melanogaster subgroup. Dryad Digital Repository. http://doi.org/10.5061/dryad.5t5m4
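A formatted citation can also be requested directly from the DOI resolver through DOI content negotiation. The sketch below assumes the DOI's registration agency supports the `text/x-bibliography` media type (Crossref and DataCite document this) and that the style name passed in is one the service recognizes. The function names `build_citation_request` and `fetch_citation` are hypothetical helpers written for this illustration.

```python
import urllib.request


def build_citation_request(doi: str, style: str = "apa") -> urllib.request.Request:
    """Prepare a request asking the DOI resolver for a formatted citation."""
    url = f"https://doi.org/{doi}"
    # The Accept header asks for a bibliography entry in the given citation style.
    headers = {"Accept": f"text/x-bibliography; style={style}"}
    return urllib.request.Request(url, headers=headers)


def fetch_citation(doi: str, style: str = "apa", timeout: float = 10.0) -> str:
    """Send the request and return the citation text (requires network access)."""
    request = build_citation_request(doi, style)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8").strip()
```

Calling `fetch_citation("10.5061/dryad.5t5m4")` would request an APA-style citation for the Dryad dataset above. The exact text returned depends on the metadata registered for the DOI, so it may differ slightly from the example shown.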
Should I ask others to cite my data?
Yes! If you have collected or generated your own research data, the best way to get credit for these scholarly products is to deposit them in a data repository such as Dataverse, Dryad, or openICPSR. These services format the information you provide about your data into a ready-made citation.
Example from Dataverse at Harvard:
Berkman, Lisa, 2015, "Restricted Access WFHS Tomo Baseline Employee Workplace CAPI Survey Data", doi:10.7910/DVN/MXLBAS, Harvard Dataverse, V2.
Example from openICPSR:
Sullivan, P. S., & Rosenberg, E. S. (2015-01-05). Dyadic Data from Four Studies of MSM: 2009-2013 [United States; Atlanta, GA]. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor]. http://doi.org/10.3886/E100071V1
What if my data are actually code I wrote to analyze another dataset?
Code can be citable too! Data sharing services like the ones mentioned above accept all types of files, including software code and scripts written to process and analyze data. Some researchers choose to deposit code as standalone software packages with citations back to the original data source as appropriate. Others may include the code files as a way to both document the research process and allow another person to replicate their findings.
Example from Figshare:
Kosmala, Margaret; Swanson, Alexandra; Lintott, Chris; Simpson, Robert; Smith, Arfon; Packer, Craig (2015): Snapshot Serengeti Processing and Aggregation Scripts. figshare. https://dx.doi.org/10.6084/m9.figshare.1397507.v2
If you already use a platform like GitHub to collaborate and share code with others, consider using Zenodo to capture snapshots of your code repositories and mint a DOI for a citable version of your software.
Example from Zenodo:
Samuel Jenness, ebey, Skye Bender-deMoll, Steven Goodreau, & kweiss2. (2016). EpiModel: EpiModel v1.2.7. Zenodo. http://doi.org/10.5281/zenodo.59113
As code files are often essential to replication of research, some academic journals expect authors to provide their code as part of the submission process. For examples of journals with policies that require or expect provision of code, see the following:
- Data and Code Availability Policy for journals published by the American Economic Association
- Data Integrity - Verification Policy for the American Journal of Political Science
- Science’s statement on Data and Materials Availability after Publication