This document contains citations in the following major sections:
  1. Studies using personal data, not from health sites
  2. Studies using personal data, from health sites
    1. 23andMe
    2. MedHelp
    3. PatientsLikeMe
    4. Personal Genome Project
    5. 1000 Genomes Project
  3. Concepts and Issues (privacy, data access, consent, self-tracking)
  4. Popular media

SECTION 1: Studies using personal data, not from health sites

Ayers, J. W., Althouse, B. M., Allem, J. P., Childers, M. A., Zafar, W., Latkin, C., . . . Brownstein, J. S. (2012). Novel surveillance of psychological distress during the great recession. Journal of Affective Disorders, 142(1), 323-330.

  • Developed a method for linking internet searches to economic indicators to gauge population distress in real time, rather than retrospectively.

Ayers, J. W., Althouse, B. M., Allem, J-P., Rosenquist, J. N., & Ford, D. E. (2013). Seasonality in seeking mental health information on Google. American Journal of Preventive Medicine, 44(5), 520-525.

  • Described a method for analyzing Google queries to monitor seasonal changes in mental health at the population level.

Krumme, C., Llorente, A., Cebrian, M., Pentland, A. S., & Moro, E. (2013). The predictability of consumer visitation patterns. Scientific Reports, 3(1645), 1-5.

  • Used financial institution purchase history to find reliable shopping patterns at the aggregate level.

Madan, A., Cebrian, M., Moturu, S., Farrahi, K., & Pentland, A. (2012). Sensing the “health state” of a community. IEEE Pervasive Computing, 11(4), 36-45.

  • Used mobile phone co-location and communication data to identify peer-to-peer contacts. Showed that cold/flu state, depressed state, obesity and political opinion could be predicted from peer state.

Moturu, S., Khayal, I., Aharony, N., Pan, W., & Pentland, A. (2011). Using Social Sensing to Understand the Links Between Sleep, Mood, and Sociability. Proceedings of IEEE Third International Conference on Social Computing.

  • Used social and behavioral sensing software on mobile phones to examine the relationship between sleep, mood and sociability.

Pellegrini, C. A., Verba, S. D., Otto, A. D., Helsel, D. L., Davis, K. K., & Jakicic, J. M. (2009).

  • Used BodyMedia device as a condition for evaluating weight loss.

Song, C., Qu, Z., Blumm, N., & Barabasi, A. L. (2010). Limits of predictability in human mobility. Science, 327(5968), 1018–1021.

  • Used cell phone data to predict location.
  • 93% potential predictability in user mobility across the whole user base.

SECTION 2: Studies using personal data, from health sites

Section 2.a: 23andMe

Do, C. B., Tung, J. Y., Dorfman, E., Kiefer, A. K., Drabant, E. M., Francke, U., . . . Eriksson, N. (2011). Web-based genome-wide association study identifies two novel loci and a substantial genetic component for Parkinson's disease. PLoS Genetics, 7(6): e1002141.

  • Using 23andMe data, identified two new genetic factors for Parkinson’s disease and confirmed other factors.

Eriksson, N., Macpherson, J. M., Tung, J. Y., Hon, L. S., Naughton, B., Saxonov, S., . . . Mountain, J. (2010). Web-based, participant-driven studies yield novel genetic associations for common traits. PLoS Genetics, 6(6): e1000993.

  • Using 23andMe data, identified genes for freckling, curly hair, asparagus smell, and photic sneezing.

Section 2.b: MedHelp

Hagan, J. C., & Kutryb, M. J. (2009). Internet forums track patients' IOL concerns. Review of Ophthalmology, 16(4), 52–55.

  • Using data from consumer forum posts on MedHelp, identified problematic lens implants.

Section 2.c: PatientsLikeMe

Comstock, J. (2013). Needed: Standardized outcome measures for patient-generated data. MobiHealthNews. Retrieved from

  • Summary of the PatientsLikeMe open-participation research platform and its efforts to develop standardized outcome measures.

Section 2.d: Personal Genome Project

Mali, P., Yang, L., Esvelt, K. M., Aach, J., Guell, M., DiCarlo, J. E., Norville, J. E., & Church, G. M. (2013). RNA-guided human genome engineering via Cas9. Science, 339(6121), 823-826.

  • Developed an RNA-guided genome editing system and used Personal Genome Project data to create a “genome-wide reference of potential target sites in the human genome.”

Shoemaker, R., Deng, J., Wang, W., & Zhang, K. (2010). Allele-specific methylation is prevalent and is contributed by CpG-SNPs in the human genome. Genome Research, 20, 883-889.

  • Using cell lines from an individual donor to the Personal Genome Project, characterized allele-specific DNA methylation and its role in fuzzy methylation.

Section 2.e: 1000 Genomes Project

Mainland, J. D., Keller, A., Li, Y. R., Zhou, T., Trimmer, C., Snyder, L. L., . . . Matsunami, H. (2014). The missense of smell: functional variability in the human odorant receptor repertoire. Nature Neuroscience, 17, 114–120.

  • Used 1000 Genomes project samples to determine how often genetic polymorphisms in odorant receptors alter receptor function.
  • On average, two individuals have functional differences at over 30% of their odorant receptor alleles.
  • Study effectively doubled the number of known odorant-receptor pairs.


SECTION 3: Concepts and Issues (privacy, data access, consent, self-tracking)

Accenture. (2014). Racing toward a complete digital lifestyle: digital consumers crave more. Accenture Digital Consumer Tech Survey 2014.

  • Survey conducted in fall 2013 in 6 primarily Anglophone countries.
  • Gauges public opinion on elements of "digital lifestyle," including wearable devices, health, "appification," and trust.

Conger, K. (2012). B!G DATA: What it means for our health and the future of medical research. Stanford Medicine Magazine, 29(2), 6-15.

  • Profiles Mike Snyder, chair of genetics at Stanford, and his participation in an integrative personal omics profile, or iPOP.
  • Other examples include:
  • The Big Data Research and Development Initiative is a federal funding program to “greatly improve the tools and techniques needed to access, organize and glean discoveries from huge volumes of digital data.”
  • National Institutes of Health and the National Science Foundation are offering up to $25 million for devising ways to visualize and extract biological and medical information from large and diverse data sets.
  • NIH announced it would provide researchers free access, via Amazon Web Services, to all 200 terabytes of the 1000 Genomes Project, an attempt to catalog human genetic variation.

Drew, B. T., Gazis, R., Cabezas, P., Swithers, K. S., Deng, J., Rodriguez, R., . . . Soltis, D. (2013). Lost branches on the Tree of Life. PLoS Biology, 11(9): e1001636.

  • Reviewed thousands of previously published phylogenetic studies and estimated that two-thirds had no available supplemental data beyond the article figures.
  • Demonstrates that archiving raw data sequences is insufficient for enabling reproducibility.

Fiore-Silfvast, B., & Neff, G. (2013). What we talk about when we talk data: Metrics, mobilization, and materiality in performing health online. Selected Papers of Internet Research 14.0.

  • Describes a model of “data valences” to understand the different meanings of health data for different stakeholder groups, including physicians, designers, and users (patients).

Fox, S. & Duggan, M. (2013). The diagnosis difference. Pew Research Center.

  • Having a chronic condition is associated with being offline.
  • But internet users with a chronic condition are more likely to gather and share health and medical information using the internet.

Fox, S., Duggan, M. & Purcell, K. (2013). Family caregivers are wired for health. Pew Research Center.

  • Four out of ten adults care for a child or an adult with a significant health condition.
  • Adult caregivers are more likely to be involved in health-related activities, including gathering information offline and on the internet, and tracking health indicators.

Gibson, G. & Copenhaver, G.P. (2010). Consent and internet-enabled human genomics. PLoS Genetics 6(6): e1000965.

  • Editorial from PLoS Genetics about their willingness to publish research using 23andMe data.
  • “The editors of PLoS Genetics decided to proceed after satisfying ourselves on two major points, namely that the participants were not coerced to participate in the study in any way, and they were clearly aware that their samples would be used for genetic research.”
  • The research was deemed “not human subject research” by an independent human subjects review board because it met neither criterion: (1) investigators obtaining data through interaction with participants or (2) subjects being identifiable by investigators.
  • The researchers used a commercial IRB after the article had already been submitted; using a commercial firm is already standard practice in the pharmaceutical and biotech industries.
  • “The study was not performed under the auspices of their Universities, and we did not feel that review by an academic IRB was necessarily appropriate.”
  • “For situations in which a study does not meet the aforementioned criteria but obtaining a consent form would still be desirable, there are no guidelines or policy with regard to how such a consent form should be developed.” In other words, even when the researchers neither interact with the participants nor can identify the participants, but for ethical reasons it would be good to have consent, there are no guidelines for how to proceed.
  • The editors noted the concern about lack of open access to the underlying data but felt the insights of the paper were of higher value to the public good, especially since collaborators could follow up with a similar process.

Grajales, F., Clifford, D., Loupos, P., Okun, S., Quattrone, S., Simon, M., . . . Henderson, D. (2014). Social networking sites and the continuously learning health system: A survey. Institute of Medicine of the National Academies.

  • An analysis of results from two surveys, one conducted by the Consumer Reports National Testing and Research Center, and a second conducted by PatientsLikeMe.
  • Findings include that 94% of US social media users with health conditions would share anonymous health data for health research.
  • Despite concerns about risk, respondents felt the benefit of sharing outweighed the risk.

Gymrek, M., McGuire, A. L., Golan, D., Halperin, E., & Erlich, Y. (2013). Identifying personal genomes by surname inference. Science, 339(6117), 321–324.

  • Quantified the potential to identify targets in anonymous genetic sequence data.

Hildebrandt, M., O'Hara, K., & Waidner, M. (Eds.) (2013). Digital Enlightenment Forum yearbook 2013: The value of personal data. Amsterdam: IOS Press.

  • Edited volume of 16 chapters devoted to understanding personal data issues from legal, computer sciences, ethics, social science, policy, and engineering perspectives.
  • Issues include privacy, ethics, consent, open data, and data control.

Homer, N., Szelinger, S., Redman, M., Duggan, D., Tembe, W., Muehling, J., . . . Nelson, S. (2008). Resolving individuals contributing trace amounts of DNA to highly complex mixtures using high-density SNP genotyping microarrays. PLoS Genetics, 4(8): e1000167.

  • Assessed the probability of determining that a person or relative participated in a genome-wide association study.

Huberman, B. A. (2012). Sociology of science: Big data deserve a bigger audience. Nature, 482(7385), 308.

  • Brief letter by an affiliate of HP Labs.
  • Expresses concern that proprietary data (Facebook, Google, etc.) used for published studies is not made publicly available, making verification and replication impossible.
  • Huberman: this trend could result in a "small group of scientists with access to private data repositories enjoying an unfair amount of attention."
  • Written in response to a conference he chaired, at which three scientists from Google and the University of Cambridge declined to release data they had compiled for a paper on the popularity of YouTube videos in different countries.

Hripcsak, G., Bloomrosen, M., Brennan, P. F., Chute, C. G., Cimino, J., Detmer, D. E., . . . Wilcox, A. B. (2013). Health data use, stewardship, and governance: ongoing gaps and challenges: a report from AMIA's 2012 Health Policy Meeting. Journal of the American Medical Informatics Association.

  • Explains the need for widely accepted data stewardship principles to avoid data silos and protect patient privacy.
  • Stakeholders at AMIA's 2012 Health Policy Meeting formulated recommendations.
  • Proposed Principles of Health Data Use.
  • Asserts that health data is a public good, and understanding and support from patients will be necessary to realize that good.

Intel. (2013). The world agrees: Technology inspires optimism for healthcare.

  • Press announcement of Intel study, “Intel Healthcare Innovation Barometer,” a multi-national survey of people’s beliefs about the role of new technologies in healthcare.

Kerns, J. W., Krist, A. H., Longo, D. R., Kuzel, A. J., & Woolf, S. H. (2013). How patients want to engage with their personal health record: a qualitative study. BMJ Open, 3(7).

  • Focus group study of adults interested in using an interactive preventative health record.
  • Desire to use the IPHR, relevance, privacy, accuracy, the patient-clinician relationship and practice workflow are issues in the successful adoption of the technology.

Lazer, D., Pentland, A., Adamic, L., Aral, S., Barabási, A.-L., Brewer, D., . . . Van Alstyne, M. (2009). Computational Social Science. Science, 323(5915), 721–723.

  • Discusses trends and issues with big data analysis.
  • Describes two scenarios for the future of computational social science, neither in the public interest: one in the domain of internet companies and government agencies; the other in the domain of academic researchers with private datasets.
  • Calls for the development of computational social science in an “open academic environment” and examines obstacles to that development.
  • Individual profiles can be extracted from anonymized data.
  • U.S. National Institutes of Health and the Wellcome Trust abruptly removed a number of genetic databases from online access.
  • "It may be necessary for IRBs to oversee the creation of a secure, centralized data infrastructure."

Lazer, D., & Mayer-Schönberger, V. (2006). Statutory frameworks for regulating information flows: Drawing lessons for the DNA data banks from other government data systems. The Journal of Law, Medicine & Ethics, 34(2), 366–374.

  • Privacy regulations on existing personal data domains and future recommendations.
  • Principles of information privacy.
  • Current regulations of DMV records, fingerprints, tax records.
  • Data privacy recommendations.

Li, I., Dey, A., & Forlizzi, J. (2010). A stage-based model of personal informatics systems. In CHI ’10 Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, 557-566.

  • Surveyed members of Quantified Self about collection, motivation and pattern discoveries. Identified stages of preparation, collection, integration, reflection, and action.

Li, I., Dey, A. K., & Forlizzi, J. (2011). Understanding my data, myself: supporting self-reflection with ubicomp technologies. In UBIComp ’11 Proceedings of the 13th international conference on Ubiquitous computing, 405-414.

  • Conducted user interviews to identify major question areas (Status, History, Goals, Discrepancies, Context, and Factors) and question phases (Discovery and Maintenance) for self-tracking.

MacLeod, H., Tang, A., & Carpendale, S. (2013). Towards personal informatics tools for chronic illness management. GRAND 2013.

  • A small-scale study of people who use self-tracking to manage chronic illness.
  • Proposes that emphasizing curiosity rather than behavior change will result in more useful design of tracking technologies.

Mascalzoni, D., Knoppers, B. M., Aymé, S., Macilotti, M., Dawkins, H., Woods, S., & Hansson, M. G. (2013). Rare diseases and now rare data? Nature Reviews Genetics, 14, 372.

  • Warns that privacy and personal control over data are at odds with the trend toward open data, and that this tension jeopardizes rare disease research.

Morita, M., Ogishima, S., Nishimura, K., Aramaki, E., & Ito, T. (2013). Online population-based patient registry to collect and share health-related data of rare disease patients. Data Driven Wellness: From Self-Tracking to Behavior Change: Papers from the 2013 AAAI Spring Symposium.

  • Describes a framework for the development of a database to collect rare disease data from patients, to promote research interest in “orphan” pharmaceuticals.

Pentland, A., Lazer, D., Brewer, D., & Heibeck, T. (2009). Using reality mining to improve public health and medicine. Studies in Health Technology and Informatics, 149, 93–102.

  • Individual health, social networks, behavior patterns, infectious disease, mental health.
  • Robert Wood Johnson Foundation White Paper on "reality mining" (analysis of person-generated data).

Privacy Rights Clearinghouse. (2013). Mobile health and fitness apps: What are the privacy risks? (Fact Sheet 39).

  • Almost all applications send de-identified (non-personal) data.
  • However, some may uniquely identify users through device or app id.
  • Apps may have poor security, and transmit unencrypted data.
  • 40% were "high risk": address, financial info, full name, health information, geo-location, date of birth (DOB), ZIP code.
  • 43% of free apps share information with advertisers (unclear how or what level of anonymity).
  • 55% of paid and 60% of free apps in sample use third-party analytics services.

Rainie, L., Kiesler, S., Kang, R., & Madden, M. (2013). Anonymity, privacy, and security online. Pew Research Center.

  • Documents concerns related to growing amounts of personal information now online.
  • 86% of internet users have taken steps online to remove or mask their digital footprints—ranging from clearing cookies to encrypting their email.
  • 55% of internet users have taken steps to avoid observation by specific people, organizations, or the government.
  • The survey, based on a representative sample of 792 respondents, also finds that notable numbers of internet users say they have experienced problems because others stole their personal information or otherwise took advantage of their visibility online. Specifically:
  • 21% of internet users have had an email or social networking account compromised or taken over by someone else without permission.
  • 12% have been stalked or harassed online.
  • 11% have had important personal information stolen such as their Social Security Number, credit card, or bank account information.
  • 6% have been the victim of an online scam and lost money.
  • 6% have had their reputation damaged because of something that happened online.
  • 4% have been led into physical danger because of something that happened online.

Servick, K. (2013). Frustrated U.S. FDA issues warning to 23andMe. ScienceInsider.

  • Discussion of FDA’s letter warning 23andMe about marketing issues with its genomics test.

Song, I. & Vong, J. (2014). Social networks and automated mental health screening. In M. Lech, I. Song, P. Yellowlees, & J. Diederich (Eds.), Mental Health Informatics: Studies in Computational Intelligence (107-123). Berlin: Springer.

  • Describes a social networking framework for patient care.
  • Machine learning generates key words from online forum discussions to recommend patient communities.

Steinsbekk, K. S., Myskja, B. K., & Solberg, B. (2013). Broad consent versus dynamic consent in biobank research: Is passive participation an ethical problem? European Journal of Human Genetics, 21, 897–902.

  • Multitude of criticisms of dynamic consent model, but no apparent knock-down argument against it.
  • Difference between models is “whether consent to ‘unknown’ future activities, can be labelled ‘informed consent’ and be viewed as an expression of an autonomous will.”
  • In dynamic consent, participants are “consented” for every study, and thus for both meaningful and trivial changes relative to earlier consents; in broad consent, they are re-consented only for meaningful changes. The authors argue that dynamic consent does not respect autonomy any better than broad consent does.
  • Biobanks are already obliged to keep members continuously informed, so dynamic consent is just a different “information policy.”
  • Participants may be overwhelmed by the complexity of continuous consent and therefore less likely to participate.
  • Concern that dynamic consent could encroach on participant governance of major research projects, when these decisions might better be handled by experts.
  • May lead to relaxed IRB reviewing because of the perception that participants have the ability to withdraw anyway.
  • The focus on returning research results to participants, which dynamic consent could facilitate, pushes researchers in the direction of healthcare.

Van Assche, K., Gutwirth, S. & Sterckx, S. (2013). Protecting dignitary interests of biobank research participants: Lessons from Havasupai Tribe v Arizona Board of Regents. Law, Innovation and Technology, 5(1), 54-84.

  • Traces the development of “dignitary interests” as a category for the evaluation of biobank research.
  • Dignitary harms involve infringement on the autonomy, privacy and moral integrity of research participants.

Vayena, E., & Prainsack, B. (2013). Regulating genomics: Time for a broader vision. Science Translational Medicine, 198(12).

  • Considers social shifts in the uptake of genetic testing to propose referring to direct-to-consumer genetic testing as “beyond the clinic” genetic testing.
  • Argues for genomic regulation that balances protecting consumer well-being with enabling informed choices about health care and data sharing.

Yekeh, F., & Kay, J. (2013). Hypothesis evaluation based on ubicomp sensing: moving from researchers to users. First International Conference on Behavior Change Support Systems, 23-28.

  • Proposes an architecture for “ubicomp sensors” (wearable devices) based on users’ hypotheses about personal health and behavior.

SECTION 4: Popular Media

Brill, J. (2013, August 15). Demanding transparency from data brokers. Washington Post.

  • Edward Snowden’s revelations about the National Security Agency opened a society-wide conversation about tensions between national security and privacy.
  • New attention to the kind of data we generate through use of technologies.
  • For marketing and data brokers, there is no equivalent to the Fair Credit Reporting Act (FCRA), a law that "requires that entities that collect information for those making employment, credit, insurance and housing decision" be accurate.
  • "Personal data could be… used by firms making decisions that … affect users' lives profoundly. … too risky to do business with or aren’t right for certain clubs, dating services, schools or other programs."
  • "Reclaim Your Name": a four-point standard for industry to adopt voluntarily, essentially the typical rights to access, correct, and opt out, plus the right to know how brokers find and use data.

Carney, M. (2013, May 20). You are your data: The scary future of the quantified self movement. Pando Daily.

  • Warns of the unforeseen dangers of wanton sharing of health data.
  • “Blind participation without considering the implications of your data being recorded and shared with third parties is reckless.”
  • Anecdotes about data sharing, insurance company profiling, etc.
  • CVS requires employees to report their weight or pay an insurance fine.
  • State Farm Insurance offers a discount for real-time GPS tracking.
  • LexisNexis, another data broker (current and past residence, spending history, banking information, health information).
  • PositiveID, combination of human-implanted RFID and credit scoring, focusing on medical applications.

Greenfield, R. (2013). Tracked since birth: The rise of extreme baby monitoring. Fast Company.

  • Reviews technologies for "extreme baby monitoring," including devices that track heart rate, breathing patterns, room conditions and dirty diapers.
  • Likens excessive tracking to treating babies like Tamagotchis, or digital “pets” that were a toy craze in the 1990s.

Health eHeart. (2013). Early insights using’s app.

  • Announces a few early findings from the Health eHeart study, including a relationship between high blood pressure and size of social circle, and between activities like walking and biking and cholesterol level.

Howard, A., & Wofford, M. (2013, March 9). Who owns the data? Self-tracking to health 2.0. SXSW.

  • A 60-minute presentation from the SXSW festival, in which Howard and Wofford discuss benefits and barriers to using data for improving the health care system in the context of national reform.

Klasnja, P., & Pratt, W. (2014). Managing health with mobile technology. Interactions, 21(1), 66-69.

  • Overview of mHealth technologies for human-computer interaction researchers.
  • Reviews apps, devices, technological issues, and social factors in use of mobile technologies for wellness.

Markoff, J. (2012, May 21). Big data troves stay forbidden to social scientists. New York Times.

  • Highlights concerns with privacy and sharing in big data analysis.
  • “In the Internet era,” said Andreas Weigend, a physicist and former chief scientist at Amazon, “research has moved out of the universities to the Googles, Amazons and Facebooks of the world.”
  • NSF researchers are expected to share data.
  • "[Issues with private data] is one of the reasons the general pattern at Google is to try to release data to everyone or no one"—Hal Varian, Google’s chief economist.
  • “It’s antithetical to the basic norms of science to make claims that cannot be validated because the necessary data are proprietary"—Michael Eisen, founder of PLoS.

Moss, F. (2011, November 9). Our high-tech health-care future. New York Times.

  • Author Frank Moss was director of MIT Media Lab from 2006-2011.
  • Advocates the use of person-generated data to drive "consumer health" innovations in medicine.
  • Envisions automating medical advice, using low-cost hardware and software to analyze and visually represent health data.
  • Proposes government sponsored standards development for interoperability and privacy.

Pogue, D. (2013, June 26). Wearable devices nudge you to health. New York Times.

  • Consumer-oriented review of Jawbone Up, Fitbit Flex, and Nike FuelBand.
  • Compares features such as wireless, interface, style, sleep tracking.

Quart, A. (2013, June 26). The body data craze. Newsweek.

  • Comprehensive explanation of self-tracking for a general audience.

Singer, N. (2013, July 25). Under code, apps would disclose collection of data. New York Times.

  • Possible momentum from a large consortium to voluntarily disclose data collection, including health data.
  • Yearlong negotiations convened by the National Telecommunications and Information Administration.
  • Negotiations included advocacy groups like the American Civil Liberties Union and the World Privacy Forum.
  • Eight categories including "health or medical data."
  • Obama administration to institute a wide-ranging consumer privacy bill of rights.

Singer, N. (2013, August 31). A data broker offers a peek behind the curtain. New York Times.

  • Acxiom Corporation, marketing data broker, is "one of the most secretive and prolific collectors of consumer information."
  • Uses openness as a strategy for improving marketing data, by allowing individuals to review, correct, suppress and opt out through its website.

Woods, B. (2013, September 17). What’s the true value of your personal data? Meet the people who want to help you sell it. The Next Web.

  • Reviews the case of Malte Spitz, the German Green Party politician who petitioned for his telecom GPS data and made a public demo available.
  • The value of one person's data, when part of a bulk anonymized set, is about 22 cents.
  • Federico Zannier set up a Kickstarter project offering a complete, personal daily activity dataset, which netted $2700.
  • Handshake is a data platform for selling personal data; the high-end estimate of $8000 for data is for someone with many completed surveys and documented experiences.
  • Enliken was doing something like Handshake but couldn't find the companies to buy data.
  • Ctrlio is a personal data platform trying to give control over how data is shared with companies and marketers.