Information is at the core of just about everything we do in the world of data analytics, and an awful lot of the data we use is in some way, shape, or form related to real people. Increasingly, concerns are being raised by the public about the amount of information that is stored and accessible by both private organizations and governments. These concerns include the risk of identity theft and other adverse consequences for consumers and citizens should data that they consider private somehow be made public or used for purposes that are unsanctioned by the individuals in question. In our roles as data analysts, one of the most critical questions we have to ask is how can we, or how should we, use data. In this video we're going to take a broad look at the idea of data privacy by introducing four levels of standards that guide how we use data that might be considered sensitive. However, we're not going to go into a lot of detail, for a couple of reasons. First, the set of laws and regulations that govern data privacy is extensive and very complex, and those regulations differ depending on where you are. Secondly, the data privacy landscape is changing very rapidly, and what's true today might not be true tomorrow. Nonetheless, what we will do is give you a sense for some common definitions and the types of regulations that are out there. Our discussion will be slanted towards the data privacy environment in the United States, but the same basic ideas will apply more globally. Let's outline these four levels of standards. The top level is legal standards, which are established by law, order, or rule to compel a particular treatment of certain classes of data. Legal standards must be followed by any organization subject to them. There's not a lot of choice in the matter, and consequences can be severe if legal standards are not followed. The second level is ethical standards.
These standards are established by industry or professional organizations, which seek to achieve some level of non-legally binding treatment of information. There can be consequences for violating these standards, but they are usually imposed outside of the courts. The third level is policy standards, which are internal standards established by an organization to guide its own treatment of data, usually through something like a privacy policy. The company decides how to enforce these standards. The last level of standards is simply what we might call good judgment. Even if some action is not prohibited by legal, ethical, or policy standards, we should always ask ourselves: is this really a good idea, and what might the consequences of using data in a certain way be? We're going to go into each of these areas in a bit more detail, but we'll spend the most time discussing a few types of data and the legal standards attached to them. Let's start with something called Personally Identifiable Information, or PII. Like most terms associated with data privacy, PII has a long definition. As defined by the US National Institute of Standards and Technology, or NIST, PII includes any information about an individual maintained by an agency, including: one, any information that can be used to distinguish or trace an individual's identity, such as name, social security number, date and place of birth, mother's maiden name, or biometric records; and two, any other information that is linked or linkable to an individual, such as medical, educational, financial, and employment information. Here are some examples of what is considered PII: all or part of someone's name, including maiden name; any identification number; address information; personal physical characteristics, including images; and any number of things that may be linked to one or more of these definitive identifiers.
The linked data part of the PII definition is particularly interesting, as it includes just about anything that I could conceivably link to an individual. In the era of Internet connectivity and big data, the ability to link information across disparate domains has never been greater. In fact, both NIST and the US Office of Management and Budget, or OMB, have recognized how easy it might be to identify individuals. Let's read part of their findings. A common misconception is that PII only includes data that can be used to directly identify or contact an individual, or personal data that is especially sensitive. The OMB and NIST definition of PII is broader. The definition is also dynamic and can depend on context. Data elements that may not identify an individual directly, for example age, height, or birth date, may nonetheless constitute PII if those data elements can be combined, with or without additional data, to identify an individual. In other words, if the data are linked or can be linked to the specific individual, it is potentially PII. Moreover, what can be personally linked to an individual may depend on what technology is available to do so. As technology advances, computer programs may scan the Internet with wider scope to create a mosaic of information that may be used to link information to an individual in ways that were not previously possible. This is often referred to as the mosaic effect. The implications of this mosaic effect are significant. Let's put a really fine point on this with a little math. Let's assume that we can find some number of characteristics on which people differ. To keep it simple, let's say that each characteristic can only have two values, like male or female, homeowner or renter, married or not married, etc. It turns out that the number of different types of people I can have with n attributes of two values each is two to the n. So how many people is that?
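For anyone who wants to check the two-to-the-n arithmetic themselves, here is a minimal Python sketch. It assumes, as the simplification above does, that every attribute has exactly two values and that all combinations are possible. The function names and the sample population size are hypothetical and chosen only for illustration.

```python
def distinct_profiles(n_attributes: int) -> int:
    """Number of distinct combinations of n two-valued attributes (2 ** n)."""
    return 2 ** n_attributes


def min_attributes(population: int) -> int:
    """Smallest n for which 2 ** n is at least the given population."""
    n = 0
    while distinct_profiles(n) < population:
        n += 1
    return n


if __name__ == "__main__":
    # Show how quickly the number of possible profiles grows.
    for n in (10, 20, 30):
        print(f"{n} binary attributes -> {distinct_profiles(n):,} possible profiles")

    # Hypothetical population size, used only as an example input.
    print(min_attributes(1_000_000))
```

Each additional attribute doubles the count, so 20 attributes already allow just over a million distinct profiles. Real attributes are rarely independent or evenly split, so in practice this is an upper bound on how many people a given set of attributes can tell apart.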
Let's say that we can assemble 29 attributes. With 29 attributes, we can describe just over 500 million types of people, which is significantly more than the US population of about 320 million. What if we have just a few more attributes, say 33? With 33 attributes, we could describe eight and a half billion types of people, which is more than the worldwide population of about 7.4 billion people. Shall we keep going? Let's do one more. With 37 attributes, we could describe over 137 billion types of people, which is more than the estimated 108 billion people who have ever lived on the Earth. The point is this: it takes a surprisingly small number of data points to uniquely identify a very large number of people. Of course, those data points need to be the right ones, but you can see how this mosaic effect can have very real implications for data privacy. What's really interesting about PII is that while there are quite a few legal standards, they tend to be narrowly associated with specific government agencies or specific use cases, especially in the United States. International standards are a bit more stringent, but there's surprisingly little overarching legislation that restricts how personal information can be used. Here's a short list of some of the regulations that do cover PII in some way. We won't go through them, but you can use this as a reference point should you need to understand PII standards more deeply. The second type of information we'll discuss is consumer financial information, or CFI. CFI is defined in the US by the Gramm-Leach-Bliley Act, also known as the Financial Services Modernization Act of 1999, as follows. CFI is any information that is not publicly available and that a consumer provides to a financial institution to obtain a financial product or service from the institution, that results from a transaction between the consumer and the institution involving a financial product or service.
Or that a financial institution otherwise obtains about a customer in connection with providing a financial product or service. This definition is further incorporated into a variety of Federal Trade Commission and Securities and Exchange Commission guidelines, as well as into the Fair Credit Reporting Act. Regulations around CFI are a little more concrete than those around PII. Here are some general characteristics of CFI legislation. First, it generally applies to financial institutions and to those who collect nonpublic personal information from customers, consumers, or financial institutions. It includes a number of specific provisions on how account numbers and other specific pieces of information must be treated. However, most of the rules are about disclosure rather than prescription of what's allowed or not allowed. This means that they don't so much restrict what we can do with information, but rather outline what we need to tell customers about that information and what options customers must have to restrict use of their information. An important detail here is that these regulations default to an opt-out posture, meaning that if customers don't want their information used in certain ways, they must actively opt out of those uses. So that's consumer financial information. Let's move on to another type of customer information called customer proprietary network information, or CPNI. CPNI is collected by telecommunications companies about a customer's telephone calls. It includes the time, date, duration, and destination number of each call, the type of network a customer subscribes to, and any other information that appears on the customer's telephone bill. Importantly, this definition does not explicitly include non-telephone activity like web browsing, although there are varying legal opinions on whether this type of information is covered under CPNI regulations.
CPNI regulations are generally governed by the US Telecommunications Act of 1996 and the 2007 Federal Communications Commission, or FCC, CPNI Order. There are also broader statutes, like the Electronic Communications Privacy Act of 1986 and the Communications Assistance for Law Enforcement Act of 1994, or CALEA, which speak to the conditions under which the government can access this and other types of electronic data. Here are some key provisions of CPNI legislation. First, it limits the information that carriers may provide to third-party marketing firms without first securing the affirmative consent of their customers. It also defines when and how customer service representatives may share call details. It establishes notification and reporting obligations for carriers, as well as identity verification procedures, including a specific requirement that verification processes must include a match between information provided by a person and what is shown in a company's systems. There are a couple of interesting details to these rules. For one thing, they do allow a company to freely share information with any other communications company, which is a pretty broad set of players. Secondly, like CFI rules, CPNI regulations take an opt-out posture by default. The last type of information we'll talk about is Protected Health Information, or PHI. PHI is considered one of the most sensitive types of information, and consequently it's among the most tightly controlled and regulated. In the US, PHI is defined under the Health Insurance Portability and Accountability Act of 1996, or HIPAA. The definition has three parts and reflects the detail involved. One, PHI is created or received by a health care provider, health plan, employer, or health care clearinghouse.
Two, it relates to the past, present, or future physical or mental health or condition of an individual, the provision of health care to an individual, or the past, present, or future payment for the provision of health care to an individual, and it either identifies the individual or there is a reasonable basis to believe the information can be used to identify the individual. And three, it is maintained in electronic media, or transmitted or maintained in any other form or medium. The provisions of HIPAA around PHI are pretty complex, and we won't get into the details here. However, they are broadly covered under a Privacy Rule, which speaks to the safeguards that must be taken to protect PHI in any form, and a Security Rule, which provides additional measures that must be taken when information is stored electronically. The rules apply to health care providers, health plans, and health care clearinghouses. And they include a lot of specific provisions around how certain types of data need to be treated, including the stripping out of identifiable information and other precautions. There are a couple of important exclusions to HIPAA regulations. First, they exclude education records covered by the Family Educational Rights and Privacy Act. They also exclude employment records held by a covered entity in its role as employer. Okay, that's a lot of information, so let's recap before we move on. So far we have defined personally identifiable information, PII; consumer financial information, CFI; customer proprietary network information, CPNI; and protected health information, PHI. We've also covered the major legal standards that apply to each type of information, including major legislation and provisions. Now let's talk about some of those other types of standards that influence how we use data, starting with ethical standards.
Most academic, scientific, legal, and medical fields have pretty well established standards-making bodies that hold members accountable for a broad set of ethical behaviors, some of which include the use of data. For example, in the US these might include state bar associations, the American Medical Association, or other field-specific organizations. For these more formal bodies, the consequences of violating ethical standards are usually sanctions or being kicked out of the organization. This can sometimes be pretty severe. For example, a lawyer being disbarred usually ends or severely limits his or her career. In the business world, it turns out that some of the more relevant ethics and standards bodies operate in the area of marketing, which makes sense, as we're generally interacting with customers through some sort of market activity or interface. These include the Direct Marketing Association, which provides broad guidelines on how to interact with customers; the Digital Advertising Alliance, which adds guidance on first-party data collection; and the Network Advertising Initiative, which addresses third-party data collection and the practice of sharing data through data exchanges. However, the ability of these types of organizations to enforce their standards is much weaker. Companies typically comply out of choice, not out of necessity. But it's generally good practice to comply with these guidelines anyway. In addition to both the legal and ethical guidelines established by governments and other external entities, companies almost always establish their own internal policies regarding data privacy. It's common practice to make these policies available to customers and even proactively ensure that customers read and acknowledge them. Each company can have different provisions in its privacy policy, but generally speaking these policies outline what data is captured and shared and usually outline opt-out or opt-in procedures.
As a data analyst, it's really up to you to ensure that you're aware of both the legal and ethical standards that apply, as well as your organization's policies around use of data. However, there's one final standard that always applies regardless of what other rules exist. That standard is good judgment. Think about what it is you're about to do. Even if it's legal, even if it's ethical, and even if it falls within corporate policy, it still might not be a good idea. As a general rule, if it doesn't feel right, chances are it's not a good idea. If you think your customers would be upset if they knew what you were doing, it may not be a good idea. Here are a few things to think about as you develop your own sixth sense for what's good and what's not. The first is the creepiness factor. We often use data and analytics to provide the best, most customized offers that we can to our customers, but there is a fine line between what's relevant and what's creepy, especially if the customer sees things based on information they don't know you have. Put yourself in the customer's shoes and ask yourself how you would perceive an action. Secondly, it's generally a good idea to stay out of the news, especially for the wrong reasons. Ask yourself what the consequences would be if your methods were made public and everyone could see what you're doing. Would there be a backlash? A third thing to think about is what the unintended consequences of your actions might be. You're obviously trying to get customers to do one thing, but what if they did something else entirely? How bad would that be? Even if you were purely focused on the economics of a decision without regard to anything else, and I'm definitely not advocating that approach, one might argue that you should end up with a good decision almost all the time if you properly assess the risk. Most folks who get themselves into trouble fail to accurately assess the huge downsides associated with bad behavior.
Assume the worst, and you're much more likely to stay on the high road. So in summary, where data privacy is concerned, you have four levels of standards that can guide your behavior: legal standards, ethical standards, policy standards, and above all, good judgment. Heed them wisely, and you'll have a longer and more successful career.