West Virginia University Cold Case Investigation Database
Professor: Dr. Stephane Collignon
Co-authors: Ryan Franklin, Hannah McMillen
Data Management & Inquiry - Fall 2024
Database Purpose and Usefulness
West Virginia University Professor Tiffany Edwards and her team of undergraduate students receive 10 or more cold case assistance requests per year from law enforcement agencies in West Virginia, Virginia, and Maryland. There are currently 8 cold cases actively being investigated, with one case that has been open for over a year. About 15-20 cases are worked through each year, and cases can date as far back as the 1950s. The purpose of this database would be to serve as a means of organizing the files from these cases into a single location to make investigations easier for the research team. Currently, the cases arrive in paper format, on discs, or in some cases USB sticks, making it difficult to organize and reference cases. Additionally, the database would make it easier to present cases, either to fellow researchers or to the original investigator/requesting agency.
For this database, each agency can have one or multiple investigators. Agencies have an agency ID number, a name, a state (WV, VA, or MD), and an address.
Investigators can only belong to one agency and can investigate many (or none) cold cases. Investigators have a name, an employee ID, email address, phone number, whether they are retired or not, if the investigator is still alive, and what agency they belong to.
Cold cases can have one or many investigators and they can have one or many victims. Cold cases have a case ID number, the state they originate in (WV, VA, or MD), the date of the crime, and the type of crime. Victims can belong to only one cold case. Cold cases can have one or multiple pieces of evidence.
Victims have a name, a sex, cause of death, a date of birth, a date of death, the case number they are associated with, their weight, and the geographic location of the body (longitude/latitude). If a victim’s identity is unknown, the name will be entered as Jane Doe for females and John Doe for males and their date of birth will not be recorded.
Evidence can belong to only one cold case and must be either testimonial or physical. Evidence is connected to many or no suspects. Evidence belongs to one and only one cold case. Evidence has an evidence ID#, the date of collection, and the cold case number. If the evidence is testimonial, then the name of the person giving the testimony is collected, their contact information, as well as whether they are alive or not. If the evidence is physical, the following are provided: the scene item number, whether it has been analyzed, what the evidence item is, the current location of that piece of evidence and if an analysis has been requested (digital, drug, fire, toxicological, DNA, hair, fiber, tape, fingerprint, weapon, footwear, dental, handwriting).
Suspects are connected to one or multiple pieces of evidence. Suspects are connected to one and only one case. Suspects have a name, contact information, whether they took a polygraph, the date they took a polygraph if they had one, if they have an alibi, and their date of death if the suspect is deceased. As a note, the suspect’s date of death is recorded for follow-up purposes if investigators need to pursue other leads (such as relatives, etc.) in relation to the case.
ER Model
Relational Model
We used the normalization process when designing a database to ensure we are avoiding redundancy and inconsistencies. We ran through three stages of normalization to ensure our database is structured appropriately.
During the first stage (1NF) of normalization, we established that each table has a primary key and all attributes are indivisible and cannot be broken down further into more granular sub-components.
In the first stage we also ensured no multi-valued attributes exist. In the second stage, we reviewed to ensure all 1NF measures are satisfied and establish that all non-key attributes are dependent on the primary key.
In the third stage of normalization (3NF) we confirmed that we satisfied the requirements for 2NF, and further ensured that no attributes are dependent on other attributes that are not primary keys. After careful review and following the steps above, we are certain that our model is at 3NF.
Queries
Query 1: Two INNER JOINS
This query provides an overview of the agencies and locations of all the investigators that currently have cases within the database. This is helpful for keeping track of who to contact and what agencies are investigating cases.
Query 2: Calculation
This query shows the victims of cases where the victims are greater than or equal to the average weight of the victims. This could be useful in determining if there are physical similarities between victims across cases.
Query 3: CASE or IF statement
This query is designed to flag cases with cause of death of, ‘Poison’ as ‘Refer to Toxicology’ and all other cases as, ‘Non-Referral’. This can be useful for poisoning cases as it reminds the evaluator that they should be seeking further records from the toxicologist to assist with their investigation.
Query 4: RIGHT or LEFT OUTER JOIN
This query is meant to connect evidence to the victim table. Anyone researching the case would then be able to see if evidence is available and track to it accordingly. For any victims where there is no evidence, we will see a NULL value.
Query 5: GROUP BY
This query is designed to support the evidence team by providing insights into the percentage of evidence that has been analyzed and the proportion of individuals who provided testimonials who are still alive. These insights help the team monitor progress, prioritize unprocessed evidence, and allocate resources effectively. Additionally, identifying the living witnesses helps the team to prioritize follow-ups or conduct additional interviews, ensuring critical testimonial evidence is gathered appropriately.
Query 6: SELECT from GROUP BY
This type of query allows for a general understanding of how many total victims there are in the database based on biological sex (i.e., male or female), while providing who the victims are. A query like this can lend itself useful to victimology, especially if one suspects there to be a potential link between cases. This allows for the age determination to develop an idea of victim age for potentially linked cases, and how many total victims could be linked.
Query 7: Query with Subquery
This type of query is useful to determine the types of evidence that were collected and sent to analysis, in this case, DNA. In general, certain types of analysis are considered more crucial than others based on their probative value. Probative value is dependent on a number of factors, like what the item is, in addition to where it was found and its potential relation to the case. DNA is usually considered an analysis with high probative value, whether that’s to determine the potential identity of a victim or suspect; however, some evidence may sit in a backlog and then be forgotten about, especially with older cases. This type of query can quickly address this issue by evaluating the item itself, and whether it was analyzed, thereby allowing investigators to either push for an initial analysis or to have the item re-analyzed as technology has become more advanced.
Query 8: View of two INNER JOINs
By storing the investigator and agency information as a view it will be quick and easy to pull up this information as needed for contact or presentation purposes.
Appendix: Query Codes
Query 1: Two INNER JOINS
SELECT
AGENCY.IDNumber,
AGENCY.AgencyName,
AGENCY.State,
AGENCY.Address,
INVESTIGATOR.AgencyID,
INVESTIGATOR.Name,
INVESTIGATOR.EmployeeID
FROM
AGENCY
INNER JOIN
INVESTIGATOR ON AGENCY.IDNumber = INVESTIGATOR.AgencyID
INNER JOIN
INCASE ON INCASE.InvestigatorID = INVESTIGATOR.EmployeeID
ORDER BY
INVESTIGATOR.Name;
Query 2: Calculation
SELECT Name, Sex, CauseOfDeath, DOD, Weight, CaseNumber FROM VICTIM
WHERE Weight>=(SELECT AVG(Weight) from VICTIM);
Query 3: CASE or IF Statement
SELECT
Name,
Sex,
CauseOfDeath,
DOB,
DOD,
Weight,
CaseNumber,
BodyLocation,
CASE
WHEN CauseOfDeath = 'Poison' THEN 'Refer to Toxicology’'
ELSE 'Non-Referral'
END AS Note
FROM
VICTIM;
Query 4: RIGHT or LEFT Outer Join
SELECT
VICTIM.Name,
VICTIM.Sex,
VICTIM.CauseOfDeath,
VICTIM.DOB,
VICTIM.DOD,
VICTIM.Weight,
VICTIM.CaseNumber,
VICTIM.BodyLocation,
EVIDENCE.EvidenceID,
EVIDENCE.CaseNumber,
EVIDENCE.CollectionDate
FROM
VICTIM
LEFT OUTER JOIN EVIDENCE
ON VICTIM.CaseNumber=EVIDENCE.CaseNumber;
Query 5: GROUP BY
SELECT
(COUNT(CASE WHEN PHYSICAL.Analyzed = 1 THEN 1 END) * 100.0) / COUNT(PHYSICAL.PevidenceID) AS PercentageAnalyzedYes,
(COUNT(CASE WHEN TESTIMONIAL.Alive = 1 THEN 1 END) * 100.0) / COUNT(TESTIMONIAL.TevidenceID) AS PercentageAliveYes
FROM
PHYSICAL, TESTIMONIAL;
Query 6: Select from GROUP BY
SELECT
Sex,
COUNT(Name) AS TotalVictims,
GROUP_CONCAT(CONCAT(Name, ' (', DOB, ')') ORDER BY Name) AS VictimNames
FROM
VICTIM
WHERE
DOB IS NOT NULL
GROUP BY
Sex
HAVING
MAX(DOB) > '1970-01-01'
ORDER BY
TotalVictims DESC;
Query 7: Query with Subquery
SELECT PevidenceID, Item, Analyzed FROM PHYSICAL
WHERE EXISTS
(SELECT * FROM ANAREQ
WHERE PHYSICAL.PevidenceID=ANAREQ.PevidenceID
AND ANAREQ.AnalysisType='DNA');
Query 8: View of INNER JOINS
CREATE VIEW Investigator_Agency AS
SELECT
AGENCY.IDNumber,
AGENCY.AgencyName,
AGENCY.State,
AGENCY.Address,
INVESTIGATOR.AgencyID,
INVESTIGATOR.Name,
INVESTIGATOR.EmployeeID
FROM
AGENCY
INNER JOIN
INVESTIGATOR ON AGENCY.IDNumber = INVESTIGATOR.AgencyID
INNER JOIN
INCASE ON INCASE.InvestigatorID = INVESTIGATOR.EmployeeID
ORDER BY
INVESTIGATOR.Name;
pdf format
