30th oct

In summary, to comprehensively understand police shootings, we can analyze victim demographics, map incident locations, study temporal trends, assess police training and policies, gather community perspectives, and conduct legal and policy analysis. These dimensions provide a holistic view of the complex factors contributing to this issue. Beyond age, we can examine victim demographics in more depth, including gender, ethnicity, and socioeconomic status, which can help us identify disparities and potential biases in police shootings.

27th

Today, I delved into the relationship between indicators of mental health issues and race in cases of police shootings. We’re using facet grids in R to explore this connection. Our primary objective is to uncover any patterns or correlations that can provide insight into the complex interplay between mental health and interactions with the police.

The visual representations we generated offer a detailed perspective on how mental health and race may intersect within the context of police shootings. We will examine these visual grids closely and discuss the implications of our findings. These discoveries have the potential to inform public discussions and even influence policies concerning the handling of mental health issues during encounters with law enforcement. Join us as we navigate through these visual insights and engage in a dialogue about their significance.
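The journal notes that these facet grids were built in R; as a rough analogue, here is a minimal sketch of a faceted count plot using Python’s seaborn. The file name and the column names (race, signs_of_mental_illness) are assumptions and may differ from the actual dataset.

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Assumed file and column names -- adjust to match the actual dataset.
shootings = pd.read_csv("fatal-police-shootings-data.csv")

# One panel per race, counting incidents with and without signs of mental illness.
grid = sns.catplot(
    data=shootings.dropna(subset=["race", "signs_of_mental_illness"]),
    x="signs_of_mental_illness",
    col="race",
    kind="count",
    col_wrap=4,
    height=3,
)
grid.set_axis_labels("Signs of mental illness", "Number of incidents")
plt.show()
```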

25th

K-means: This method groups data into ‘k’ clusters by minimizing the distance between each data point and the center of its assigned cluster. The ‘means’ in the name refers to averaging the assigned points to find each cluster’s center. It works best for roughly spherical clusters.
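As a quick illustration, here is a minimal k-means sketch with scikit-learn on synthetic, roughly spherical data; the choice of three clusters is arbitrary and not tied to the project data.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic, roughly spherical clusters -- the setting where k-means does well.
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=1.0, random_state=42)

kmeans = KMeans(n_clusters=3, n_init=10, random_state=42).fit(X)
print(kmeans.cluster_centers_)   # each center is the mean of its assigned points
print(kmeans.labels_[:10])       # cluster assignment for the first 10 points
```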

K-medoids: Similar to K-means, but instead of using the mean, it uses actual data points as the center of the cluster, known as medoids. This method is more robust to noise and outliers compared to K-means because medoids are less influenced by extreme values.
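And a corresponding k-medoids sketch, assuming the optional scikit-learn-extra package is installed; a couple of artificial outliers are added to show the setting where medoids are more robust than means.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn_extra.cluster import KMedoids  # requires the scikit-learn-extra package

X, _ = make_blobs(n_samples=300, centers=3, cluster_std=1.0, random_state=42)
# Add a few extreme outliers; medoids shift far less than means would.
X_noisy = np.vstack([X, [[50, 50], [-60, 40]]])

kmedoids = KMedoids(n_clusters=3, random_state=42).fit(X_noisy)
print(kmedoids.cluster_centers_)  # each center is an actual data point (a medoid)
```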

23rd oct

I began closely examining the data and searching for correlations between various factors. By using numerical statistics, I generated preliminary hypotheses about how certain elements such as signs of mental illness, threat levels, and attempts to flee might influence the outcomes of police interactions. These initial ideas serve as a foundation for conducting more in-depth analyses in the upcoming weeks.
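One lightweight way to look for such associations between categorical factors is with cross-tabulations; the sketch below assumes hypothetical column names (threat_type, flee_status, signs_of_mental_illness) and a local copy of the data file.

```python
import pandas as pd

shootings = pd.read_csv("fatal-police-shootings-data.csv")  # assumed file name

# Quick cross-tabulations between pairs of factors, expressed as row percentages.
# Column names are assumptions and may differ in the actual dataset.
print(pd.crosstab(shootings["threat_type"], shootings["flee_status"],
                  normalize="index").round(2))
print(pd.crosstab(shootings["signs_of_mental_illness"], shootings["threat_type"],
                  normalize="index").round(2))
```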

It’s akin to constructing a roadmap to comprehend the intricate relationships among different aspects of these incidents. As we uncover more connections and patterns, it aids us in assembling a clearer picture of the dynamics at play in police encounters. The overarching goal is to explore and make sense of the data, allowing us to gain deeper insights into the factors that shape the outcomes of these situations.

20th oct, on shootings

As is clear, there are casualties in the teenage age group, and while it’s true that half of the individuals fatally shot by the police are White, there is a significant disparity in the rate at which Black Americans and Hispanic Americans are shot. Black Americans make up approximately 14 percent of the U.S. population but experience police shootings at more than twice the rate of White Americans. Hispanic Americans also face a disproportionate rate of police shootings. Further study is needed to get a clearer picture.
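To make the rate comparison concrete, a rough sketch like the one below could contrast each group’s share of shootings with its share of the population. The population shares are approximate Census figures hard-coded here as assumptions, and the single-letter race codes are also assumed.

```python
import pandas as pd

shootings = pd.read_csv("fatal-police-shootings-data.csv")  # assumed file name

# Share of fatal shootings by race (race codes assumed: W = White, B = Black, H = Hispanic).
shooting_share = shootings["race"].value_counts(normalize=True)

# Approximate population shares -- assumptions, not taken from the dataset.
population_share = pd.Series({"W": 0.60, "B": 0.14, "H": 0.19})

# Ratio > 1 means a group is shot at a higher rate than its population share implies.
disparity = (shooting_share / population_share).dropna().round(2)
print(disparity)
```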

16th oct, logistic regression

This week, our focus was on logistic regression, a statistical method used for analyzing the relationships between variables, particularly when the outcome of interest is binary or categorical. In other words, we were looking at situations where we want to predict whether something will happen or not, based on various factors.

During our discussions, we covered the fundamentals of logistic regression, including the logistic function, which transforms a linear combination of variables into a probability. This technique is commonly used in predictive modeling when dealing with outcomes like “yes” or “no,” “success” or “failure,” or “positive” or “negative.”
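As a hedged illustration, here is a minimal logistic regression fit with scikit-learn; the binary target and predictor columns (signs_of_mental_illness, age, flee_status) are assumptions chosen only to show the workflow, not the analysis actually run.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

shootings = pd.read_csv("fatal-police-shootings-data.csv")  # assumed file name

# Illustrative binary target and predictors -- column names are assumptions,
# and the target is assumed to be boolean.
df = shootings.dropna(subset=["signs_of_mental_illness", "age", "flee_status"])
X = pd.get_dummies(df[["age", "flee_status"]], drop_first=True)
y = df["signs_of_mental_illness"].astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

print(model.predict_proba(X_test)[:5])        # logistic function output: probabilities
print(round(model.score(X_test, y_test), 3))  # accuracy on held-out data
```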

In addition to the theoretical aspects, we also put logistic regression into practice by charting the data variables. This involved creating visual representations such as scatter plots, bar charts, and line graphs to explore how the variables relate to each other and how they might influence the binary outcome we are interested in.
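For example, a simple bar chart of the outcome rate within each category can be sketched as follows, again with assumed column names (and assuming the target column is boolean).

```python
import pandas as pd
import matplotlib.pyplot as plt

shootings = pd.read_csv("fatal-police-shootings-data.csv")  # assumed file name

# Proportion of incidents with signs of mental illness within each flee category.
rates = (
    shootings.dropna(subset=["flee_status", "signs_of_mental_illness"])
    .groupby("flee_status")["signs_of_mental_illness"]
    .mean()
)
rates.plot(kind="bar", xlabel="Flee status", ylabel="Share with signs of mental illness")
plt.tight_layout()
plt.show()
```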

In essence, this week’s focus on logistic regression and data visualization equips us with the analytical tools needed to make sense of data, uncover hidden trends, and use this knowledge to make informed decisions. It’s a fundamental step in the process of applying statistical methods to address practical problems and contributes to our ability to navigate complex real-world scenarios with confidence.

oct 11th

Project 2 involves the thorough examination of two distinct datasets: “fatal-police-shootings-data” and “fatal-police-shootings-agencies,” each serving specific analytical purposes.

The first dataset, “fatal-police-shootings-data,” encompasses 8770 rows and 19 columns, covering the time span from January 2, 2015, to October 7, 2023. It’s important to note the presence of missing values in critical columns such as threat category, flee status, and location information. Despite these data gaps, this dataset offers a wealth of information regarding fatal police shootings, including details on threat levels, types of weapons involved, demographic information, and more.

The second dataset, “fatal-police-shootings-agencies,” comprises 3322 rows and six columns. Like the first dataset, it contains missing data points, particularly in the “oricodes” column. It is designed to provide insights into law enforcement agencies, including their names, identifiers, types, and locations, as well as their involvement in fatal police shooting incidents.
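A minimal loading-and-missingness check for both files might look like the sketch below, assuming the CSVs are saved locally under the names mentioned above.

```python
import pandas as pd

# File names are assumptions based on the dataset names mentioned above.
shootings = pd.read_csv("fatal-police-shootings-data.csv")
agencies = pd.read_csv("fatal-police-shootings-agencies.csv")

print(shootings.shape, agencies.shape)  # expected: (8770, 19) and (3322, 6)

# Count missing values per column to see where the data gaps are.
print(shootings.isna().sum().sort_values(ascending=False).head(10))
print(agencies.isna().sum())
```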

To make the most of these datasets, it’s crucial to consider the context and formulate specific questions aligned with the analytical objectives. Doing so will allow for a more meaningful exploration of the law enforcement organizations, the intricate relationships between variables, and the incidents of fatal police shootings. These databases offer a valuable opportunity to investigate and gain deeper insights into these topics.

9th oct

Various factors influence the significance of variables in data analysis and machine learning. The importance of a variable is closely linked to its specific role in a particular context. Some variables exert substantial influence, while others play more minor roles. Identifying the most relevant variables often requires a combination of domain knowledge and techniques like correlation analysis.

Collinearity, where variables are interrelated, can make it challenging to discern their true importance. Therefore, it’s crucial to carefully select variables to ensure a clearer interpretation of models. Exploratory data analysis is vital for gaining a deeper understanding of variable relationships and significance.
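One simple collinearity check is a pairwise correlation matrix over the numeric columns; the sketch below assumes a local copy of the shootings file.

```python
import pandas as pd

shootings = pd.read_csv("fatal-police-shootings-data.csv")  # assumed file name

# Pairwise correlations among numeric columns; strongly correlated pairs
# hint at collinearity that can blur each variable's individual importance.
numeric = shootings.select_dtypes(include="number")
print(numeric.corr().round(2))
```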

Different machine learning models either explicitly indicate feature importance or assign varying weights to them. Expertise in the relevant field can uncover critical variables that may not be immediately evident from the data alone. Managing outliers is essential to prevent them from distorting assessments of variable importance.
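As one hedged example of a model that reports feature importance, here is a random forest sketch; the target and predictor columns are assumptions used only for illustration.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

shootings = pd.read_csv("fatal-police-shootings-data.csv")  # assumed file name

# Illustrative target and predictors -- column names are assumptions,
# and the target is assumed to be boolean.
df = shootings.dropna(subset=["signs_of_mental_illness", "age", "flee_status", "armed_with"])
X = pd.get_dummies(df[["age", "flee_status", "armed_with"]], drop_first=True)
y = df["signs_of_mental_illness"].astype(int)

forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
importances = pd.Series(forest.feature_importances_, index=X.columns)
print(importances.sort_values(ascending=False).head(10))
```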

The way variables are processed, whether through encoding or normalization methods, can also impact their perceived significance and overall model performance. In some models, a variable’s importance may depend on its interactions with other variables. Ultimately, the most important variables are those that align with the primary goal of the model, whether it’s understanding causality or enhancing prediction accuracy.

oct 3rd

Last week, I employed the R-squared metric for cross-validation, which helps estimate how much of the variation in the dependent variable can be predicted from the predictors. Today, I delved into analyzing my models using various scoring measures and took some time to understand their distinctions. Notably, when no scoring metric is specified, the cross_val_score function falls back on the estimator’s default score (R-squared for regressors); to evaluate with Mean Squared Error (MSE), you pass scoring="neg_mean_squared_error", which returns the negative MSE for each fold. It’s important to note that MSE is highly sensitive to outliers.

Additionally, I familiarized myself with the Mean Absolute Error (MAE) measure, which is more appropriate when all errors should be treated with equal importance and weight. This metric provides a different perspective on model performance compared to MSE and is particularly useful in scenarios where outliers can significantly impact results.
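A small sketch comparing these scoring options with cross_val_score, on synthetic data so it runs on its own:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Synthetic regression data so the sketch is self-contained.
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)
model = LinearRegression()

r2 = cross_val_score(model, X, y, cv=5)  # default: the estimator's score (R-squared)
neg_mse = cross_val_score(model, X, y, cv=5, scoring="neg_mean_squared_error")
neg_mae = cross_val_score(model, X, y, cv=5, scoring="neg_mean_absolute_error")

print(np.mean(r2), -np.mean(neg_mse), -np.mean(neg_mae))
```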