Using national income and expenditure distribution data from 119 countries, the authors decompose total income inequality among the individuals in the world by continent and by "region" (countries grouped by income level). They use a Gini decomposition that allows an exact breakdown (without a residual term) of the overall Gini by recipients. Looking first at continents, they find that income inequality between countries is more important than inequality within countries. Africa, Latin America, and Western Europe and North America are quite homogeneous continents, with small differences between countries (so that most of the inequality on these continents is explained by inequality within countries). Next, the authors divide the world into three groups: the rich G7 countries (and those with similar income levels), the less developed countries (those with per capita income less than or equal to Brazil's), and the middle-income countries (those with per capita income between Brazil's and Italy's). They find little overlap between these groups: very few people in developing countries have incomes in the range of those in the rich countries.
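The idea of an exact, residual-free split of the Gini can be illustrated with a simple pairwise decomposition. The sketch below is a minimal Python example, not the authors' method: it computes the Gini as the sum of absolute income differences over all ordered pairs of individuals, divided by 2·n²·mean, and then splits those pairs into ones within the same group (for example, the same country) and ones across groups. The two terms sum exactly to the overall Gini. The function names and the dictionary input format are hypothetical.

```python
def gini(incomes):
    """Gini coefficient from absolute differences over all ordered pairs."""
    n = len(incomes)
    mean = sum(incomes) / n
    total_abs_diff = sum(abs(a - b) for a in incomes for b in incomes)
    return total_abs_diff / (2 * n * n * mean)


def gini_by_pairs(groups):
    """Split the overall Gini into within-group and across-group pair terms.

    `groups` maps a (hypothetical) group label, e.g. a country name, to a
    list of individual incomes. The two returned terms add up exactly to
    the overall Gini of all incomes pooled together.
    """
    labelled = [(label, y) for label, ys in groups.items() for y in ys]
    n = len(labelled)
    mean = sum(y for _, y in labelled) / n
    norm = 2 * n * n * mean
    within = 0.0
    across = 0.0
    for g1, y1 in labelled:
        for g2, y2 in labelled:
            if g1 == g2:
                within += abs(y1 - y2)  # both people in the same group
            else:
                across += abs(y1 - y2)  # people in different groups
    return within / norm, across / norm
```

The across-group term mixes gaps between group mean incomes with any overlap between the groups' distributions; separating those two effects requires a finer decomposition than this sketch provides.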
Data for replicating "The Global Spatial Distribution of Economic Activity: Nature, History, and the Role of Trade" (with Vernon Henderson, Tim Squires, and David N. Weil), *Quarterly Journal of Economics*, forthcoming 2018.

We explore the role of natural characteristics in determining the worldwide spatial distribution of economic activity, as proxied by lights at night, observed across 240,000 grid cells. A parsimonious set of 24 physical geography attributes explains 47% of worldwide variation and 35% of within-country variation in lights. We divide geographic characteristics into two groups, those primarily important for agriculture and those primarily important for trade, and confront a puzzle. In examining within-country variation in lights, among countries that developed early, agricultural variables incrementally explain over 6 times as much variation in lights as do trade variables, while among late developing countries the ratio is only about 1.5, even though the latter group is far more dependent on agriculture. Correspondingly, the marginal effects of agricultural variables as a group on lights are larger in absolute value, and those for trade smaller, for early developers than for late developers. We show that this apparent puzzle is explained by persistence and the differential timing of technological shocks in the two sets of countries. For early developers, structural transformation due to rising agricultural productivity began when transport costs were still high, so cities were localized in agricultural regions. When transport costs fell, these agglomerations persisted. In late-developing countries, transport costs fell before structural transformation. To exploit urban scale economies, manufacturing agglomerated in relatively few, often coastal, locations. Consistent with this explanation, countries that developed earlier are more spatially equal in their distribution of education and economic activity than late developers.
This dataset is part of the Global Research Program on Spatial Development of Cities funded by the Multi-Donor Trust Fund on Sustainable Urbanization of the World Bank and supported by the U.K. Department for International Development.
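The phrase "incrementally explain" refers to the additional share of variance (R²) that a group of variables adds to a regression already containing the other variables. A minimal Python/NumPy sketch of that calculation follows, assuming the regressors are already arranged as numeric arrays; the function names are hypothetical and this is not the authors' replication code. Subtracting country means from both lights and regressors before fitting gives a within-country version.

```python
import numpy as np


def r_squared(X, y):
    """R^2 of an OLS fit of y on the columns of X plus an intercept."""
    design = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)


def incremental_r2(X_group, X_other, y):
    """Extra share of variance explained when X_group is added to X_other."""
    full = r_squared(np.column_stack([X_group, X_other]), y)
    reduced = r_squared(X_other, y)
    return full - reduced


def demean_by(labels, values):
    """Subtract each group's mean (e.g. per country) for within-group analysis."""
    labels = np.asarray(labels)
    values = np.asarray(values, dtype=float).copy()
    for lab in np.unique(labels):
        mask = labels == lab
        values[mask] -= values[mask].mean(axis=0)
    return values
```

Comparing `incremental_r2(agri, trade, lights)` with `incremental_r2(trade, agri, lights)` (hypothetical array names) gives the kind of ratio quoted in the abstract.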
This dataset contains the medical costs of individuals.
**Data Description:**
The data contains the medical costs of individuals, each described by a set of attributes.
Leveraging customer information is paramount for most businesses. In the case
of an insurance company, customer attributes like the ones listed
below can be crucial in making business decisions. Hence, knowing how to explore
and generate value from such data is an invaluable skill.
**Attribute Information:**
**age:** age of primary beneficiary
**sex:** insurance contractor gender (female, male)
**bmi:** body mass index, an objective index of body weight that indicates whether
weight is relatively high or low for a given height; calculated as weight divided
by the square of height (kg/m²), ideally 18.5 to 24.9
**children:** Number of children covered by health insurance / Number of dependents
**region:** the beneficiary's residential area in the US, northeast, southeast,
**charges:** Individual medical costs billed by health insurance.
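The BMI definition and a first per-attribute look at charges can be sketched in a few lines of Python. This assumes each row has been loaded as a dictionary keyed by the attribute names listed above, with `charges` already converted to a number (for example after reading with `csv.DictReader`, which returns strings); the function names are hypothetical.

```python
from collections import defaultdict

HEALTHY_BMI_RANGE = (18.5, 24.9)  # ideal range quoted in the attribute list


def bmi(weight_kg, height_m):
    """Body mass index: weight divided by height squared (kg/m^2)."""
    if height_m <= 0:
        raise ValueError("height_m must be positive")
    return weight_kg / height_m ** 2


def in_healthy_range(value):
    """True if a BMI value falls inside the ideal range (inclusive)."""
    low, high = HEALTHY_BMI_RANGE
    return low <= value <= high


def mean_charges_by(rows, column):
    """Average `charges` for each value of another column, e.g. 'region'."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        totals[row[column]] += row["charges"]
        counts[row[column]] += 1
    return {key: totals[key] / counts[key] for key in totals}
```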
2016 US Murder Dataset
# 2016 Murder Data
This dataset contains the data behind the story [A Handful Of Cities Are Driving 2016's Rise In Murders](http://fivethirtyeight.com/features/a-handful-of-cities-are-driving-2016s-rise-in-murders/).
`murder_2016_prelim.csv` contains preliminary 2016 murder counts for 79 large U.S. cities. The 2015 figures are counts through the same date a year earlier. Sources are listed in the file.
`murder_2015_final.csv` contains full-year 2014 and 2015 murder counts for all U.S. cities with at least 250,000 residents. Source is FBI Uniform Crime Reports.
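One way to check whether a handful of cities drive the overall rise is to rank cities by their change in murders and ask what share of the net change the top few account for. The Python sketch below assumes the counts have already been read into a dictionary mapping each city to a (2015, 2016) pair counted through the same date, as in the preliminary file; the CSV column names are not given here, so the loading step is left out, and the function name is hypothetical.

```python
def net_change_shares(counts, top_n):
    """Net change across all cities, the top_n biggest increases, and their share.

    `counts` maps city -> (murders_2015, murders_2016), both counted
    through the same date so the comparison is like for like.
    """
    changes = {city: c2016 - c2015 for city, (c2015, c2016) in counts.items()}
    net = sum(changes.values())
    # Largest increases first.
    ranked = sorted(changes.items(), key=lambda item: item[1], reverse=True)
    top = ranked[:top_n]
    top_total = sum(change for _, change in top)
    share = top_total / net if net else float("nan")
    return net, top, share
```

Because some cities saw declines, the share for the top cities can exceed 1, meaning their combined increase is larger than the net increase across all cities.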
Squad details for all 32 teams participating in the 2018 FIFA World Cup
The 2018 FIFA World Cup is an international football tournament that will be held in Russia from 14 June to 15 July 2018. The 32 national teams involved in the tournament are required to register a squad of 23 players, including three goalkeepers. Only players in these squads are eligible to take part in the tournament.
The final lists of 23 players per national team were submitted to FIFA by 4 June, 10 days prior to the opening match of the tournament. FIFA published the final lists with squad numbers on its website the same day. Teams are permitted to make late replacements in the event of serious injury, at any time up to 24 hours before their first game.
The position listed for each player is per the official squad list published by FIFA. The age listed for each player is calculated as of when the dataset was published. The numbers of caps and goals listed for each player do not include any matches played after the start of the tournament. The club listed is the club for which the player was last eligible to play a competitive match prior to the tournament.
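The age column depends on a reference date, and the squad rule described above is easy to check mechanically. A small Python sketch of both follows, assuming birth dates are available as `datetime.date` values and that goalkeepers are coded as "GK" in the position column (both assumptions, not stated in the source); the function and constant names are hypothetical.

```python
from datetime import date

SQUAD_SIZE = 23
GOALKEEPERS = 3


def age_on(birth, as_of):
    """Completed years of age on the date `as_of`."""
    had_birthday = (as_of.month, as_of.day) >= (birth.month, birth.day)
    return as_of.year - birth.year - (0 if had_birthday else 1)


def squad_is_valid(positions):
    """Check the 23-player, three-goalkeeper registration rule.

    `positions` is a list of position codes; "GK" for goalkeepers is an
    assumed encoding of the position column.
    """
    return len(positions) == SQUAD_SIZE and positions.count("GK") == GOALKEEPERS
```

Passing the dataset's publication date as `as_of` reproduces the convention used for the age column; passing 14 June 2018 would give ages at the opening match instead.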
This data was pulled from Wikipedia and can be found at this link: https://en.wikipedia.org/wiki/2018_FIFA_World_Cup_squads
Daily calorie intake and exercise information with daily weight loss/gain
Following a festive period a couple of years ago, I began a weight loss diet in early January to lose the Christmas-period pounds. Working on the assumption that the laws of thermodynamics applied to me as much as the rest of the universe, I didn't care about what I ate, as long as I consumed fewer calories than I used.
As I think that there is a lot of misinformation in the huge market that is the weight-management sector, I made sure that I ate everything I was told not to; click-bait articles such as "the six foods you MUST NOT EAT if you want to lose weight" were my bread and butter. I made sure to eat them all, to show that I could still lose weight just by not eating too much of them.
In 2018, I decided to be a bit more formal about this, and record a daily diary of approximate calorie intake and a few things to do with what I ate and what exercise I did. The idea was to come up with a dataset for regression analysis to try to learn a bit more about the effect of calories and exercise on changes in my weight.
My goal was to get down to 11 stone and a single digit number of pounds in advance of my work Christmas party, ready to put it all on again over the festive period 18/19 and have a good set of regression coefficients to tell me how to lose it again in 2019.
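The regression described above can be sketched as an ordinary least squares fit of daily weight change on calorie intake and exercise. The Python/NumPy example below is a minimal illustration, not the author's actual model; the variable names and units (kcal eaten, minutes of exercise, daily change in weight) are hypothetical, since the diary's column layout is not described here.

```python
import numpy as np


def fit_weight_model(calories, exercise, weight_change):
    """OLS fit: weight_change ~ intercept + calories + exercise.

    Returns (intercept, calorie_coef, exercise_coef).
    """
    X = np.column_stack([np.ones(len(calories)), calories, exercise])
    y = np.asarray(weight_change, dtype=float)
    coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
    return tuple(coefs)
```

The calorie coefficient estimates how much daily weight change shifts per extra kcal, holding exercise fixed, which is the kind of coefficient the diary was meant to produce.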
Premier League match data as well as financial data for all clubs in the 2018-19 season
Looking back at the 2018-2019 season, the aim is to delve into deeper insights, using the data to see how clubs are similar stylistically in the way they pass, attack, and score goals.
This dataset is wide-ranging: it encompasses the stats seen on a regular league table but goes further, looking at how teams pass and keep possession and how they defend and tackle, as well as at each team's market value and how much money each team was allotted from the TV rights deal.
This data was gathered from
1) BBC Sports Football,
This data was not scraped in the conventional sense and appears in a rather haphazard manner. To counter this, I included category descriptors at the start of each variable name; this should provide a more cohesive understanding of the dataset as well as aid in subsetting.
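Because each variable name starts with a category descriptor, columns can be subset by prefix. A minimal Python sketch follows, assuming the descriptor is separated from the rest of the name by an underscore (the actual separator is not stated); the function name and example column names are hypothetical.

```python
from collections import defaultdict


def group_columns_by_prefix(columns, sep="_"):
    """Group column names by the category descriptor before `sep`.

    A name without `sep` is treated as its own category.
    """
    groups = defaultdict(list)
    for col in columns:
        prefix, _, _ = col.partition(sep)
        groups[prefix].append(col)
    return dict(groups)
```

With a pandas DataFrame `df`, something like `df[group_columns_by_prefix(df.columns)["attack"]]` would then select one category's columns, if the descriptors follow that pattern.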
I've done some rather elementary data analysis and exploration. I would love to see the community wrangle with this and explore further, create more complex models, apply some ML and see what insight can be gathered from this data.