Data from the Current Population Survey (CPS) provide detailed labor market information and demographics. The CPS data are provided for New York State (NYS). Topics include veterans (employment status and selected demographics, available only for New York State), employment status, and other labor force demographics.
The Greenhouse Gas Emissions Inventory details estimated New York State emissions from fuel combustion. The dataset includes a breakdown by fuel type and sector for the current year.
How does your organization use this dataset? What other NYSERDA or energy-related datasets would you like to see on Open NY? Let us know by emailing OpenNY@nyserda.ny.gov.
The Highway Mileage Report is an annual publication that lists the mileage of roadway and maintenance jurisdiction/ownership of roadways within each municipality of New York State. Information is current as of the publication date.
The World Bank’s Services Trade Restrictions Database aims to facilitate dialogue about, and analysis of, services trade policies. The database provides comparable information on services trade policy measures for 103 countries, five sectors (telecommunications, finance, transportation, retail and professional services) and key modes of delivery. Compared to the vast empirical literature on policies affecting trade in goods, the empirical analysis of services trade policy is still in its infancy. One major constraint has been inadequate data on policies affecting services trade. Our limited knowledge of the pattern of services policy contrasts with the importance of services. Today, some 80 percent of GDP in the United States and the European Union originates from services, and the proportion is well over 50 percent in most countries, industrial and developing alike.
The HELP (Highway Emergency Local Patrol) file provides the number of motorists assisted by month, by region, in vehicles traveling on over 1,450 miles of limited access interstate roadways, parkways, and expressways on Long Island, in New York City, the Lower Hudson Valley, Buffalo, Syracuse, Rochester, and the Albany Capital District.
This COVID-19 vaccination dataset is updated daily.
The data is collected from the Our World in Data (OWID) GitHub repository, which is updated on a daily basis.
The dataset contains a single file, vaccinations.csv, which records the vaccination doses received by people in each country.
* location: name of the country (or region within a country).
* iso_code: ISO 3166-1 alpha-3 – three-letter country codes.
* date: date of the observation.
* total_vaccinations: total number of doses administered. Each dose is counted individually, so this figure may not equal the total number of people vaccinated, depending on the specific dose regime (e.g. people receive multiple doses). If a person receives one dose of the vaccine, this metric goes up by 1. If they receive a second dose, it goes up by 1 again.
* total_vaccinations_per_hundred: total_vaccinations per 100 people in the total population of the country.
* daily_vaccinations_raw: daily change in the total number of doses administered. It is only calculated for consecutive days. This is a raw measure provided for data checks and transparency, but we strongly recommend that any analysis on daily vaccination rates be conducted using daily_vaccinations instead.
* daily_vaccinations: new doses administered per day (7-day smoothed). For countries that don't report data on a daily basis, we assume that doses changed equally on a daily basis over any periods in which no data was reported. This produces a complete series of daily figures, which is then averaged over a rolling 7-day window. An example of how we perform this calculation can be found here.
* daily_vaccinations_per_million: daily_vaccinations per 1,000,000 people in the total population of the country.
* people_vaccinated: total number of people who received at least one vaccine dose. If a person receives the first dose of a 2-dose vaccine, this metric goes up by 1. If they receive the second dose, the metric stays the same.
* people_vaccinated_per_hundred: people_vaccinated per 100 people in the total population of the country.
* people_fully_vaccinated: total number of people who received all doses prescribed by the vaccination protocol. If a person receives the first dose of a 2-dose vaccine, this metric stays the same. If they receive the second dose, the metric goes up by 1.
* people_fully_vaccinated_per_hundred: people_fully_vaccinated per 100 people in the total population of the country.
Note: for people_vaccinated and people_fully_vaccinated we are dependent on the necessary data being made available, so we may not be able to make these metrics available for some countries.
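The smoothing described for daily_vaccinations can be sketched roughly as follows. This is a minimal illustration, not OWID's actual code: the function name `smoothed_daily_vaccinations` is hypothetical, gaps are filled by spreading the change in the cumulative total evenly across the missing days, and the 7-day window is assumed to be trailing and shorter for the first few days, which the description above does not specify.

```python
from datetime import date, timedelta


def smoothed_daily_vaccinations(reports, window=7):
    """Return {date: smoothed daily doses} from sparse cumulative totals.

    reports: list of (date, cumulative_total) pairs sorted by date.
    Assumption: a trailing window that is shorter at the start of the series.
    """
    # Step 1: fill unreported days by assuming doses changed equally each day.
    days, totals = [], []
    for (d0, t0), (d1, t1) in zip(reports, reports[1:]):
        gap = (d1 - d0).days
        for k in range(gap):
            days.append(d0 + timedelta(days=k))
            totals.append(t0 + (t1 - t0) * k / gap)
    days.append(reports[-1][0])
    totals.append(float(reports[-1][1]))

    # Step 2: daily change between consecutive (now complete) days.
    daily = [b - a for a, b in zip(totals, totals[1:])]
    daily_days = days[1:]

    # Step 3: average over a rolling window of up to `window` days.
    smoothed = {}
    for i, d in enumerate(daily_days):
        chunk = daily[max(0, i - window + 1): i + 1]
        smoothed[d] = sum(chunk) / len(chunk)
    return smoothed
```

For a country that reports on 1, 4, and 5 January, the function first fills in 2 and 3 January and then averages the resulting daily changes, so a single large reporting day does not appear as a spike.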
One of the leading retail stores in the US, Walmart, would like to predict sales and demand accurately. Certain events and holidays affect sales on particular days. Sales data are available for 45 Walmart stores. The business faces a challenge from unforeseen demand and sometimes runs out of stock because of an unsuitable machine learning algorithm. An ideal ML algorithm would predict demand accurately and take into account economic factors such as CPI and the unemployment index.
Walmart runs several promotional markdown events throughout the year. These markdowns precede prominent holidays, the four largest of which are the Super Bowl, Labour Day, Thanksgiving, and Christmas. The weeks including these holidays are weighted five times higher in the evaluation than non-holiday weeks. Part of the challenge presented by this competition is modeling the effects of markdowns on these holiday weeks in the absence of complete/ideal historical data. Historical sales data for 45 Walmart stores located in different regions are available.
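To make the holiday weighting concrete, here is a minimal sketch of a weighted error score. The text only states that holiday weeks count five times as much; using mean absolute error as the underlying metric is an assumption, and the function name `weighted_mae` is hypothetical.

```python
def weighted_mae(actual, predicted, is_holiday, holiday_weight=5.0):
    """Mean absolute error where holiday weeks weigh `holiday_weight` times more.

    Assumption: absolute error is the base metric; the source only gives
    the 5x weighting for holiday weeks.
    """
    weights = [holiday_weight if flag else 1.0 for flag in is_holiday]
    total_error = sum(
        w * abs(a - p) for w, a, p in zip(weights, actual, predicted)
    )
    return total_error / sum(weights)
```

With one non-holiday week off by 10 and one holiday week off by 20, the score is (1 × 10 + 5 × 20) / 6, so the holiday miss dominates the result.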
**Dataset Description**
This is the historical data that covers sales from 2010-02-05 to 2012-11-01, in the file Walmart_Store_sales. Within this file you will find the following fields:
1. Store - the store number
2. Date - the week of sales
3. Weekly_Sales - sales for the given store
4. Holiday_Flag - whether the week is a special holiday week (1 = holiday week, 0 = non-holiday week)
5. Temperature - Temperature on the day of sale
6. Fuel_Price - Cost of fuel in the region
7. CPI - Prevailing consumer price index
8. Unemployment - Prevailing unemployment rate
Super Bowl: 12-Feb-10, 11-Feb-11, 10-Feb-12, 8-Feb-13
Labour Day: 10-Sep-10, 9-Sep-11, 7-Sep-12, 6-Sep-13
Thanksgiving: 26-Nov-10, 25-Nov-11, 23-Nov-12, 29-Nov-13
Christmas: 31-Dec-10, 30-Dec-11, 28-Dec-12, 27-Dec-13
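The holiday dates listed above can be turned into a lookup for labeling weekly rows. The sketch below is only illustrative: the names `HOLIDAY_WEEKS`, `build_holiday_lookup`, and `holiday_name` are hypothetical, and parsing the month abbreviations with `%b` assumes an English (or default C) locale.

```python
from datetime import date, datetime

# Holiday week dates copied from the list above.
HOLIDAY_WEEKS = {
    "Super Bowl": ["12-Feb-10", "11-Feb-11", "10-Feb-12", "8-Feb-13"],
    "Labour Day": ["10-Sep-10", "9-Sep-11", "7-Sep-12", "6-Sep-13"],
    "Thanksgiving": ["26-Nov-10", "25-Nov-11", "23-Nov-12", "29-Nov-13"],
    "Christmas": ["31-Dec-10", "30-Dec-11", "28-Dec-12", "27-Dec-13"],
}


def build_holiday_lookup(holiday_weeks):
    """Map each holiday week date to the name of its holiday."""
    lookup = {}
    for name, labels in holiday_weeks.items():
        for label in labels:
            # "%d-%b-%y" matches strings such as "12-Feb-10".
            lookup[datetime.strptime(label, "%d-%b-%y").date()] = name
    return lookup


HOLIDAY_BY_DATE = build_holiday_lookup(HOLIDAY_WEEKS)


def holiday_name(week_date):
    """Return the holiday for a week date, or None for a non-holiday week."""
    return HOLIDAY_BY_DATE.get(week_date)
```

A lookup like this could also be used to cross-check the Holiday_Flag column, since a week with flag 1 should correspond to one of the listed dates.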
Basic Statistics tasks
1. Which store has the maximum sales?
2. Which store has the maximum standard deviation, i.e. the store whose sales vary the most? Also find the coefficient of variation (the ratio of standard deviation to mean).
3. Which store(s) had a good quarterly growth rate in Q3’2012?
5. Some holidays have a negative impact on sales. Find the holidays with higher sales than the mean non-holiday sales across all stores combined.
6. Provide a monthly and semester view of sales in units and give insights.
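Tasks 1 and 2 can be approached with a per-store summary like the one sketched here. It is a rough illustration under stated assumptions: the function name `store_summary` is hypothetical, the sample standard deviation is used (the tasks do not say whether sample or population is intended), and the coefficient in task 2 is computed as standard deviation divided by mean.

```python
from collections import defaultdict
from statistics import mean, stdev


def store_summary(rows):
    """Summarize weekly sales per store.

    rows: iterable of (store, weekly_sales) pairs.
    Returns {store: {"total": ..., "std": ..., "cv": ...}}.
    Assumption: sample standard deviation; cv = std / mean.
    """
    by_store = defaultdict(list)
    for store, weekly_sales in rows:
        by_store[store].append(weekly_sales)

    summary = {}
    for store, sales in by_store.items():
        sd = stdev(sales)
        summary[store] = {"total": sum(sales), "std": sd, "cv": sd / mean(sales)}
    return summary
```

The store with the highest total or the highest standard deviation can then be picked with `max(summary, key=lambda s: summary[s]["total"])` or the equivalent on `"std"`.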
For Store 1 – Build prediction models to forecast demand
1. Linear Regression – Use variables such as date, recoding dates as integers in chronological order (1 for 5 Feb 2010, the earliest date). Test whether CPI, unemployment, and fuel price have any effect on sales.
2. Change dates into days by creating a new variable.
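A minimal sketch of the date recoding and a one-variable regression follows. It covers only the date index as a predictor; including CPI, unemployment, and fuel price would need a multiple regression, which is left out here. The helper names `date_to_index` and `fit_line` are hypothetical, and the fit uses the ordinary least squares closed form.

```python
from datetime import date


def date_to_index(dates):
    """Map each distinct date to 1, 2, 3, ... in chronological order."""
    ordered = sorted(set(dates))
    return {d: i + 1 for i, d in enumerate(ordered)}


def fit_line(x, y):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept
```

For example, after building the index with `date_to_index`, the weekly sales for Store 1 can be regressed on the index values, and `slope * next_index + intercept` gives a naive forecast for the following week.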
The DFS ranks automobile insurance companies doing business in New York State based on the number of consumer complaints upheld against them as a percentage of their total business over a two-year period. Complaints typically involve issues like delays in the payment of no-fault claims and nonrenewal of policies. Insurers with the fewest upheld complaints per million dollars of premiums appear at the top of the list. Those with the highest complaint ratios are ranked at the bottom.
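The ranking rule can be illustrated with a short sketch. This is not the DFS methodology in detail: the function name `rank_insurers` and the input layout are hypothetical, and details such as tie-breaking or minimum premium thresholds are not covered by the description above.

```python
def rank_insurers(records):
    """Rank insurers by upheld complaints per million dollars of premiums.

    records: iterable of (name, upheld_complaints, premiums_in_dollars).
    Returns (name, ratio) pairs, lowest ratio (best rank) first.
    """
    scored = [
        (name, upheld / (premiums / 1_000_000))
        for name, upheld, premiums in records
    ]
    return sorted(scored, key=lambda item: item[1])
```

An insurer with 3 upheld complaints on $6 million of premiums has a ratio of 0.5 and would rank above one with 10 upheld complaints on $5 million, whose ratio is 2.0.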