Prepared by Soňa Dulíková, Lukáš Lafférs and Miroslav Štefánik ()
Institute of Economic Research, Slovak Academy of Sciences
This work was supported by the Slovak Research and Development Agency under the contract no. APVV-17-0329.


The [Component] Projects and programmes - Projects - Education and training

This is an automated report providing evidence on [Component] Projects and programmes - Projects - Education and training, one of the active labour market policy (ALMP) programmes, implemented in Slovakia between during 2017.
The [Component] Projects and programmes - Projects - Education and training is provided based on the § 54 of the Act on Employment Services Col. 2004/5.

if(qualitative_data_condition) {
  knitr::knit_exit()
}

1. The description of the programme

Based on the Labour Market Policy Database (LMP) administrated by DG Employment of the European Commission, the [Component] Projects and programmes - Projects - Education and training is classified as an “Training” type of ALMP programme, with the programme specific code 21_SK39_1.

[Component] Projects and programmes - Projects - Education and training aims: Improvement the situation of job-seekers on labour market, Approval of new active labour market policies, Pilot projects or pilot programs oriented on support regional or local employment progress

Beneficiaries are citizens, registered job-seekers, job-seekers, employers.

The Eligible are NA

Implementation: Projects and programs financed from ESF founds, financed or co-financed from national budget or from other resources. Active labour market policy measures shall also include a) national projects approved by the Ministry and implemented by the Central Office or an office, b) projects to improve the status of jobseekers in the labour market approved by the Ministry and implemented by the Central Office, c) projects to improve the status of jobseekers in the labour market approved by the Ministry and implemented by an office, d) projects to improve the status of jobseekers in the labour market approved by the Ministry or the Central Office and implemented by an office, a legal entity or a natural person, e) pilot projects to test new active labour market policy measures approved by the Ministry and implemented by the Central Office, f) pilot projects or pilot programmes to support regional or local employment approved by the Central Office and implemented by an office. Specifications of national projects (NP) that are part of intervention measures: Through Apprenticeship to Employment Employers may receive: - a financial contribution for mentoring, to partially cover mentors labour costs, - a financial contribution to partially cover total labour costs of employees, - a financial contribution to partially cover the unavoidable costs. The aim of this project is to give NEETs the opportunity to gain or improve their knowledge, skills and practical experience through mentoring. Graduate employment practice starts employment - Contribution to a graduate to start a graduate employment practice. - Contribution to employers to partially cover the advance payments for permanent health insurance contributions, social insurance contributions and mandatory contributions to old-age pension savings paid by the employer. The aim of this measure is to increase the employment rate and employability of NEETs, as well as the inclusion of young people who are not in education, employment or training, with subsequent opportunity to create a new position for a young person who finished a graduate employment practice. Success in the Labour Market - Contribution to employers to partially cover total labour costs of employees. - Contribution to jobseekers to support job creation through self-employment. The contribution partially covers costs related to self-employment activities. The aim is to empower the NEETs and improve their situation in the labour market, as well as to increase their employability. Breaking the Vicious Circle of Unemployment - Contribution to employers to partially cover total labour costs of employees; to support training and tutoring activities, to partially cover total labour costs of tutors. The aim is to promote the employment and reduce long-term unemployment through education and training. Chance of employment - Contribution to employers to partially cover total labour costs of employees. - A single financial contribution to partially cover the unavoidable costs related to community services. The aim is to improve the situation of disadvantaged unemployed in the labour market, increase the employment rate and their employability, to reduce long-term unemployment and support local and regional employment. Active on the Labour Market - Contribution to employers to partially cover the advance payments for permanent health insurance contributions, social insurance contributions and mandatory contributions to old-age pension savings paid by the employer. The aim is to improve the situation of disadvantaged unemployed, especially people over 50 years of age; to increase the employment rate and their employability through the provision of contributions to support the job creation in less developed regions. Expenditure on the RE-PAS Project (retraining allowance) represents tuition fees, which are refunded by the Labour Office after the completion of a training course to the provider. The aim of this measure is to give jobseekers the opportunity to attend a training course to gain new skills and knowledge.


1.1 Participants and expenditures

#Preparing of participants and expenditures tables 
partSK <- subset(part, geo == 'SK' & year == format(as.Date(params$ep_start),"%Y") & age == 'TOTAL' & sex == 'T' & stk_flow == 'ENT' )
names(programesSK)[1] <- 'lmp_type'
partSK <- merge(partSK, programesSK, by = 'lmp_type', all = TRUE)
partSK <- subset(partSK, substr(Classification, 1, 1) == '2' | substr(Classification, 1, 1) == '4' | substr(Classification, 1, 1) == '5' | substr(Classification, 1, 1) == '6' | substr(Classification, 1, 1) == '7')
partSK_datapie <- subset(partSK, lmp_type == lmp_code | lmp_type == '2' | lmp_type == '4' | lmp_type == '5' | lmp_type == '6' | lmp_type == '7')
partSK_datapie$value <- ifelse(is.na(partSK_datapie$value), sum(partSK[substr(partSK$lmp_type, 1, 6) == substr(lmp_code, 1, 6) , 'value'], na.rm = TRUE), partSK_datapie$value)


expSK <- subset(exp, year == format(as.Date(params$ep_start),"%Y"))
expSK <- merge(expSK, programesSK, by = 'lmp_type', all = TRUE)
expSK <- subset(expSK, substr(Classification, 1, 1) == '2' | substr(Classification, 1, 1) == '4' | substr(Classification, 1, 1) == '5' | substr(Classification, 1, 1) == '6' | substr(Classification, 1, 1) == '7')
expSK_datapie <- subset(expSK, lmp_type == lmp_code | lmp_type == '2' | lmp_type == '4' | lmp_type == '5' | lmp_type == '6' | lmp_type == '7')
expSK_datapie$value <- ifelse(is.na(expSK_datapie$value), sum(expSK[substr(expSK$lmp_type, 1, 6) == substr(lmp_code, 1, 6) , 'value'], na.rm = TRUE), expSK_datapie$value)

#Share of expenditures and participants at programm

partSK_per <- partSK_datapie %>% select(lmp_type, value) %>% 
  subset(lmp_type != subset(qualitative, qualitative$almp == params$measure)$lmp_type.x[1]) %>%
  mutate(per = paste(round(100 * value / sum(value),2),'%'))

expSK_per <- expSK_datapie %>% select(lmp_type, value) %>% 
  subset(lmp_type != subset(qualitative, qualitative$almp == params$measure)$lmp_type.x[1]) %>%
  mutate(per = paste(round(100 * value / sum(value),2),'%'))

program_part <-  data.frame(lmp_type  = c('total', as.character(subset(qualitative, qualitative$almp == params$measure)$lmp_type.x[1])),
                            participants = c(sum(partSK_per$value), subset(partSK_datapie, partSK_datapie$almp == params$measure)$value[1]),
                            per = c('100 %', paste(round(subset(partSK_datapie, partSK_datapie$almp == params$measure)$value[1] / sum(partSK_per$value) * 100,2),'%'))
                            )

program_exp <-  data.frame(lmp_type  = c('total', as.character(subset(qualitative, qualitative$almp == params$measure)$lmp_type.x[1])),
                            expenditures = c(sum(expSK_per$value), subset(expSK_datapie, expSK_datapie$almp == params$measure)$value[1]),
                            per = c('100 %', paste(round(subset(expSK_datapie, expSK_datapie$almp == params$measure)$value[1] / sum(expSK_per$value) * 100,2),'%'))
                            )


type_share_par <- partSK_per$per[partSK_per$lmp_type == as.character(subset(qualitative, qualitative$almp == params$measure)$Classification[1])] 
prog_share_par <- program_part$per[program_part$lmp_type == as.character(subset(qualitative, qualitative$almp == params$measure)$lmp_type.x[1])]
prog_numb_par <- program_part$participants[program_part$lmp_type == as.character(subset(qualitative, qualitative$almp == params$measure)$lmp_type.x[1])]

type_share_exp <- expSK_per$per[expSK_per$lmp_type == subset(qualitative, qualitative$almp == params$measure)$Classification[1]]
prog_share_exp <- program_exp$per[program_exp$lmp_type == as.character(subset(qualitative, qualitative$almp == params$measure)$lmp_type.x[1])]
prog_numb_exp <- program_exp$expenditures[program_exp$lmp_type == as.character(subset(qualitative, qualitative$almp == params$measure)$lmp_type.x[1])]

The left pie chart Participants displays the share of participants in ALMP programmes grouped by ALMP types of the LMP classification. Shares are based on LMP Database stock figures reported for the calendar year 2017.

The right pie chart Expenditure display the share of expenditure using the same LMP typology as the Participants pie chart. Comparing the share on the total pie of participants to the share on the total pie of expenditures provides an assessment of unit costs of the programme relative to the average. If the percentage of participants is higher than on total spending, then the programme is less expensive than the average ALMP programme (and vice versa).

Graph 1: Resources flowing to [Component] Projects and programmes - Projects - Education and training during 2017

#Preparing data for Pie Chart
#PARTICIPANTS
piechart_P <- select(partSK_datapie, lmp_type, value)

piechart_P$lmp_type <- ifelse(piechart_P$lmp_type == '2' | piechart_P$lmp_type == '4' | piechart_P$lmp_type == '5' | piechart_P$lmp_type == '6' | piechart_P$lmp_type == '7',       
                              qualitative$class[match(piechart_P$lmp_type, qualitative$Classification)], 
                              paste(qualitative$class[match(piechart_P$lmp_type, qualitative$lmp_type.x)], 
                                    qualitative$Labour.market.services[match(piechart_P$lmp_type, qualitative$lmp_type.x)], 
                                    sep =': ' ))

piechart_P <- filter(piechart_P, !duplicated(piechart_P$lmp_type))
piechart_P$lmp_type <- ifelse(piechart_P$lmp_type == 'Training: [Component] Projects and programmes - Projects - Education and training','Training: [Component] Projects and programmes - Projects - Education and training Repas',piechart_P$lmp_type)

piechart_P <- piechart_P[order(piechart_P$lmp_type),]
piechart_P$focus <- ifelse(
  piechart_P$lmp_type == paste(
    subset(qualitative, qualitative$almp == params$measure)$class[1],
    subset(qualitative, qualitative$almp == params$measure)$Labour.market.services[1],
    sep =': '), 
  0.05,
  0)

#EXPENDITURES 
piechart_E <- select(expSK_datapie, lmp_type, value)

piechart_E$lmp_type <- ifelse(piechart_E$lmp_type == '2' | piechart_E$lmp_type == '4' | piechart_E$lmp_type == '5' | piechart_E$lmp_type == '6' | piechart_E$lmp_type == '7',       
                              qualitative$class[match(piechart_E$lmp_type, qualitative$Classification)], 
                              paste(qualitative$class[match(piechart_E$lmp_type, qualitative$lmp_type.x)], 
                                    qualitative$Labour.market.services[match(piechart_E$lmp_type, qualitative$lmp_type.x)], 
                                    sep =': ' ))

piechart_E <- filter(piechart_E, !duplicated(piechart_E$lmp_type))
piechart_E$lmp_type <- ifelse(piechart_E$lmp_type == 'Training: [Component] Projects and programmes - Projects - Education and training','Training: [Component] Projects and programmes - Projects - Education and training Repas',piechart_E$lmp_type)

piechart_E <- piechart_E[order(piechart_E$lmp_type),]
piechart_E$focus <- ifelse(
  piechart_E$lmp_type == paste(
    subset(qualitative, qualitative$almp == params$measure)$class[1],
    subset(qualitative, qualitative$almp == params$measure)$Labour.market.services[1],
    sep =': '), 
  0.05,
  0)


## PIE CHART
#PARTICIPANTS
piechart_P$lmp_type <- gsub("(.{25,}?)\\s", "\\1\n", piechart_P$lmp_type)
ev_measure <- gsub("(.{25,}?)\\s", "\\1\n",
                   paste(subset(qualitative, qualitative$almp == params$measure)$class[1],
                         subset(qualitative, qualitative$almp == params$measure)$Labour.market.services[1],
                         sep =': '))

ev_type <- gsub("(.{25,}?)\\s", "\\1\n",
                subset(qualitative, qualitative$almp == params$measure)$class[1])

piechart_P$value <- ifelse(piechart_P$lmp_type == ev_type, piechart_P$value - piechart_P[piechart_P$lmp_type == ev_measure,"value"], piechart_P$value)

piechart_P <- piechart_P %>% 
  mutate(per = paste0(round(100 * value / sum(value),2), "%", by = ""),
         total = sum(value),
         end_angle = 2*pi*cumsum(value)/total,      
         start_angle = lag(end_angle, default = 0),   
         mid_angle = 0.5*(start_angle + end_angle))

rpie <- 1
rlabel <- 0.6 * rpie 

plot_piechart_P <- ggplot(piechart_P) + 
  theme_no_axes() + 
  theme_void() +
  coord_fixed()+
  geom_arc_bar(aes(x0 = 0, y0 = 0, r0 = 0, r = 1, amount = value, 
                   fill = lmp_type, explode  = focus),
               data = piechart_P, stat = 'pie',color='white')+
  ggtitle('Participants') +
  geom_text(aes(x = rlabel*sin(mid_angle), y = rlabel*cos(mid_angle), label = ifelse(lmp_type == ev_type | lmp_type == ev_measure, per,"")),
            size = 4, position=position_jitter(width=0,height=0.3))+
  theme(plot.title = element_text(color="black", size=18, face="bold.italic"),
        legend.title = element_text(size=15, face="bold"),
        legend.text = element_text(size=12),
        legend.key.height=unit(1.1, "cm")) +
  guides(fill=guide_legend(title="Classification of LMP\n and measure")) +
  scale_fill_manual(values=c(ifelse(piechart_P$value != 0 &  piechart_P$lmp_type == ev_measure,  
                                    "steelblue3", 
                                    ifelse(piechart_P$lmp_type == ev_type, 
                                           "steelblue1", 
                                           gray.colors(6, start = 0.8)))))

#EXPENDITURES
piechart_E$lmp_type <- gsub("(.{25,}?)\\s", "\\1\n", piechart_E$lmp_type)

piechart_E$value <- ifelse(piechart_E$lmp_type == ev_type, piechart_E$value - piechart_E[piechart_E$lmp_type == ev_measure,"value"], piechart_E$value)

piechart_E <- piechart_E %>% 
  mutate(per = paste0(round(100 * value / sum(value),2), "%", by = ""),
         total = sum(value),
         end_angle = 2*pi*cumsum(value)/total,      
         start_angle = lag(end_angle, default = 0),   
         mid_angle = 0.5*(start_angle + end_angle))

rpie <- 1
rlabel <- 0.6 * rpie 

plot_piechart_E <- ggplot(piechart_E) + 
  theme_no_axes() + 
  theme_void() +
  coord_fixed()+
  geom_arc_bar(aes(x0 = 0, y0 = 0, r0 = 0, r = 1, amount = value, 
                   fill = lmp_type, explode  = focus),
               data = piechart_E, stat = 'pie',color='white')+
  ggtitle('Expenditure') +
  geom_text(aes(x = rlabel*sin(mid_angle), y = rlabel*cos(mid_angle), label = ifelse(lmp_type == ev_type | lmp_type == ev_measure, per,"")),
            size = 4, position=position_jitter(width=0,height=0.3))+
  theme(plot.title = element_text(color="black", size=18, face="bold.italic"),
        legend.title = element_text(size=15, face="bold"),
        legend.text = element_text(size=12),
        legend.key.height=unit(1.1, "cm")) +
  guides(fill=guide_legend(title="Classification of LMP\n and measure")) +
  scale_fill_manual(values=c(ifelse(piechart_E$value != 0 &  piechart_E$lmp_type == ev_measure,  
                                    "steelblue3", 
                                    ifelse(piechart_E$lmp_type == ev_type, 
                                           "steelblue1", 
                                           gray.colors(6, start = 0.8)))))
## Warning: Removed 1 rows containing non-finite values (stat_pie).
## Warning in min(x): no non-missing arguments to min; returning Inf
## Warning in max(x): no non-missing arguments to max; returning -Inf
## Warning: Removed 6 rows containing missing values (geom_text).

## Warning: Removed 1 rows containing non-finite values (stat_pie).
## Warning in min(x): no non-missing arguments to min; returning Inf
## Warning in max(x): no non-missing arguments to max; returning -Inf
## Warning: Removed 6 rows containing missing values (geom_text).

Based on the LMP database, 18 227 individuals participated in the [Component] Projects and programmes - Projects - Education and training during 2017. This accounted for the 9.44 % percent of the total number of participants in all Slovak ALMP programmes (LMP types 2-7), while the share of the programme on the total expenditure was 5.42 %. The total roofing LMP type Training presents 6 % percent of the total Slovak ALMP (type 2-7) expenditure and 9.57 % percent of the total ALMP participants.


1.2 The [Component] Projects and programmes - Projects - Education and training in the context of ALMPs in Slovakia

Using administrative data, we first provide a picture of the importance of the [Component] Projects and programmes - Projects - Education and training in the context of Slovak ALMP. The following flowchart displays flow of job seekers registered into the database of unemployment during . Flows are based on the movement of these individuals during the two years following their registration; this period is divided into six months sub-periods (0/6/12/18/24). For each of these sub-periods, we observe the flows of registered jobseekers to employment, or their de-registration for another reason.
Job seekers can also move to one of the ALMP programmes. The lines highlighted in red represent the flow of jobseekers participating in measure P54K, [Component] Projects and programmes - Projects - Education and training.

Graph 2: The [Component] Projects and programmes - Projects - Education and training in the structure of the flows of jobseekers registered during 2017

The next table shows full names of programmes shown in the graph above.

Table 1: Explanatory table for Graph 2

Sankey_description <- select(qualitative, almp, Labour.market.services, Labour.market.services_SK)
Sankey_description[32,1] <- 'P054'
Sankey_description<- rbind(Sankey_description, Sankey_description[32,])
Sankey_description[nrow(Sankey_description),1] <- 'P54O'
Sankey_description[28,1] <- 'P54D'

Sankey_tabel1 <- data.frame(Measures = c(setdiff(nastroj_kod, c("another reason", "employed"))))
Sankey_tabel1$'Name of programme' <- Sankey_description$Labour.market.services[match(Sankey_tabel1$Measures, Sankey_description$almp)]

Sankey_tabel1  %>% kbl(format = 'html', booktabs = TRUE , align = 'c', row.names = FALSE)%>%
  kable_classic('hover', full_width = FALSE)%>%
  column_spec(1,  border_right = TRUE) 
Measures Name of programme
P54R [Component] Projects and programmes - Projects - Education and training Repas
P053 Commuting allowance
P054 Projects and programmes
P051 Support for graduate work experience
P54O [Component] Projects and programmes - Self-employment
P060 Reimbursement of operating costs of sheltered workshops and workplaces
P052 Work in minor services for municipalities or self-governing regions
P52A Voluntary work
P049 Self-employment grant
P50J Support for the development of local and regional employment
P54K [Component] Projects and programmes - Projects - Education and training
## Preparing of df
# we observe only  young people under XX (age) which inflow in XXXX (entry year) 
df_r <- subset(df, age <= age_group_max)
df_r <- subset(df_r, format(as.Date(df_r$entry),"%Y")== format(as.Date(params$ep_start),"%Y"))
#df_r <- df_r %>% mutate_all(na_if,"") #if cells are empty -> change it to NAs
df_r <- subset(df_r, healthy < 3)

# DOVOD VYRADENIA 
dovod_vyradenia = c('V01','V02','V03','V1','V12','V15') #zamestnali sa 

# Opatrenie P032 vyhodiť (ak sa budú meniť aj iné opatrenia ako napr. 54R a 54Rp tak tu sa to opraví (%in% c()))
delete <- setdiff(c('P032','P54P','P54D'),params$measure)
Salmps <- subset(almps, !nastroj  %in% delete)

## Spojenie DF a ALMPS 
df_almps <- merge(Salmps, df_r, by = 'klient_id')

## Vyfiltruj iba tie klient_id. pri ktorých nástroj je params$measure
partic_measure <- df_almps[df_almps$nastroj == params$measure, 'klient_id']
df_almps <- filter(df_almps, klient_id %in% partic_measure) #tu su tí, ktorí boli na aj na params$measure ale aj na iných opatreniach 

# Podmienka prieniku času pri databáze nezamestnaných a v zúčastnili sa evaluated measure
df_almps_measure <- subset(df_almps, entry <= entrya & exita <= (exit +7) & nastroj == params$measure) # tu su tí, ktorí išli v skúmanom čase do opatrenia params$measure 
##### toto je moja základňa môj prvý stĺpec 

##### Podmienka prieniku času pri databáze nezamestnaných a iných programoch ako evaluated measue ale zároveň sú to tí, ktorí už niekedy na evaluated measure už boli 
df_almps_other <- subset(df_almps, entry <= entrya & exita <= (exit +7) & nastroj != params$measure)

# upravíme si dáta ktoré budeme používať pri grafe
df_almps_measure <- select(df_almps_measure, klient_id, entry, exit, entrya, exita, nastroj, dovod_vyradenia_kod)
df_almps_other <- select(df_almps_other, klient_id, entry, exit, entrya, exita, nastroj, dovod_vyradenia_kod)


#smojím iba tých ktorí boli aj v evaluated measure aj v iných opatreniach, all.x = TRUE lebo chceme aj tých ktorý boli iba na evaluated measure (nemuseli sa zúčastniť aj iných programov)
flow <- merge(df_almps_measure, df_almps_other, by = 'klient_id', all.x = TRUE) 

#dni od začatia opatrenia entrya.x po začatie iného opatrenia entrya.y alebo po zamestnanie/odchod z iného dôvodu exit.x
flow$days <- ifelse(is.na(flow$nastroj.y),
                    as.numeric(difftime(flow$exit.x, flow$entrya.x, units = 'days')),
                    as.numeric(difftime(flow$entrya.y, flow$entrya.x, units = 'days')))

#ak je dovod vyradenia NA ale zúčastnili sa na opatrení 
flow$dovod_vyradenia_kod.y <- ifelse(is.na(flow$dovod_vyradenia_kod.y) & !is.na(flow$nastroj.y), 'V01', flow$dovod_vyradenia_kod.y)

#vyfiltruj tých ktorý boli aj na evaluated measure aj na inom opatrení alebo sa zamestnali a days nie je záporné 
#podmienka -> entrya do ďalšieho projektu musí byť väčšie ako entrya do evaluated measure
#flow$days nemôže byť záporné 
flow <- filter(flow, days >= 0)

#nastroj -> ak dovod vyradenia sa rovna nejakému prvku z vektoru dovodov vyradenia tak -> employed inak another reason
flow  <- flow %>% 
  mutate(nastroj.y = case_when(is.na(flow$nastroj.y) & flow$dovod_vyradenia_kod.x %in% dovod_vyradenia ~ 'employed',
                               is.na(flow$nastroj.y) & !flow$dovod_vyradenia_kod.x %in% dovod_vyradenia ~ 'another reason',
                               !is.na(flow$nastroj.y) ~ flow$nastroj.y)
  )


#dataframe, ktorý budem používať pri tvorbe grafu
Sankey_measure <- flow %>% select(nastroj.x, nastroj.y, days) %>% 
           mutate(month = ceiling(days/30.417))

Sankey_measure <- Sankey_measure %>%mutate(
  time = case_when(
    Sankey_measure$month %in% seq(0,6,1)  ~ 6,
    Sankey_measure$month %in% seq(7,12,1)  ~ 12, 
    Sankey_measure$month %in% seq(13,18,1)  ~ 18,
    Sankey_measure$month %in% seq(19,100,1)  ~ 24,
  ) 
)

Sankey_measure$sources <- ifelse(Sankey_measure$time == 6 | Sankey_measure$time == 12 |
                                   Sankey_measure$time == 18 | Sankey_measure$time == 24, 
                                 Sankey_measure$time - 6, 
                                 Sankey_measure$time)


#zosumarizuj, koľký mladí išli do ktorého opatrenia, zamestnali sa alebo odišli z registra z iných dôvodov
San_measure <- Sankey_measure %>% select(nastroj.y, time, sources) %>%
  group_by(nastroj.y, time, sources) %>% summarise(num = n(), .groups = 'drop') %>%
  rename(nastroj = nastroj.y)


# rozdeľ opatrenia, na tie ostatné almps - OTHER ALMPS 
nastroj_kod <- c('another reason','employed')
NastrojKod <- San_measure[!San_measure$nastroj %in% nastroj_kod,]  %>%  group_by(nastroj) %>% summarise(num = sum(num), .groups = 'drop') %>%
  mutate(perc = num*100 / sum(num),
         cut = case_when(perc >= 5 ~ 1,
                         perc < 5 ~ 0))
nastroj_kod <- c(nastroj_kod, NastrojKod$nastroj[NastrojKod$cut == '1'])

San_aplmps <- subset(San_measure, nastroj %in% nastroj_kod | nastroj == params$measure)
San_other_aplmps <- subset(San_measure, !nastroj %in% nastroj_kod & !nastroj == params$measure)

San_other_aplmps <- San_other_aplmps %>%  group_by(sources , time) %>% summarise(num = sum(num), .groups = 'drop') 
San_other_aplmps$nastroj <- 'OTHER ALMPS'
San_other_aplmps <- relocate(San_other_aplmps, c(nastroj, time), .before = sources,)

San_measure <- rbind(San_aplmps, San_other_aplmps)
remove(San_aplmps, San_other_aplmps)


# uzly grafu (jedinečné), musia tu byť všetky opatrenia
node_m <- data.frame(
  name=c(as.character(San_measure$nastroj), as.character(San_measure$sources))%>% unique()
)

# definovanie koľko registrovaných bude medzi tými rokmi  
velky_df <- data.frame()
for (i in seq(6,24,6)){
  pocet <- San_measure %>%  group_by('sources' = sources >= i) %>% summarise(num = sum(num), .groups = 'drop') 
  pocet <- subset(pocet, sources == TRUE)
  pocet$sources <- i
  velky_df <- rbind(velky_df, pocet)
}

# musím si velky_df prisposobiť tak, aby roky boli ako nodes aby som to mohla spojiť s dataframe San s ktorým potom budem ďalej robiť graf
# preto sources budu ako nastroj -> aby som spravila nodes, years su sources ale sources su years -1 v skutočnosti (v san grafe)
colnames(velky_df) <- c('nastroj', 'num')
velky_df$time <- velky_df$nastroj
velky_df$sources <- San_measure$sources[match(velky_df$time, San_measure$time)] 
velky_df <- relocate(velky_df, num, .after = sources)

San_measure <- rbind(San_measure, velky_df)

#urobím IDsources a ID target podľa uzlov aby garf vedel ten flow medzi jednotlivími uzlami 
San_measure$IDsource <- match(San_measure$sources, node_m$name)-1 
San_measure$IDtarget <- match(San_measure$nastroj, node_m$name)-1

#Color 
time <- seq(0,24,6)
NOALMP <- c('another reason', 'employed')
node_m <- node_m %>% mutate(group = case_when(node_m$name %in% NOALMP ~ 'A',
                                              !node_m$name %in% NOALMP & !node_m$name %in% time ~ 'B',
                                              node_m$name %in% time ~ 'C'
)
)

San_measure$group <- 'type_a'

my_color <- 'd3.scaleOrdinal() .domain(["type_a", "A","B", "C"]) .range(["lightgray", "darkseagreen", "thistle", "rosybrown", "red"])'

San_measure <- as.data.frame(San_measure)

Sankey_ev.measure <- sankeyNetwork(Links = San_measure, Nodes = node_m,
                                   Source = "IDsource", Target = "IDtarget",
                                   Value = "num", NodeID = "name", 
                                   sinksRight=F, fontSize = 14,
                                   fontFamily = "sans-serif",
                                   width = 900,
                                   colourScale=my_color, LinkGroup="group", NodeGroup="group",
                                   nodePadding=10)

condition_for_Sankeyplot <- length(unique(San_measure$IDsource))>=1 && length(unique(San_measure$IDtarget))>=1 && !any(is.na(San_measure$IDsource)) && !any(is.na(San_measure$IDtarget))

The second flow chart shows the further state of the participants of the evaluated programme (further branching of the red line in graph 2) and what happened with them after participation in the evaluated programme. After participation in the programme, the participants can be employed, de-registered for another reason, or they can participate in another ALMP programme. We observed this behaviour for periods of two years after their participation. This period is divided into six months sub-periods (0/6/12/18).

Graph 2.2: Flowchart of participants in evaluation progname participated during evaluation period.

if(condition_for_Sankeyplot){
  Sankey_ev.measure
}
cond_for_table <- FALSE
if(condition_for_Sankeyplot){
  Sankey_tabel2 <- data.frame(Measures = c(setdiff(nastroj_kod, c("another reason", "employed"))))
  Sankey_tabel2$'Name of programme' <- Sankey_description$Labour.market.services[match(Sankey_tabel2$Measures, Sankey_description$almp)]
  
  cond_for_table <- nrow(Sankey_tabel2) >=1
}

The next table shows full names of programmes shown in graph above.

Table 1.2: Explanatory table for Graph 2.2

if (cond_for_table){
  Sankey_tabel2  %>% kbl(format = 'html', booktabs = TRUE , align = 'c', row.names = FALSE)%>%
    kable_classic('hover', full_width = FALSE)%>%
    column_spec(1,  border_right = TRUE)
}
Measures Name of programme
P051 Support for graduate work experience
P052 Work in minor services for municipalities or self-governing regions
P54O [Component] Projects and programmes - Self-employment
P54R [Component] Projects and programmes - Projects - Education and training Repas



2. Data and the evaluation sample

This evaluation report is based on administrative data on registered unemployed jobseekers in Slovakia, inter-linkable with the database of participants in ALMP programmes. The export was provided by the Central Office of Labour, Social Affairs and Family of the Slovak Republic (COLSAF) at the beginning of 2021. It covered a period from January 2014 until December 2020. The raw data were processed using a data-preparation script, available on request from: .
The data frame “df” covers all unemployment spells of unemployed jobseekers, with attributes collected at the moment of their registration as unemployed jobseekers (Application Form).

### DEFINE THE EVALUATION PERIOD #
ep_start <- as.Date(params$ep_start)
ep_end <- as.Date(params$ep_end)
un_spell <- spell
measure <- params$measure

    ########################################x
    ## SELECTING THE EVALUATION SAMPLE #
    ########################################x,
    
treated<-filter(almps, nastroj==toString(params$measure))

#Sub-groups to be dropped: 
# - those with ALMP participation 2 years before the EP
IDalmps_before<-unique(almps$klient_id[almps$entrya<ep_start & almps$entrya>=ep_start-730]) 
# - those with ALMP participation in other ALMP during the EP
IDalmps_during_ep<-unique(almps$klient_id[(almps$entrya<=ep_end & almps$entrya>=ep_start) & almps$nastroj!=toString(params$measure)])

###DEFINE THE ELIGIBILITY CRITERIA 
#the EC are measure specific, in the case of looping over multiple measures EC need to be elaborated t a form of table or a list and added to the parameters
#SUBSETTING THE BASE EVALUATION DATASET OF ELIGIBLE 
cond0<-as.logical(df$entry<=ep_end & df$exit>=ep_start) # Being on the register of unemployed during the evaluation period
cond1<-as.logical(df$age < age_group_max) 
cond2<-as.logical((df$exit-df$entry)>=un_spell) #LENGTH OF PREVIOUS UNEMPLOYMENT SPELL
cond3<-as.logical(df$entry>=ep_start-730) # Dropping old unemployment spells (cases inflowing more than 730 days before the start of the evaluation period)

dfe<-df[cond0 & cond1 & cond2 & cond3,]
n1 <- dim(dfe)[1]
sampleIDs<-unique(df$klient_id[cond0 & cond1 & cond2 & cond3])
n2 <- length(sampleIDs)

###
#### ONLY KEEP THE SPELLS OF PARTICIPANTS DURING WHICH THEY PARTICIPATED 
#### Creating dataframe of participants in the evaluated programme during the evaluation period.
dfa<-filter(treated, entrya<=ep_end & entrya>=ep_start)
npart0<-length(unique(dfa$klient_id))
npart1<-dim(dfa)[1]

#Drop other ALMP participations from the group of participants as well as the eligible non-participants
n3 <- nrow(filter(dfa, klient_id %in% IDalmps_before))
n4 <- nrow(filter(dfe, klient_id %in% IDalmps_before))
dfa<-filter(dfa, !klient_id %in% IDalmps_before)
dfe<-filter(dfe, !klient_id %in% IDalmps_before)

n5 <- nrow(filter(dfa, klient_id %in% IDalmps_during_ep))
n6 <- nrow(filter(dfe, klient_id %in% IDalmps_during_ep))
dfa<-filter(dfa, !klient_id %in% IDalmps_during_ep)
dfe<-filter(dfe, !klient_id %in% IDalmps_during_ep)



#### Only participants with one participation during the evaluation period are sampled. 
#### JS with multiple participations are droped from the sample
dfa<-dfa %>%
  group_by(klient_id) %>% 
  mutate(rep=n()) # rep is the number of participations of one JS repeating during 2014

n7 <- nrow(filter(dfa, rep!=1))
dfa<-filter(dfa, rep==1)
npart2<-dim(dfa)[1] # Number of participants after cleaning with multiple ALMP participations

###Participants who also participated in other ALMP measures (§54) are dropped
progOUT<- setdiff(c("P050", "P50A", "P50C","P50J", "P50K" ,"P51A", "P054", "P54D", "P54E", "P54O", "P54P", "P54U"),params$measure)
outIDs<-unique(almps$klient_id[(almps$entrya>=as.Date(params$ep_start) & almps$entrya<=as.Date(params$ep_end)+730) & as.logical(almps$nastroj %in% progOUT)])

n8 <- nrow(filter(dfa, klient_id %in% outIDs))
n9 <- nrow(filter(dfe, klient_id %in% outIDs))

dfa<-(filter(dfa, !klient_id %in% outIDs))
dfe<-(filter(dfe, !klient_id %in% outIDs))

npart3<-length(unique(dfa$klient_id)) # The number of participants after we drop participations in supported employment during the outcome observation period 

#MERGING PARTICIPATIONS AND UNEMPLOYMENT SPELLS
#First we add the date of the entry and exit from the registration into the table of participations in measure (evaluated measure params$measure). We only import entry dates for the individuals in the evaluation sample. 
dfa<-merge(dfa, select(dfe, klient_id, entry, exit), by="klient_id", all.x = TRUE)
nrowdfa <- nrow(dfa)

#Second we filter only the registrations of members of the evaluation sample during which the programme participation took place. 
dfa<-filter(dfa, dfa$exit+30>=dfa$entrya & dfa$entrya<=ep_end & entrya >= entry) # Keeping only the participations happening during an unemployment spell
n10 <- nrowdfa-dim(dfa)[1]
npart4<-dim(dfa)[1] # Number of participants after cleaning participations outside an unemployment spell (data quality issue)

## Participants #
particIDs<-unique(dfa$klient_id)
## Eligible #
# nonpartIDs<-sampleIDs[!(sampleIDs %in% particIDs)]
nonpart<-filter(dfe, !(klient_id %in% particIDs))
###Out of the participants only one-time participations happening during an unemployment spell are used 
partic<-merge(dfe, dfa, by = c("klient_id", "entry"), all.x = FALSE)

#LL: sanity check
#nonpart$klient_id
#partic$klient_id
#intersect(nonpart$klient_id,partic$klient_id)
#this should be empty. OK

### Cleaning and renaming #
partic$exit.y<-NULL
partic<-partic %>% rename(exit=exit.x)

nonpart$entrya<-NA
nonpart$exita<-NA
nonpart$nastroj<-NA
nonpart$naklady<-NA
nonpart$projekt<-NA
partic$rep<-NULL

#Filter extreme values (1%) of the waiting time until participation in the evaluated measure 
wte<-quantile(as.numeric(partic$entrya)-as.numeric(partic$entry),na.rm=TRUE, probs=0.99)
n11 <- nrow(filter(filter(partic, as.numeric(entrya)-as.numeric(entry)>wte)))
partic<-filter(partic, as.numeric(entrya)-as.numeric(entry)<=wte)

esample<-rbind(nonpart, partic)
esample$treated<-!is.na(esample$entrya)
#Filter extreme values of the waiting time until participation in the evaluated measure 

#LL:
#summary(esample$treated)

###Unemployment spells ending with LM placements
#esample<-filter(esample, dovod_vyradenia_kod == 'V01' | dovod_vyradenia_kod == 'V02' | dovod_vyradenia_kod == 'V03' | dovod_vyradenia_kod == 'V1' | dovod_vyradenia_kod == 'V12' | dovod_vyradenia_kod == 'V15')

    ########################################x
    ## GENERATING EXPLANATORY VARIABLES #
    ########################################x,
    esample$ent <- as.numeric(as.Date(ep_start))-as.numeric(as.Date(esample$entry))
    
    #LL: quantile(esample$ent)
    #niektore unemployment spells boli 
    # negativne (JS zacal byt nezamestnany po 1-1-2017), 
    # niektore pozitivne (JS zacal byt nezamestnany pred 1-1-2017)

    ########AGEG
    esample$ageg <- cut_interval(esample$age, 5, labels=FALSE)
    esample$ageg <- as.factor(esample$ageg)
    
    ####Extra columns for dummy variables go into the esample_est for further testing
    esample <- dummy_cols(esample, select_columns = c("ageg"), remove_first_dummy = TRUE)
    ageg_dummies<-colnames(esample)[grepl("ageg_", colnames(esample))]

  #######Regional Unemployment rate during the implementation period
    esample[, "UR_region"] <- esample[, paste0("UR_region_",year(ep_start), "")]
    esample[,grepl("UR_region_", colnames(esample))]<-NULL
    #esample$UR_region <- as.numeric(gsub(",", ".", gsub("\\.", "", esample$UR_region)))
    
    ##########Difference between entry into unemployment register and started of participation in measure
    esample$diff_entry <- ceiling(as.integer(as.Date(esample$entrya) - as.Date(esample$entry))/30.417)

#Cleaning
#nonpart<-NULL
#partic<-NULL




# Share of repeated unemployment after participation in ALMP
# Merge esample, history table
h_esample <- merge(select(esample, klient_id, entry, exit, dovod_vyradenia_kod, entrya, exita, nastroj, treated), dfh, by = 'klient_id', all.x = TRUE, all.y = FALSE)

# as.Date
entry <- paste('entry', seq(1:11), sep = '')
exit <- paste('exit', seq(1:11), sep = '')

for (e in entry){
  h_esample[ ,e] <- as.Date(h_esample[ ,e], origin = '1970-01-01')
}
for (x in exit){
  h_esample[ ,x] <- as.Date(h_esample[ ,x], origin = '1970-01-01')
}
h_esample$exita <- as.Date(h_esample$exita, origin = '1970-01-01')

#Dropping cases with over than 15 unemployment spells 
#h_esample<-filter(h_esample, is.na(entry16)) #subset (klient_id) jobseekers who became unemployment only 15 times 
#esample <- subset(esample, klient_id %in% h_esample$klient_id) 
#remove entry16+ and exit16+ columns
#h_esample<-select(h_esample, -entry16, -entry17, -entry18, -entry19, -entry20, -exit16, -exit17, -exit18, -exit19, -exit20)

esample_condition <- (mean(esample$treated) < 0.00001)
if(esample_condition) {
  knitr::knit_exit()
}

2.1 Description of participants and eligible

Our evaluation sample consists of 395 328 eligible individuals, registered as jobseekers during the evaluation period starting from 2017-01-01 until 2017-12-31, These jobseekers have generated in total 443 969 unemployment registrations during the period 2014-2020. Out of them, in 2 144 jobseekers participated in the evaluated measure [Component] Projects and programmes - Projects - Education and training during the evaluation period. 1 439 participants and 174 049 eligible JS were dropped from the sample, because of multiple participations in the evaluated program (or other relevant ALMPs) during and after the evaluation period. After cleaning we have sampled 705 JS with one-time participation during the evaluation period. At the same time, there was 269 920 eligible non-participants present in the register of unemployed jobseekers during 2017.

The group of participants and eligible show differences in a number of observed characteristics. Table 2 displays an overview of these differences on mean values of selected characteristics.


Table 2: Descriptive statistics of participants and eligible (selected characteristics)

####
## number of participants and eligible 
####

#separate table dfe with participants in ALMP  and eligible 
elig <- distinct_at(nonpart,vars(klient_id),.keep_all = TRUE)
part <- distinct_at(partic,vars(klient_id),.keep_all = TRUE)

#BASIC DESCRIPTIVE TABLE

# The number of participants and eligible in the sample'
#tab1<-cbind(sum(!is.na(dfe$entrya)),sum(is.na(dfe$entrya)))
tab1 <- cbind(format(length(unique(esample$klient_id[esample$treated==TRUE])), big.mark=" ", scientific=FALSE), format(length(unique(esample$klient_id[esample$treated==FALSE])), big.mark=" ", scientific=FALSE))
colnames(tab1) <- c('Participants', 'Eligible')
tab1 <- data.frame(cbind(Description = 'Number of observations', tab1))

####
## Age distribution
####

age_par <- part %>% select(age) %>% group_by(age) %>% dplyr::summarise(Participants_total = n())  %>%
  mutate(Participants_percent = paste0(round(100 * Participants_total / sum(Participants_total),2), "%"))

age_elig <- elig %>% select(age) %>% group_by(age) %>% dplyr::summarise(Eligible_total = n())  %>%
  mutate(Eligible_percent = paste0(round(100 * Eligible_total / sum(Eligible_total),2), "%"))

age <- merge(age_par, age_elig, by='age', all = TRUE)
age$Participants_percent <- ifelse(is.na(age$Participants_percent), paste(0,'%'), age$Participants_percent)
age$Participants_total <- ifelse(is.na(age$Participants_total), 0, age$Participants_total)

mean_age_par <- round(mean(part$age),1)
mean_age_elig <- round(mean(elig$age),1)

##### MEAN 
mean_age<-data.frame(mean_age_par,mean_age_elig)
mean_age <- cbind(Description = 'Age (years)', mean_age)
mean_age <- rename(mean_age, Participants = mean_age_par, Eligible=mean_age_elig)
####

age_elig$desc <- 'Eligible'
age_par$desc <- 'Participants'
age_elig <- age_elig %>% rename(total = Eligible_total, percent = Eligible_percent)
age_par <- age_par %>% rename(total = Participants_total, percent = Participants_percent)
age_r <- rbind(age_par,age_elig)

age_plot <- ggplot(age_r, aes(x = age, y = total, group= desc)) +
  geom_point(aes(color = desc), size = 1.5)+
  geom_line(aes(color = desc), size = 1) + 
  ylim(0,23000) +
  theme_light() +
  geom_text(aes(label = paste(percent, '\n\n' )), col ='black', size = 3, fontface ='italic')+
  labs(
    title = "Compare of age distribution (%)",
    x = "Age",
    y = "Total Count"
  )

####
## Gender distribution
####

gender_par <- part %>% select(male) %>% group_by(male) %>% dplyr::summarise(Participants_total = n())  %>%
  mutate(Participants_percent = paste0(round(100 * Participants_total / sum(Participants_total),2), "%"))

gender_elig <- elig %>% select(male) %>% group_by(male) %>% dplyr::summarise(Eligible_total = n())  %>%
  mutate(Eligible_percent = paste0(round(100 * Eligible_total / sum(Eligible_total),2), "%"))

gender <- merge(gender_par, gender_elig, by='male', all = TRUE)

####
male <- data.frame(gender_par[2,3], gender_elig[2,3])
male <- cbind(Description = 'Male', male)
male <- rename(male, Participants = Participants_percent, Eligible=Eligible_percent)


####
##  Education distribution
####

education_par <- part %>% select(noedu, primary, lsec, usec, tertiary) %>% 
  group_by(noedu, primary, lsec, usec, tertiary) %>% dplyr::summarise(Participants_total = n(), .groups = 'drop')  %>%
  mutate(Participants = round(100 * Participants_total / sum(Participants_total),2))
education_par <- education_par[!(is.na(education_par$noedu)),]
education_par <- reshape2::melt(education_par, measure.vars = c('noedu', 'primary', 'lsec', 'usec', 'tertiary'))
education_par <- education_par[education_par$value == 1,] 
education_par <- select(education_par, -value)

education_elig <- elig %>% select(noedu, primary, lsec, usec, tertiary) %>% 
  group_by(noedu, primary, lsec, usec, tertiary) %>% dplyr::summarise(Eligible_total = n(), .groups = 'drop')  %>%
  mutate(Eligible = round(100 * Eligible_total / sum(Eligible_total),2))
education_elig <- education_elig[!(is.na(education_elig$noedu)),]
education_elig <- reshape2::melt(education_elig, measure.vars = c('noedu', 'primary', 'lsec', 'usec', 'tertiary'))
education_elig <- education_elig[education_elig$value == 1,] 
education_elig <- select(education_elig, -value)

education <- merge(education_par, education_elig, by='variable', all = TRUE)
education <- select(education, variable, Participants, Eligible)
education <- rename(education, Description = variable)

education <- education%>%mutate(
  Description = case_when(
    education$Description ==  'noedu' ~ 'No education',    
    education$Description ==  'primary' ~ 'Primary', 
    education$Description ==  'lsec' ~ 'Lower secondary',
    education$Description ==  'usec' ~ 'Upper secondary',
    education$Description ==   'tertiary' ~ 'Tertiary', 
    TRUE~as.character(education$Description)
  ) 
)

education <- education %>% group_by(Description) %>%
  dplyr::summarise(Participants = paste0(sum(Participants),  "%"), Eligible = paste0(sum(Eligible),  "%")) 

x <- c('No education','Primary','Lower secondary', 'Upper secondary', 'Tertiary')
education <- education[match(x, education$Description),]

####
##  skills
####

l_skills_par <- part %>%  select(flang) %>%  
  mutate(flang = case_when(part$flang == 1 ~ 'Foreign language')) %>%  
  group_by(flang) %>% summarise(Participants_total = n())  %>%
  mutate(Participants_percent = paste0(round(100 * Participants_total / sum(Participants_total),2), "%")) %>%
  rename(Description = flang) %>% filter(row_number() == 1L)

l_skills_elig <- elig %>%  select(flang) %>%  
  mutate(flang = case_when(elig$flang == 1 ~ 'Foreign language')) %>%  
  group_by(flang) %>% summarise(Eligible_total = n())  %>%
  mutate(Eligible_percent = paste0(round(100 * Eligible_total / sum(Eligible_total),2), "%")) %>%
  rename(Description = flang) %>% filter(row_number() == 1L)

PC_skills_par <- part %>%  select(pc) %>%  
  mutate(pc = case_when(part$pc == 1 ~ 'PC skill')) %>%  
  group_by(pc) %>% summarise(Participants_total = n())  %>%
  mutate(Participants_percent = paste0(round(100 * Participants_total / sum(Participants_total),2), "%")) %>%
  rename(Description = pc) %>% filter(row_number() == 1L)

PC_skills_elig <- elig %>%  select(pc) %>%  
  mutate(pc = case_when(elig$pc == 1 ~ 'PC skill')) %>%  
  group_by(pc) %>% summarise(Eligible_total = n())  %>%
  mutate(Eligible_percent = paste0(round(100 * Eligible_total / sum(Eligible_total),2), "%")) %>%
  rename(Description = pc) %>% filter(row_number() == 1L)

d_skills_par <- part %>%  select(drive) %>%  
  mutate(drive = case_when(part$drive == 1 ~ 'Driving license')) %>%  
  group_by(drive) %>% summarise(Participants_total = n())  %>%
  mutate(Participants_percent = paste0(round(100 * Participants_total / sum(Participants_total),2), "%")) %>%
  rename(Description = drive) %>% filter(row_number() == 1L)

d_skills_elig <- elig %>%  select(drive) %>%  
  mutate(drive = case_when(elig$drive == 1 ~ 'Driving license')) %>%  
  group_by(drive) %>% summarise(Eligible_total = n())  %>%
  mutate(Eligible_percent = paste0(round(100 * Eligible_total / sum(Eligible_total),2), "%")) %>%
  rename(Description = drive) %>% filter(row_number() == 1L)


l_skills <- merge(l_skills_par, l_skills_elig, by='Description', all = TRUE)
PC_skills <- merge(PC_skills_par, PC_skills_elig, by='Description', all = TRUE)
d_skills <- merge(d_skills_par, d_skills_elig, by='Description', all = TRUE)

skills <- rbind(l_skills, PC_skills, d_skills)
skills <- select(skills, Description, Participants_percent, Eligible_percent)
skills <- rename(skills, Participants = Participants_percent, Eligible=Eligible_percent)

####
##  region
####

part <- part %>% mutate(
  okres = case_when(
    grepl('SK010',okres)  ~ 'Bratislavský', 
    grepl('SK021',okres)  ~ 'Trnavský', 
    grepl('SK022',okres)  ~ 'Trenčiansky', 
    grepl('SK023',okres)  ~ 'Nitriansky', 
    grepl('SK031',okres)  ~ 'Žilinský', 
    grepl('SK032',okres)  ~ 'Banskobystrický', 
    grepl('SK041',okres)  ~ 'Prešovský', 
    grepl('SK042',okres)  ~ 'Košický', 
    TRUE~as.character(okres)
  )
)

elig <- elig %>% mutate(
  okres = case_when(
    grepl('SK010',okres)  ~ 'Bratislavský', 
    grepl('SK021',okres)  ~ 'Trnavský', 
    grepl('SK022',okres)  ~ 'Trenčiansky', 
    grepl('SK023',okres)  ~ 'Nitriansky', 
    grepl('SK031',okres)  ~ 'Žilinský', 
    grepl('SK032',okres)  ~ 'Banskobystrický', 
    grepl('SK041',okres)  ~ 'Prešovský', 
    grepl('SK042',okres)  ~ 'Košický', 
    TRUE~as.character(okres)
  )
)

okres_par <- part %>% select(okres) %>% group_by(okres) %>% dplyr::summarise(Participants_total = n())  %>%
  mutate(Participants = paste0(round(100 * Participants_total / sum(Participants_total),2), "%"))

okres_elig <- elig %>% select(okres) %>% group_by(okres) %>% dplyr::summarise(Eligible_total = n())  %>%
  mutate(Eligible = paste0(round(100 * Eligible_total / sum(Eligible_total),2), "%"))

okres <- merge(okres_par, okres_elig, by='okres', all = TRUE)
okres <- select(okres, okres, Participants, Eligible)
okres <- rename(okres, Description = okres)
okres <- okres[okres$Description != 'N/A',]

####
##  Previous employment
####

prev_emp_part <- part  %>% select(empl) %>% group_by(empl) %>% summarise(Participants_total = n())  %>%
  mutate(Participants_percent = paste0(round(100 * Participants_total / sum(Participants_total),2), "%")) %>% 
  rename(Description = empl, Participants = Participants_percent)  

prev_emp_elig <- elig  %>% select(empl) %>% group_by(empl) %>% summarise(Eligible_total = n())  %>%
  mutate(Eligible_percent = paste0(round(100 * Eligible_total / sum(Eligible_total),2), "%")) %>%
  rename(Description = empl, Eligible = Eligible_percent)

prev_emp <- merge(prev_emp_part, prev_emp_elig, by='Description', all = TRUE)
prev_emp <- select(prev_emp, Description, Participants, Eligible)
prev_emp <- prev_emp[prev_emp$Description == 1,] 
prev_emp$Description[prev_emp$Description == 1 ] <- 'Previous employment'

####
##  Nationality
####

nat_part <- part  %>% select(slovak, hungarian, roma, czech, othern) %>% 
  group_by(slovak, hungarian, roma, czech, othern) %>% summarise(Participants_total = n(), .groups = 'drop') %>%
  mutate(Participants = paste0(round(100 * Participants_total / sum(Participants_total),2),'%')) 
nat_part$othern <- ifelse(nat_part$othern == 'TRUE', 1,0)
nat_part <- reshape2::melt(nat_part, measure.vars = c('slovak', 'hungarian', 'roma', 'czech', 'othern'))
nat_part <- nat_part[nat_part$value == 1,] 
nat_part <- select(nat_part, -value)

nat_elig <- elig  %>% select(slovak, hungarian, roma, czech, othern) %>% 
  group_by(slovak, hungarian, roma, czech, othern) %>% summarise(Eligible_total = n(), .groups = 'drop') %>%
  mutate(Eligible = paste0(round(100 * Eligible_total / sum(Eligible_total),2),'%')) 
nat_elig$othern <- ifelse(nat_elig$othern == 'TRUE', 1,0)
nat_elig <- reshape2::melt(nat_elig, measure.vars = c('slovak', 'hungarian', 'roma', 'czech', 'othern'))
nat_elig <- nat_elig[nat_elig$value == 1,] 
nat_elig <- select(nat_elig, -value)

nat <-  merge(nat_part, nat_elig, by='variable', all = TRUE)
nat <- select(nat, variable, Participants, Eligible)
nat <- rename(nat, Description = variable)

nat <- nat %>%mutate(
  Description = case_when(
    nat$Description == 'slovak' ~ 'Slovak', 
    nat$Description == 'hungarian' ~ 'Hungarian', 
    nat$Description == 'czech'~ 'Czech', 
    nat$Description == 'roma' ~ 'Roma', 
    nat$Description == 'othern'~ 'Other', 
  ) 
)


x <- c('Slovak','Hungarian', 'Czech','Roma','Other')
nat <- nat[match(x, nat$Description),]

####
##  Length of the unemployment spell
####

part$un_spell <- as.integer(part$exit - part$entry)
elig$un_spell <- as.integer(elig$exit - elig$entry)

un_spell <- cbind(round(mean(part$un_spell),2), round(mean(elig$un_spell),2))
un_spell <- data.frame(cbind(Description = 'Length of the unemployment spell', un_spell))
un_spell <- rename(un_spell, Participants = V2, Eligible=V3)


####
##  Length of spell between unemployment and participation
####

#Rozdiel medzi evidenciou nezamestnanosti a nastúpenia do AOTP 
part$spell_b <- as.integer(part$entrya - part$entry)

spell_bup <- part %>% select(spell_b) %>% 
  mutate(month = ceiling(spell_b/30.417)) #roundup -> ceiling

spell_bup_p <- ggplot(data=spell_bup, aes(month)) + 
  geom_histogram(binwidth=1, fill="grey", color="black", alpha=0.9) +
  theme_light() +
  theme(legend.position = "none")+
  labs(
    title = "Inflow into the programme in months\nsince the start of the unemployment",
    x = "Months",
    y = "Total Count"
  )  + 
  scale_x_continuous()

####
##  Length of AOTP
####

part$spell_aotp <- as.integer(part$exita - part$entrya)

spell_aotp <- part %>% select(spell_aotp) %>% 
  mutate(month = ceiling(spell_aotp/30.417)) #roundup -> ceiling

spell_aotp_p <- ggplot(data=spell_aotp, aes(month)) + 
  geom_histogram(binwidth=1, fill="grey", color="black", alpha=0.9) +
  theme_light() +
  theme(legend.position = "none") +
  labs(
    title = "Length of participation\n (in months)",
    x = "Months",
    y = "Total Count"
  ) + 
  scale_x_continuous(breaks = scales::breaks_extended(length(unique(spell_aotp$month))))


####
##  Compare of length spell
####

spell_p <- ggarrange(spell_bup_p, spell_aotp_p)


####
##  In flow
####

a <- format(seq(as.Date(ep_start),length=3,by="1 month"),"%Y-%m")
b <- format(seq((ymd(as.Date(ep_start)) %m+% months(3)),length=3,by="1 month"),"%Y-%m")
c <- format(seq((ymd(as.Date(ep_start)) %m+% months(6)),length=3,by="1 month"),"%Y-%m")
d <- format(seq((ymd(as.Date(ep_start)) %m+% months(9)),length=3,by="1 month"),"%Y-%m")

#vstúpili do programu
in_part <- part %>% select(entrya) %>% group_by(format(as.Date(entrya),"%Y-%m")) %>%
  summarise(Participants_total = n()) %>%
  mutate(Participants_percent = round(100 * Participants_total / sum(Participants_total),2)) 
colnames(in_part)[1] <- 'Description'
in_part <- filter(in_part, str_detect(in_part$Description, (format(as.Date(ep_start),"%Y"))))

in_part <- in_part%>%mutate(
  Description = case_when(
    in_part$Description %in% a ~  paste0('1Q.', (format(as.Date(ep_start),"%Y"))), 
    in_part$Description %in% b ~  paste0('2Q.', (format(as.Date(ep_start),"%Y"))), 
    in_part$Description %in% c ~  paste0('3Q.', (format(as.Date(ep_start),"%Y"))), 
    in_part$Description %in% d ~  paste0('4Q.', (format(as.Date(ep_start),"%Y"))), 
  ) 
)

in_part <- in_part %>% select(Description, Participants_total) %>% group_by(Description) %>%
  summarise(Participants_total = sum(Participants_total)) %>% 
  mutate(Participants = paste0(round(100 * Participants_total / sum(Participants_total),2),'%'))

#sa stali nezamestnaný 
in_elig <- elig %>% select(entry) %>% group_by(format(as.Date(entry),"%Y-%m")) %>%
  summarise(Eligible_total = n()) %>%
  mutate(Eligible_percent = paste0(round(100 * Eligible_total / sum(Eligible_total),2), "%")) 
colnames(in_elig)[1] <- 'Description'
in_elig <- filter(in_elig, str_detect(in_elig$Description, (format(as.Date(ep_start),"%Y"))))

in_elig <- in_elig%>%mutate(
  Description = case_when(
    in_elig$Description %in% a ~ paste0('1Q.', (format(as.Date(ep_start),"%Y"))), 
    in_elig$Description %in% b ~ paste0('2Q.', (format(as.Date(ep_start),"%Y"))), 
    in_elig$Description %in% c ~ paste0('3Q.', (format(as.Date(ep_start),"%Y"))), 
    in_elig$Description %in% d ~ paste0('4Q.', (format(as.Date(ep_start),"%Y"))), 
  ) 
)

in_elig <- in_elig %>% select(Description, Eligible_total) %>% group_by(Description) %>%
  summarise(Eligible_total  = sum(Eligible_total)) %>%
  mutate(Eligible = paste0(round(100 * Eligible_total / sum(Eligible_total),2), "%")) 

inflow <- merge(in_part, in_elig, by='Description', all = FALSE)
inflow <- select(inflow, Description, Participants, Eligible)


####
##  Outflow
####

#vystúpili z programu

out_part <- part %>% select(exita) %>% group_by(format(as.Date(exita),"%Y-%m")) %>%
  summarise(Participants_total = n()) %>%
  mutate(Participants_percent = round(100 * Participants_total / sum(Participants_total),2)) 
colnames(out_part)[1] <- 'Description'
out_part <- filter(out_part, str_detect(out_part$Description, (format(as.Date(ep_start),"%Y"))))

out_part <- out_part%>%mutate(
  Description = case_when(
    out_part$Description %in% a ~ paste0('1Q.', (format(as.Date(ep_start),"%Y"))), 
    out_part$Description %in% b ~ paste0('2Q.', (format(as.Date(ep_start),"%Y"))), 
    out_part$Description %in% c ~ paste0('3Q.', (format(as.Date(ep_start),"%Y"))), 
    out_part$Description %in% d ~ paste0('4Q.', (format(as.Date(ep_start),"%Y"))), 
  ) 
)

out_part <- out_part %>% select(Description, Participants_total) %>% group_by(Description) %>%
  summarise(Participants_total = sum(Participants_total)) %>% 
  mutate(Participants = paste0(round(100 * Participants_total / sum(Participants_total),2),'%'))

#vystúpili z evidencie -> zamestnali sa 
out_elig <- elig %>% select(exit) %>% group_by(format(as.Date(exit),"%Y-%m")) %>%
  summarise(Eligible_total = n()) %>%
  mutate(Eligible_percent = paste0(round(100 * Eligible_total / sum(Eligible_total),2), "%")) 
colnames(out_elig)[1] <- 'Description'
out_elig <- filter(out_elig, str_detect(out_elig$Description, (format(as.Date(ep_start),"%Y"))))

out_elig <- out_elig%>%mutate(
  Description = case_when(
    out_elig$Description %in% a ~ paste0('1Q.', (format(as.Date(ep_start),"%Y"))), 
    out_elig$Description %in% b ~ paste0('2Q.', (format(as.Date(ep_start),"%Y"))), 
    out_elig$Description %in% c ~ paste0('3Q.', (format(as.Date(ep_start),"%Y"))), 
    out_elig$Description %in% d ~ paste0('4Q.', (format(as.Date(ep_start),"%Y"))), 
  ) 
)

out_elig <- out_elig %>% select(Description, Eligible_total) %>% group_by(Description) %>%
  summarise(Eligible_total  = sum(Eligible_total)) %>%
  mutate(Eligible = paste0(round(100 * Eligible_total / sum(Eligible_total),2), "%")) 

outflow <- merge(out_part, out_elig, by='Description', all = FALSE)
outflow <- select(outflow, Description, Participants, Eligible)


####
##  Children in the household
####

child_part <- part  %>% select(kids) %>% group_by(kids) %>% summarise(Participants_total = n())  %>%
  mutate(Participants = paste0(round(100 * Participants_total / sum(Participants_total),2), "%")) %>% 
  rename(Description = kids)  

child_elig <- elig  %>% select(kids) %>% group_by(kids) %>% summarise(Eligible_total = n())  %>%
  mutate(Eligible = paste0(round(100 * Eligible_total / sum(Eligible_total),2), "%")) %>%
  rename(Description = kids)

child <- merge(child_part, child_elig, by='Description', all = TRUE)
child <- select(child, Description, Participants, Eligible)
child <- child[child$Description == 1,] 
child$Description[child$Description == 1] <- 'Children in the household'


####
##  Fields of study
####

study_part <- part  %>% select(odbor) %>% group_by(odbor) %>%
  summarise(Participants_total = n()) %>%
  mutate(Participants = round(100 * Participants_total / sum(Participants_total),2)) 

study_elig <- elig  %>% select(odbor) %>% group_by(odbor) %>%
  summarise(Eligible_total = n()) %>%
  mutate(Eligible = round(100 * Eligible_total / sum(Eligible_total),2)) 

study <- merge(study_part, study_elig, by='odbor', all = TRUE)
study <- select(study, odbor, Participants, Eligible)
study$Participants <- ifelse(is.na(study$Participants), 0, study$Participants)

a <- as.character(c(seq(11,19,1)))
b <- as.character(c(seq(21,39,1)))
c <- as.character(c(seq(41,49,1)))
d <- as.character(c(seq(51,59,1)))
e <- as.character(c(seq(61,79,1)))
f <- as.character(c(seq(81,89,1)))
g <- as.character(c(seq(91,98,1)))

study <- study%>%mutate(
  odbor = case_when(
    study$odbor %in% a ~ 'Natural Science', 
    study$odbor %in% b ~ 'Technical sciences', 
    study$odbor %in% c ~ 'Agricultural, forestry and veterinary sciences', 
    study$odbor %in% d ~ 'Medical and pharmaceutical sciences', 
    study$odbor %in% e ~ 'Social sciences and services', 
    study$odbor %in% f ~ 'Sciences of culture and art', 
    study$odbor %in% g ~ 'Military and security sciences',
    study$odbor == 99 || study$odbor == 0 || study$odbor == 10 ~ 'General sciences and services ',
    TRUE~as.character(study$odbor)
  ) 
)

study <- study %>% select(odbor, Participants, Eligible) %>%
  group_by(odbor)  %>% 
  summarise(Participants = sum(Participants), Eligible = sum(Eligible))  %>%
  mutate(Participants = paste0(Participants, "%")) %>%
  mutate(Eligible = paste0(Eligible, "%")) 
study <- rename(study, Description = odbor)


####
##  SUMMARIZE ####
####

colnames(tab1)<-c("Description", "Participants", "Eligible")
colnames(mean_age)<-c("Description", "Participants", "Eligible")
colnames(male)<-c("Description", "Participants", "Eligible")
colnames(prev_emp)<-c("Description", "Participants", "Eligible")
colnames(un_spell)<-c("Description", "Participants", "Eligible")
colnames(child)<-c("Description", "Participants", "Eligible")

tables <- c('tab1', 'mean_age', 'male', 'prev_emp', 'un_spell', 'child')
basics <- data.frame()

for (name in tables){
  table <- get(name)
  table <- mutate(table, across(everything(), as.factor))
  basics <- bind_rows(basics, table)
}

tables <- c('basics', 'education', 'study', 'skills', 'okres', 'nat', 'inflow', 'outflow')

for (name in tables){
  table <- get(name)
  table <- add_column(table, Variable = name, .after = "Eligible")
  colnames(table) <- c("Description", "Participants", "Eligible", "Variable")
  assign(name, table)
}


sum_table <- rbind(basics, education, study, skills, okres, nat, inflow, outflow)
sum_table <- sum_table %>% relocate(Variable, .before = Description) %>%mutate(
  Variable = case_when(
    sum_table$Variable == 'basics' ~ 'Basics',
    sum_table$Variable == 'education' ~ 'Education level',
    sum_table$Variable == 'study' ~ 'Field of study',
    sum_table$Variable == 'skills' ~ 'Skills',
    sum_table$Variable == 'okres' ~ 'Region',
    sum_table$Variable == 'nat' ~ 'Nationality',
    sum_table$Variable == 'inflow' ~ 'Inflow',
    sum_table$Variable == 'outflow' ~ 'Outflow',
    TRUE ~ as.character(sum_table$Variable)
  )
) 

sum_table$Participants <- ifelse(is.na(sum_table$Participants) | gsub('\\D','', sum_table$Participants) == "", 
                                 ifelse(is.na(sum_table$Participants) | gsub('\\D','', sum_table$Participants) == "", 
                                        ifelse(str_detect(as.character(sum_table$Eligible), regex("%")), '0%',0), 
                                        as.character(sum_table$Participants)),
                                 as.character(sum_table$Participants))

sum_table[,2:4]  %>% kbl(format = 'html', booktabs = TRUE , align = 'c', row.names = FALSE) %>%
  column_spec(1,  border_right = TRUE) %>%
  kable_classic('hover', full_width = FALSE) %>%
  pack_rows(index = table(fct_inorder(sum_table$Variable)))
Description Participants Eligible
Basics
Number of observations 705 269 920
Age (years) 25.2 36.5
Male 46.95% 51.98%
Previous employment 5.39% 8.75%
Length of the unemployment spell 339.42 245.24
Children in the household 13.62% 11.42%
Education level
No education 4.4% 0.69%
Primary 17.3% 13.87%
Lower secondary 12.34% 28.32%
Upper secondary 38.87% 35.72%
Tertiary 27.09% 21.4%
Field of study
Agricultural, forestry and veterinary sciences 2.98% 4.24%
General sciences and services 54.04% 36.61%
Medical and pharmaceutical sciences 0.99% 1.49%
Military and security sciences 0.43% 0.34%
Natural Science 0.84% 0.7%
Sciences of culture and art 1.7% 1.06%
Social sciences and services 25.37% 25.17%
Technical sciences 13.61% 30.38%
Skills
Foreign language 67.94% 62.43%
PC skill 66.52% 56.1%
Driving license 48.09% 54.48%
Region
Banskobystrický 9.36% 12%
Bratislavský 0.43% 11.63%
Košický 19.57% 14.27%
Nitriansky 18.58% 12.86%
Prešovský 29.79% 16.68%
Trenčiansky 6.81% 10.35%
Trnavský 6.38% 9.69%
Žilinský 9.08% 12.52%
Nationality
Slovak 87.94% 90.56%
Hungarian 11.35% 8.2%
Czech 0% 0.44%
Roma 0.57% 0.16%
Other 0.14% 0.65%
Inflow
3Q.2017 6.24% 25.97%
4Q.2017 93.76% 21.34%
Outflow
3Q.2017 2.57% 23.66%
4Q.2017 97.43% 20.06%

Graph 3: Timing and length of participation in the evaluated program

require(gridExtra)
if (nrow(spell_bup) > 5){
  mx <- max(spell_bup$month)
  mn <- min(spell_bup$month)
  plot1 <- ggplot(data=spell_bup, aes(month)) + 
    geom_histogram(binwidth=3, bins=30, breaks=seq(mn, mx, by=0.5), alpha=0.8,col="white", fill="steelblue3")  +
    theme_minimal()+
    ggtitle("Inflow into the program in months\nsince the start of the unemployment") +
    xlab("Months") +
    ylab("Total Count")+ 
    scale_x_continuous(breaks= pretty_breaks())
  
}

if (nrow(spell_aotp) > 5){
  
  mx <- max(spell_aotp$month)
  mn <- min(spell_aotp$month)
  plot2 <- ggplot(data=spell_aotp, aes(month)) + 
    geom_histogram(binwidth=3, bins=30, breaks=seq(mn, mx, by=0.5), alpha=0.8,col="white", fill="steelblue3")  +
    theme_minimal()+
    ggtitle("Length of participation\n(in months)") +
    xlab("Months") +
    ylab("Total Count")+ 
    scale_x_continuous(breaks= pretty_breaks())
  
}
grid.arrange(plot1, plot2, ncol=2)


3. The impact of participation in the [Component] Projects and programmes - Projects - Education and training during 2017

The ultimate objective of ALMP programmes is improving employment chances of its beneficiaries. Constrained by the data provided, we construct four outcome indicators. All four of them are based on the presence in (absence from) the register of JSs administrated by COLSAF. They are, therefore, proxies of the employment status.

#Estimation parameters
#LL: there will be four different samples according to how long have JSs been unemployed prior to receiving the training
Ssamples <- seq(1,4)

# Month of participation since the start of the evaluation period
participation_month<-((year(as.Date(esample$entrya))-min(year(ep_start)))*12)+month(as.Date(esample$entrya)) 

#LL: participation quarter, if JS was treated on 3 Feb, pcpQ == 1
pcpQ<-ceiling(participation_month/3)
max_pcpQ<-max(pcpQ, na.rm = TRUE)

#LL: create an object that will store data
Mdata<-c()

#LL: these are the periods that we will look at
# the negative values correspond to "placebo" effects. We should not see any effect there, or only a small one.
OQm <- seq(-12,36,3)
OQ <- -4:12
O_vars <- c(paste("empl",OQ, sep=""), "firstempl", "cumempl")
O_vars<-str_replace(O_vars, "-", ".")
#LL: O_vars stores outcome variables
# empl.2 means employment 2 quarters before the start of the evaluation period.
# empl0 will correspond to the last quarter of the year before the start of the eval. period
# empl1 will correspond to the first quarter of the start year of the eval. period


#LL: This is a list of baseline covariates that will be used throughout the analysis
# they correspond to a reasonable minimum of information that should be controlled for
# in order to have a meaningful comparison
list_vars <- c('ent', 'male', 'married','kids',
             'slovak', 'noedu','primary', 'lsec', 'usec',
             'flang', 'drive', 'pc',
             'unpast', 'min_urad', 'min_BA',
             'UR_region', 'roma_share', 'population', 'age')

#LL: In order to have a credible comparison groups. We need to look how similar the groups are, how balanced they are.
# we compare the mean differences BEFORE adjustment and AFTER adjustment
Balance_vars <- list_vars

#LL: We allocate objects that will store the results

#LL: number of treated units
N<-nrow(esample[esample$treated==TRUE,])

#LL: number of treated units in a particular esample
N_sp <- matrix(NA, nrow=length(Ssamples))  

#LL: results array ATT. We are interested in the average treatment effect on the TREATED subpopulation.
# in this particular case, most LM programs are intended for a specific subpopulation.
# it is therefore of less interest to look at the whole population of JSs (ATE)
resultsArray_ATT  <- array(NA, dim=c(length(O_vars),length(Ssamples))) 
dimnames(resultsArray_ATT)[[1]] <- c(O_vars)

#LL: we also store standard errors that quantify STATISTICAL uncertainty of our estimates
resultsArray_se  <- array(NA, dim=c(length(O_vars),length(Ssamples))) 
dimnames(resultsArray_se)[[1]] <- c(O_vars)
results <- array(NA, dim=c(length(O_vars),2))

### 1. Counting of ATTs 
Sesample <- esample

###Four sub-samples based on the waiting time until participation in the evaluated measure (cutoffs p25, p50, p75): 
#LL: these will make 4 (roughly equally sized) different groups according to how long JSs have been unemployed
# this is an important determinant of the effect and it is important to control for it
wtc25<-quantile(as.numeric(Sesample$exita)-as.numeric(Sesample$entry),na.rm=TRUE, probs=0.25)
wtc50<-quantile(as.numeric(Sesample$exita)-as.numeric(Sesample$entry),na.rm=TRUE, probs=0.50)
wtc75<-quantile(as.numeric(Sesample$exita)-as.numeric(Sesample$entry),na.rm=TRUE, probs=0.75)

 minToP<-min(as.numeric(Sesample$entrya)-as.numeric(Sesample$entry), na.rm = TRUE)

partic<-Sesample[Sesample$treated==TRUE,]
nonpart<-Sesample[Sesample$treated==FALSE,]

#LL: Nonparticipants have only criterium of minimal length of unemployment
# (thus one non-participant can be used multiple times)
esample1<-rbind(nonpart[as.numeric(nonpart$exit)-as.numeric(nonpart$entry)>minToP,],
                partic[as.numeric(partic$exita)-as.numeric(partic$entry)<=wtc25,])
esample2<-rbind(nonpart[as.numeric(nonpart$exit)-as.numeric(nonpart$entry)>=wtc25,],
                partic[as.numeric(partic$exita)-as.numeric(partic$entry)>wtc25 &
                         as.numeric(partic$exita)-as.numeric(partic$entry)<=wtc50,])
esample3<-rbind(nonpart[as.numeric(nonpart$exit)-as.numeric(nonpart$entry)>=wtc50,], 
                partic[as.numeric(partic$exita)-as.numeric(partic$entry)>wtc50 &
                         as.numeric(partic$exita)-as.numeric(partic$entry)<=wtc75,])
esample4<-rbind(nonpart[as.numeric(nonpart$exit)-as.numeric(nonpart$entry)>=wtc75,],
                partic[as.numeric(partic$exita)-as.numeric(partic$entry)>wtc75,])

#LL: We loop over 4 different lenghts of the prior unemployments
for (s in Ssamples) {
    
    esampleS <- get(paste0('esample', s, by = ''))
    
    # we count how many participants are there in a particular group.
    N_sp[s,] <- nrow(esampleS[esampleS$treated==TRUE,])
    
    if (mean(esampleS$treated) > 0.0001){
      
      #LL: we pick a particular subsample of variables.
      esampleS <-esampleS[,c("treated", "klient_id", 
                             "entry", "exit", "entrya", 
                             list_vars)]
      
      # Month of participation since the start of the evaluation period 
      esampleS$participation_month<-((year(as.Date(esampleS$entrya))-year(ep_start))*12)+
        month(as.Date(esampleS$entrya))
      
      #LL: maybe simplify it to
      #month(as.Date(esampleS$entrya))
      #?
      # we can use this with current specification of esample ( participants in the evaluated programme during the evaluation period (ep -> one year) + eligible unemployed at the same period)
      # if we change  specification of esample (i.e. What happened with those ones who became unemployed in a specific year (2017) and they can enter into the program after 3 months since they become unemployed (but they can enter into program 24 months since they become unemployed too))
      
      # Participation quarter
      esampleS$pcpQ<-ceiling(esampleS$participation_month/3)
      
      #LL: instead, we could have just 
      #esampleS$pcpQ<-quarter(as.Date(esampleS$entrya))
      # without the need to define participation_month
      
      # Month of the start of the unemployment since the start of the evaluation period
      esampleS$Ustart_month<-((year(as.Date(esampleS$entry))-min(year(ep_start)))*12)+
        month(as.Date(esampleS$entry)) 
      #LL: notice that this is a relative number, thus it can be negative(!)
      
      # Inflow quarter 
      esampleS$infQ<-ceiling(esampleS$Ustart_month/3)
      #LL: infQ = 0 means that this JS entered the register in Q4 of 2016 (if ep_start=="2017-01-01")
      
      # Outflow quarter
      esampleS$Uend_month<-((year(as.Date(esampleS$exit))-min(year(ep_start)))*12)+
        month(as.Date(esampleS$exit)) 
      esampleS$outQ<-ceiling(esampleS$Uend_month/3)
      
      ##Difference between entry into unemployment register and started of participation in measure
      esampleS$diff_entry <- ceiling(as.integer(as.Date(esampleS$entrya) - as.Date(esampleS$entry))/30.417)
      #LL: NOTEL why 30.417?
      # table(ceiling(difftime(as.Date(esampleS$entrya), as.Date(esampleS$entry), units = "days")/30.417)) 
      # parameters of difftime units  = c("auto", "secs", "mins", "hours", "days", "weeks")
      # haven't "months" unit 
      
      #Adding unemployment history
      #LL: previous unemployment history is an important predictor of both the treatment and outcomes
      # it is important to control for it.
      esampleS<-merge(esampleS, 
                      h_esample[,c("klient_id", "entry", 
                                   paste0("entry", seq(1:11), sep=""), 
                                   paste0("exit", seq(1:11), sep=""))],
                      by=c("klient_id", "entry"), all.y = FALSE)
      
      #Imputing the start of the unemployment spell 
      #LL: wait, isn't it 
      # #Imputing the start of the programme ? YES 
      
      #LL: we wish to make a meaningful comparison.
      # But we're facing a problem because for the non-participants, we don't have date of entry to the course
      # well simply because they did not participate(!)
      # what we do is the following.
      # for every suitable non-participant we chose one particular quarter at random from the evaluation year that is "feasible"
      # "feasible" means that the non-participant _could_have_ potentially participated in that quarter 
      # given that we have a large donor pool, this randomness will not impact our estimates much.
      

            
      #LL: In Pmatrix, we will store for every non-participant, the feasible quarters.
      # e.g. if it is [0 1 1 1], it means that with prob. 1/3 we pick quarter 2,3 or 4
      Pmatrix <- matrix(NA, nrow = nrow(esampleS[esampleS$treated==FALSE,]), ncol=max_pcpQ)
  
          #LL: we loop through the participation quarters
          for (p in 1:max_pcpQ){  
            
            #LL: NOTE!!!! 
            # we restrict ourselves to only non-participants who are at the (beginning of the) particular pcpQ unemployed
            ind <- (esampleS$outQ[esampleS$treated==FALSE]>=p)
            
            # Quarter of inflow to unemployment of participants entering the programme in quarter P
            PinfQ<-unique(esampleS$infQ[esampleS$treated==TRUE & esampleS$pcpQ==p])
            # Only allowing non-participants inflowing to unemployment during the quarters when participants in this sub-sample were inflowing
            #D<-esampleS[as.logical(esampleS$treated==T & esampleS$pcpQ == p) |
            #              as.logical(esampleS$treated==F & esampleS$infQ %in% PinfQ),]
            #LL: NOTE: what is this last commented bit?

            Pmatrix[ind,p]<-as.numeric((esampleS$infQ[esampleS$treated==FALSE])[ind]  %in% PinfQ)
            Pmatrix[!ind,p] <- 0
            
            #LL: does, for this particular quarter of inflow, exists anyone from the list of participants?
            # in other words
            # can we, in a given sample for a given quarter match it to at least one participant?
            
          }
      
        hh <- function(j){
              sample(which(j==1),1)
        }
        
        sumIsZero    <- as.logical(apply(Pmatrix, 1, FUN=sum)==0)
        sumIsNonZero <- as.logical(apply(Pmatrix, 1, FUN=sum)!=0)

        esampleS[esampleS$treated == FALSE,]$pcpQ[sumIsZero] <- 0
        #LL: for every non-participant, for whose inflow we cannot match to ANY participant
        # for any of the four quarters, we assign zero
        
        esampleS[esampleS$treated == FALSE,]$pcpQ[sumIsNonZero] <- apply(Pmatrix[sumIsNonZero,],1,hh)
        #LL: for every non-participant we pick one of the feasible quarters RANDOMLY for given esampleS (we have four of these)
        # (remember: feasible means that there exists at least one participant for inflow)
        #LL: NOTE: did we fix the seed?? if not, we should.
  
        
                #LL: loop across different time periods
                for (q in OQm) { 
                  
                  cond <- FALSE
                  
                  #LL: we add variables whether someone is in the register.
                  # we consider at most 11 unemployment spells.
                    for (n in 1:11) { 
                      entry_n <- esampleS[ ,paste0('entry', n, '')]
                      exit_n <- esampleS[, paste0('exit', n, '')]
                      
                      cond <- cond | isInRegister(p = esampleS$pcpQ, q, entry_n, exit_n) 
                    }
                  
                  esampleS[ ,paste0('empl',ceiling(q/3), '')] <- ifelse(cond, 0,1)
                }
             
           #LL: NOTE: can't this be in the same loop (?)
              for(q in OQm){
                esampleS <- esampleS[!is.na(esampleS[ ,paste0('empl',ceiling(q/3), '')]),]
              }
        
            #Pre-estimation data preparation
            #Additional outcome variables
            esampleS[ ,"cumempl"] <- rowSums(esampleS[, paste0("empl", seq(0,12,1), sep="")])
            esampleS[ ,"firstempl"] <- ceiling(as.numeric((as.Date(esampleS$exit)-
                                                      (as.Date(ep_start) + months(esampleS$pcpQ*3)))/90)) 
            #LL: NOTE: here persons with multiple unempl spell WITHIN the studied year could create problems
            
            # Cleaning the Xs
            y <- data.frame(treated = esampleS[ ,'treated'])
            D <- data.frame(esampleS) #D - su tvoje data ako data.frame()
            spec <- as.formula(cbind(y,D[,c(list_vars, 
                                            paste0("empl.", seq(1,4,1), sep=""))])) 
            #je tvoja specifikacia modelu, je to object as.formula()
            colTh <- 0.8 
            #Treshold for the acceptable correlation between vars ( je maximalne tolerovana korelacia medzi dvoma premennymi)
            dumTh <- 0.0001 
            #Treshold for the acceptable concentration of dummy variables (hovori kolko (100[%] x dumTh) percent je minimalny vyskyt hodnot 1 alebo 0 pri dummy premennych. Ak je dumTh = 0.005, tak aspon pol-percent pozorovani musi mat 1 alebo 0.)
            TdumTh <- 0.005
            #Treshold for the acceptable concentration of dummy variables in the sub-group of participants (hovori kolko (100[%] x dumTh) percent je minimalny vyskyt hodnot 1 alebo 0 pri dummy premennych. Ak je dumTh = 0.005, tak aspon pol-percent pozorovani musi mat 1 alebo 0.)
            TdumThN <- 5
            #Treshold for the minimal number of observations of a dummy variables in the sub-group of participants (hovori kolko (100[%] x dumTh) percent je minimalny vyskyt hodnot 1 alebo 0 pri dummy premennych. Ak je dumTh = 0.005, tak aspon pol-percent pozorovani musi mat 1 alebo 0.)
            result <- DatPrep(D = D, spec = spec, colTh = colTh, dumTh = dumTh, TdumTh = TdumTh, TdumThN = TdumThN) 
            
        for (col in colnames(result$D)){
          D <-  D[!is.na(D[,col]),]
        }    
            
        # ESTIMATION:
        m.1 <- matchit(as.formula(paste(result$spec[2], '~', result$spec[3] , sep = ' ')), 
                       data = D,
                       method = "nearest", 
                       exact = c("infQ"),
                       distance = "glm", link = "probit")
        #plot(summary(m.1))
        m.data1 <- match.data(m.1)
        match.matrix <- data.frame("untreated" = m.1$match.matrix[,1] ,
                                   "treated" = rownames(m.1$match.matrix))
        m.data1[as.character(match.matrix[, 'untreated']), 'entrya'] <- m.data1[as.character(match.matrix[, 'treated']),'entrya']
        m.data1$diff_entry <- ceiling(as.integer(as.Date(m.data1$entrya) - as.Date(m.data1$entry))/30.417)
        
        assign(paste("Mdata",s, sep=""), m.data1)
        Mdata <- bind_rows(Mdata, m.data1)
        

        assign(paste0('balancegraph',s, by=''),summary(m.1, subclass = TRUE))
        
        
        #distance in m.data1 is Propensity score
        #trim = 0.005
        #m.data1 <- m.data1[!(m.data1$distance <= trim | m.data1$distance >= (1-trim)),]
  
        
            for (iQ in O_vars){
              fit <- lm(as.formula(paste(iQ , '~',  result$spec[2], '+' , result$spec[3], by = ' ')), 
                        data = m.data1, 
                        weights = weights)
              res <- coeftest(fit, vcov. = vcovCL, cluster = ~subclass)
              att <- res[2,1]
              se <- res[2,2]
              
              resultsArray_ATT[iQ,s] <- att
              resultsArray_se[iQ,s] <- se
              
        }
    }
  } 


for (iQ in 1:length(O_vars)){
      results[iQ,1] <- sum((resultsArray_ATT[iQ,]*(N_sp/N)))
      results[iQ,2] <- sum((resultsArray_se[iQ,]*(N_sp/N)))
      #results[iQ,2] <- sum((resultsArray_se[iQ,]^2*(N_sp/N)))
}

# LL: sanity check
#plot(resultsArray_ATT[1:17,1])
#lines(resultsArray_ATT[1:17,2])
#lines(resultsArray_ATT[1:17,3])
#lines(resultsArray_ATT[1:17,4])

results<-cbind(O_vars, results)   

#Mdata<-rbind(Mdata1, Mdata2, Mdata3, Mdata4)

resultsPSM <- matrix(NA, nrow=length(O_vars), ncol = 2)

for (iQ in 1:length(O_vars)){
            fit <- lm(as.formula(paste(O_vars[iQ] , '~',  result$spec[2], '+' , result$spec[3], by = ' ')), 
                      data = Mdata, 
                      weights = weights)
            res <- coeftest(fit, vcov. = vcovCL, cluster = ~subclass)
            #LL: source of clustering?
            att <- res[2,1]
            se <- res[2,2]
            
            resultsPSM[iQ,1] <- att
            resultsPSM[iQ,2] <- se
            
        }

resultsPSM<-cbind(O_vars, resultsPSM)

#LL: why do results and results PSM give different values?
# I see, it is the standard errors. They are effectively cut in half
# that is in line with square-root convergence, because the sample size is quadrupled.

Graph 4: Balance of the groups of participants and eligible before and after weighting

match.vars <- c('distance',list_vars,paste0("empl.", seq(1,4,1), sep=""),'infQ')


balance_g  <- array(NA, dim=c(length(match.vars),2,length(Ssamples))) 
dimnames(balance_g)[[1]] <-match.vars
dimnames(balance_g)[[2]] <- c('Balance_for_All_Data','Balance_for_Matched_Data')

for(i in 1:length(Ssamples)){
  
  if(exists(paste0('balancegraph', i, by = ''))){
    x <- get(paste0('balancegraph', i, by = ''))
    balance_g[,1,i] <- abs(x$sum.all[,3][match(rownames(balance_g), names(x$sum.all[,3]))])
    balance_g[,2,i] <- abs(x$sum.matched[,3][match(rownames(balance_g), names(x$sum.matched[,3]))])
  }                  
}

balance_f <- array(NA, dim=c(length(match.vars),2)) 
dimnames(balance_f)[[1]] <-match.vars
dimnames(balance_f)[[2]] <- c('Balance_for_All_Data','Balance_for_Matched_Data') 

for (iQ in 1:length(match.vars)){
      balance_f[iQ,1] <- sum(sum((as.numeric(balance_g[iQ,1,])*(N_sp/N)), na.rm = TRUE), na.rm = TRUE)
      balance_f[iQ,2] <-sum(sum((as.numeric(balance_g[iQ,2,])*(N_sp/N)), na.rm = TRUE), na.rm = TRUE)
}

balance_fg <- data.frame(balance = c(balance_f[,1], balance_f[,2]),
                 Balance = c(rep("before matching",length(match.vars)),rep("after matching",length(match.vars))),
                 names = c(match.vars,match.vars) ) 

p <- balance_fg %>% 
    mutate(names = fct_reorder(names, balance)) %>%
    ggplot(aes(x=balance, y=names,col=Balance)) + 
    geom_vline(xintercept = 0.05, linetype="dotted", color = 'darkgrey') + 
    geom_vline(xintercept = 0.0, color = 'darkgrey')+
    geom_vline(xintercept = 0.1, color = 'darkgrey')+
    geom_point()+
    theme_minimal()+
    ylab('Variables')+
    xlab('Absolute Standardized Meand Difference')+
    theme(plot.caption = element_text(hjust = 0), legend.position = "top")+
    labs(caption="Source: ÚPSVaR")
  
p

After inspecting the achieved balance of the “control group” and participants, we can proceed with reporting the estimated effects of participation in the [Component] Projects and programmes - Projects - Education and training during 2017.

3.1 Labour market outcomes of participants and eligible

First, we construct a proxy of the employment rate. It is a share of persons out of the register of unemployed JSs observed on the first day of quarters before and after participation (Quarter 0 was the quarter when participation started).

Graph 5: The share of individuals out of the unemployment register (proxy for employment rate)

graph <- matrix(NA, nrow=length(O_vars[1:17]), ncol = 2)
rownames(graph) <- O_vars[1:17]
for(o in O_vars[1:17]){
  graph[o,1] <- apply(Mdata[Mdata$treated == TRUE,o,drop = FALSE], 2, mean, na.rm = TRUE)
  graph[o,2] <- apply(Mdata[Mdata$treated == FALSE,o,drop = FALSE], 2, mean, na.rm = TRUE)
}

graph <- data.frame(cbind(O_vars[1:17],graph))
colnames(graph) <- c('O_vars','treated', 'untreated')
graph$treated <- as.numeric(as.character(graph$treated))
graph$untreated <- as.numeric(as.character(graph$untreated))
graph$O_vars <- factor(graph$O_vars, levels=c(O_vars[1:17]))
graph$Q<-seq(-4,12,1)

graphER<- ggplot(data=graph) +
  geom_line(aes(x=Q, y=treated, colour="Treated"), size = 1 , group = 1) +
  geom_line(aes(x=Q, y=untreated, colour="Control group"), size = 1 , group = 1) +
  scale_x_continuous(breaks=seq(-4,12,1))+
  theme_minimal()+
  labs( 
       y = "Share of treated",
       x = "Quarters before and after the start of the participation (0)",
       colour = "Group", 
       caption="Source: ÚPSVaR") +
  theme(plot.caption = element_text(hjust = 0), legend.position = "top")

graphER

Second, we look at the number of quarters until the first exit from the register of unemployed JSs. This indicates how participation in the ALMP programme contributed to the shortening of the participants’ unemployment spell. The graph displays the whole distribution of participants and the control group based on the values of this outcome indicator.

Graph 6: Quarters until the first exit

#### THE NUMBER OF MONTHS UNTIL THE FIRST EXIT
########## Months until the first exit
# how many months after the entrya JS got a job

firstempl_P <- data.frame('quarter' = Mdata[Mdata$treated==TRUE, 'firstempl']) %>% 
  group_by(quarter) %>% 
  summarise(total = n(), .groups = 'drop') %>%
  mutate(percent =  round( total / sum(total), 4),
         Group = "Participants") 

firstempl_E <- data.frame('quarter' = Mdata[Mdata$treated==FALSE, 'firstempl']) %>% 
  group_by(quarter) %>% 
  summarise(total = n(), .groups = 'drop') %>%
  mutate(percent =  round( total / sum(total), 4),
         Group = "Eligible") 

firstempl <- rbind(firstempl_P, firstempl_E)

ggplot(firstempl, aes(fill=Group, y=percent, x=quarter)) + 
          geom_bar(position="dodge", stat="identity") +
          facet_wrap(~Group) +
          theme_minimal() + 
          scale_fill_manual(values=c('grey', 'steelblue3')) +
          scale_x_continuous(breaks=seq(0,16,1)) +
          theme(legend.position="none") +
          xlab("Quarters after the start of the participation (0)") + 
          ylab("") +
          labs(caption="Source: COLSAF")+
          scale_y_continuous(labels = percent)

The following graph displays the cumulative number of months spent out of the register during the observation period following the evaluation program’s end.

Graph 7: Number of quarters out of the unemployment register after the end of the participation

## Plotting the number of months in cumulative employment
###### Participants

empl36m_P <- data.frame('quarter' = Mdata[Mdata$treated==TRUE, 'cumempl']) %>% 
  group_by(quarter) %>% 
  summarise(total = n(), .groups = 'drop') %>%
  mutate(percent =  round( total / sum(total), 4),
         Group = "Participants") 

empl36m_E <- data.frame('quarter' = Mdata[Mdata$treated==FALSE, 'cumempl']) %>% 
  group_by(quarter) %>% 
  summarise(total = n(), .groups = 'drop') %>%
  mutate(percent =  round( total / sum(total), 4),
         Group = "Eligible") 

empl36m <- rbind(empl36m_P, empl36m_E)

ggplot(empl36m, aes(fill=Group, y=percent, x=quarter)) + 
  geom_bar(position="dodge", stat="identity") +
  facet_wrap(~Group) +
  theme_minimal() + 
  scale_fill_manual(values=c('grey', 'steelblue3')) +
  scale_x_continuous(breaks=seq(0,16,1)) +
  theme(legend.position="none") +
  xlab("Quarters after the start of the participation (0)") + 
  ylab("") +
  labs(caption="Source: COLSAF")+
  scale_y_continuous(labels = percent)

3.2 Estimation of the average treatment effects on the treated (ATTs)

Now, we explore the average treatment effects on the treated (ATTs).

Graph 8: The average treatment effects on the treated (ATTs) estimated for the participation in the [Component] Projects and programmes - Projects - Education and training during 2017

graphATT

3.3 Statistical significance and heterogeneity of ATT effects

Further, we report the ATTs on the quarterly proxy for employment status, complemented by the indicators of:

  • Number of months until the first exit from the unemployment register JSs (firtempl)
  • Cumulative number of months outside the unemployment register of unemployed JSs (cumempl)

Table 3: The average treatment effect on the treated (ATT)

resultsDF <- select(resultsDF, -pval)
colnames(resultsDF) <- c("", "effect", "se", "sig.")

resultsDF %>% kbl(format = 'html', booktabs = TRUE , align = 'c') %>%
  column_spec(1,  border_right = TRUE) %>%
  kable_classic('hover', full_width = FALSE) %>%
  row_spec(1:nrow(resultsDF), font_size = '1cm') %>%
  column_spec(1:ncol(resultsDF),width = "1.1cm") %>%
  kableExtra::footnote(number = paste('Significance (sig.):',  0.0000, ' " *** " ', 0.001, ' " ** " ', 0.01, ' " * " ', 0.05, ' " . " ', 0.1, '" "',  1))
effect se sig.
empl.4 -0.028 0.015 .
empl.3 0.007 0.013
empl.2 -0.010 0.014
empl.1 -0.017 0.011
empl0 -0.208 0.020 ***
empl1 -0.017 0.029
empl2 0.050 0.022
empl3 0.101 0.020 ***
empl4 0.020 0.019
empl5 -0.007 0.017
empl6 0.028 0.016 .
empl7 0.033 0.017 .
empl8 0.007 0.016
empl9 -0.017 0.017
empl10 -0.049 0.019 **
empl11 -0.022 0.020
empl12 -0.031 0.021
firstempl -0.259 0.131
cumempl -0.113 0.149
1 Significance (sig.): 0 " *** " 0.001 " ** " 0.01 " * " 0.05 " . " 0.1 " " 1

Additionally, we explore the heterogeneity in the effects programme participation has on various subgroups of participants.

We explore the ATTs for subgroups based on:

  • Gender - differences in effects for women and men
  • Education level - differences in effects for JSs with no education, primary education, lower and secondary education.
  • Share of Roma in the city of residence - differences in effects for JSs who lives in the city with the share of Roma below 10% and above 10%
  • The size of the city of residence - differences in effects for JSs who lives in the city with a population less than 4 000 and more than 4 000

Table 4: The average treatment effect on the treated: subgroup based difference between entry into unemployment register and started of participation in measure

diff_entry
Timing of participation
Sample
0-6 months
7-12 months
13+ months
effect se sig. effect se sig. effect se sig. effect se sig.
empl.4 -0.028 0.015 . -0.014 0.016 0.011 0.047 -0.204 0.049 ***
empl.3 0.007 0.013 -0.003 0.014 -0.009 0.058 -0.030 0.024
empl.2 -0.010 0.014 0.015 0.018 -0.108 0.034 *** 0.000 0.000
empl.1 -0.017 0.011 0.029 0.016 . 0.000 0.000 0.000 0.000
empl0 -0.208 0.020 *** -0.236 0.024 *** -0.126 0.053
-0.194 0.057 ***
empl1 -0.017 0.029 -0.004 0.031 -0.001 0.072 -0.128 0.081
empl2 0.050 0.022
0.072 0.026 ** -0.018 0.068 -0.070 0.081
empl3 0.101 0.020 *** 0.121 0.023 *** 0.046 0.067 -0.003 0.080
empl4 0.020 0.019 0.040 0.021 . -0.075 0.063 -0.003 0.077
empl5 -0.007 0.017 -0.006 0.018 0.008 0.061 0.038 0.080
empl6 0.028 0.016 . 0.029 0.017 . 0.035 0.058 0.074 0.077
empl7 0.033 0.017 . 0.036 0.018
0.035 0.067 0.005 0.075
empl8 0.007 0.016 0.008 0.018 0.047 0.058 -0.006 0.074
empl9 -0.017 0.017 -0.030 0.019 0.086 0.055 -0.019 0.075
empl10 -0.049 0.019 ** -0.036 0.021 . -0.054 0.057 -0.108 0.077
empl11 -0.022 0.020 -0.009 0.022 -0.043 0.062 -0.073 0.076
empl12 -0.031 0.021 -0.034 0.022 -0.016 0.063 -0.053 0.080
firstempl -0.259 0.131
-0.398 0.123 *** 0.074 0.462 0.478 0.584
cumempl -0.113 0.149 -0.049 0.148 -0.075 0.530 -0.540 0.666
1 Significance codes (sig.): 0 " ** " 0.001 " * " 0.01 " * " 0.05 " . " 0.1 " " 1
2 Sample: 1 406 observations 0-6 months: 1 038 observations 7-12 months: 194 observations 13+ months: 173 observations

Table 5: The average treatment effect on the treated: subgroup based on gender

gender
Gender
Sample
Women
Men
effect se sig. effect se sig. effect se sig.
empl.4 -0.028 0.015 . -0.038 0.021 . -0.012 0.023
empl.3 0.007 0.013 0.010 0.018 0.000 0.023
empl.2 -0.010 0.014 -0.016 0.023 0.000 0.025
empl.1 -0.017 0.011 0.017 0.022 -0.055 0.026
empl0 -0.208 0.020 *** -0.177 0.029 *** -0.245 0.026 ***
empl1 -0.017 0.029 -0.011 0.035 -0.021 0.042
empl2 0.050 0.022
0.015 0.031 0.084 0.032 **
empl3 0.101 0.020 *** 0.052 0.029 . 0.157 0.029 ***
empl4 0.020 0.019 -0.012 0.029 0.059 0.030
empl5 -0.007 0.017 -0.017 0.027 0.009 0.025
empl6 0.028 0.016 . 0.003 0.023 0.055 0.024
empl7 0.033 0.017 . 0.007 0.024 0.060 0.024
empl8 0.007 0.016 -0.001 0.023 0.015 0.026
empl9 -0.017 0.017 -0.003 0.025 -0.031 0.027
empl10 -0.049 0.019 ** -0.028 0.027 -0.080 0.029 **
empl11 -0.022 0.020 -0.023 0.027 -0.028 0.026
empl12 -0.031 0.021 -0.028 0.027 -0.034 0.030
firstempl -0.259 0.131
-0.206 0.197 -0.325 0.152
cumempl -0.113 0.149 -0.223 0.215 0.000 0.208
1 Significance codes (sig.): 0 " ** " 0.001 " * " 0.01 " * " 0.05 " . " 0.1 " " 1
2 Sample: 1 406 observations Women: 736 observations Men: 670 observations

Table 6: The average treatment effect on the treated: subgroup based on education level

education
Education
Sample
No education
Primary
Lower secondary
Upper secondary
effect se sig. effect se sig. effect se sig. effect se sig. effect se sig.
empl.4 -0.028 0.015 . -0.225 0.150 -0.078 0.049 0.038 0.041 -0.023 0.024
empl.3 0.007 0.013 0.222 0.089
0.015 0.047 -0.022 0.045 0.004 0.024
empl.2 -0.010 0.014 -0.125 0.101 0.012 0.039 0.045 0.046 -0.005 0.028
empl.1 -0.017 0.011 0.182 0.126 -0.018 0.046 -0.044 0.062 0.004 0.027
empl0 -0.208 0.020 *** -0.209 0.127 . -0.238 0.045 *** -0.287 0.067 *** -0.218 0.035 ***
empl1 -0.017 0.029 -0.152 0.142 -0.092 0.070 0.085 0.082 0.014 0.040
empl2 0.050 0.022
-0.104 0.156 -0.068 0.067 0.039 0.075 0.128 0.034 ***
empl3 0.101 0.020 *** 0.106 0.157 0.019 0.071 0.189 0.068 ** 0.113 0.033 ***
empl4 0.020 0.019 0.175 0.159 -0.031 0.061 0.125 0.069 . -0.040 0.028
empl5 -0.007 0.017 0.089 0.132 -0.012 0.062 0.034 0.055 -0.049 0.023
empl6 0.028 0.016 . 0.064 0.152 -0.028 0.057 0.116 0.053
0.018 0.023
empl7 0.033 0.017 . -0.088 0.156 -0.001 0.052 0.099 0.060 . 0.039 0.024
empl8 0.007 0.016 0.042 0.161 0.022 0.058 0.020 0.058 -0.013 0.021
empl9 -0.017 0.017 0.085 0.192 0.040 0.057 -0.020 0.056 -0.042 0.025 .
empl10 -0.049 0.019 ** 0.113 0.191 -0.109 0.058 . -0.044 0.057 -0.069 0.026 **
empl11 -0.022 0.020 0.227 0.191 -0.101 0.057 . -0.027 0.063 -0.009 0.028
empl12 -0.031 0.021 0.109 0.191 -0.073 0.061 -0.096 0.065 -0.035 0.029
firstempl -0.259 0.131
0.104 1.317 0.574 0.433 -1.122 0.445
-0.405 0.142 **
cumempl -0.113 0.149 0.457 1.513 -0.674 0.479 0.233 0.472 -0.163 0.182
1 Significance codes (sig.): 0 " ** " 0.001 " * " 0.01 " * " 0.05 " . " 0.1 " " 1
2 Sample: 1 406 observations No education: 63 observations Primary: 245 observations Lower secondary: 159 observations Upper secondary: 543 observations

Table 7: The average treatment effect on the treated: subgroup based on share of Roma in the city of residence

romas
Roma share
Sample
0-10%
10-100%
effect se sig. effect se sig. effect se sig.
empl.4 -0.028 0.015 . -0.026 0.017 -0.037 0.034
empl.3 0.007 0.013 0.008 0.017 0.021 0.028
empl.2 -0.010 0.014 0.014 0.021 -0.061 0.028
empl.1 -0.017 0.011 -0.034 0.017
0.018 0.026
empl0 -0.208 0.020 *** -0.224 0.026 *** -0.166 0.034 ***
empl1 -0.017 0.029 0.013 0.032 -0.071 0.048
empl2 0.050 0.022
0.100 0.026 *** -0.047 0.041
empl3 0.101 0.020 *** 0.122 0.024 *** 0.079 0.041 .
empl4 0.020 0.019 0.014 0.022 0.036 0.041
empl5 -0.007 0.017 -0.012 0.018 0.018 0.037
empl6 0.028 0.016 . 0.034 0.017
0.031 0.036
empl7 0.033 0.017 . 0.046 0.019
0.020 0.037
empl8 0.007 0.016 0.015 0.017 -0.003 0.034
empl9 -0.017 0.017 -0.029 0.017 . 0.005 0.036
empl10 -0.049 0.019 ** -0.029 0.022 -0.094 0.035 **
empl11 -0.022 0.020 -0.010 0.022 -0.049 0.037
empl12 -0.031 0.021 -0.040 0.023 . -0.026 0.038
firstempl -0.259 0.131
-0.501 0.110 *** 0.142 0.317
cumempl -0.113 0.149 0.000 0.145 -0.266 0.312
1 Significance codes (sig.): 0 " ** " 0.001 " * " 0.01 " * " 0.05 " . " 0.1 " " 1
2 Sample: 1 406 observations 0-10%: 894 observations 10-100%: 512 observations

Table 8: The average treatment effect on the treated: subgroup based on the size of the city of residence

City
Type of city
Sample
Village
City
effect se sig. effect se sig. effect se sig.
empl.4 -0.028 0.015 . -0.033 0.023 -0.030 0.022
empl.3 0.007 0.013 0.038 0.019
-0.017 0.022
empl.2 -0.010 0.014 -0.048 0.021
0.019 0.025
empl.1 -0.017 0.011 -0.040 0.021 . 0.012 0.022
empl0 -0.208 0.020 *** -0.235 0.030 *** -0.178 0.028 ***
empl1 -0.017 0.029 -0.024 0.039 -0.008 0.040
empl2 0.050 0.022
0.047 0.034 0.047 0.033
empl3 0.101 0.020 *** 0.107 0.031 *** 0.094 0.026 ***
empl4 0.020 0.019 0.010 0.030 0.026 0.028
empl5 -0.007 0.017 -0.022 0.027 0.013 0.024
empl6 0.028 0.016 . 0.033 0.025 0.027 0.022
empl7 0.033 0.017 . 0.063 0.025
0.006 0.024
empl8 0.007 0.016 0.016 0.026 -0.001 0.022
empl9 -0.017 0.017 0.002 0.025 -0.031 0.024
empl10 -0.049 0.019 ** -0.054 0.028 . -0.051 0.028 .
empl11 -0.022 0.020 -0.018 0.029 -0.036 0.027
empl12 -0.031 0.021 -0.021 0.030 -0.050 0.029 .
firstempl -0.259 0.131
-0.245 0.190 -0.220 0.169
cumempl -0.113 0.149 -0.096 0.214 -0.141 0.206
1 Significance codes (sig.): 0 " ** " 0.001 " * " 0.01 " * " 0.05 " . " 0.1 " " 1
2 Sample: 1 406 observations Village: 717 observations City: 689 observations

4. Technical appendix

The technical appendix provides additional details on the data processing procedures, complemented by results obtained by an alternative estimation strategy based on inverse probability weighting.

4.1 Sample Selection

This section displays additional detail about the selection and cleaning of the evaluation sample.

info_table <- data.frame(matrix(ncol=2,nrow=0, dimnames=list(NULL, c("Desciption", "Value"))))
info_table[1,] <- c('Start of the evaluation period', paste(ep_start))
info_table[2,] <- c('End of the evaluation period', paste(ep_end))

info_table <-
  info_table[,-3] %>% kbl(format = 'html', booktabs = TRUE , align = 'c', row.names = FALSE) %>%
  column_spec(1,  border_right = TRUE) %>%
  kable_classic('hover', full_width = FALSE)


sample_selection <- data.frame(matrix(ncol=3,nrow=0, dimnames=list(NULL, c("Desciption", "Dropped","Total"))))

sample_selection[1,] <- c('Total registrations ', 0,format(n1, big.mark=" ", scientific=FALSE))
sample_selection[2,] <- c('Total eligible jobseekers', 0,format(n2, big.mark=" ", scientific=FALSE))
sample_selection[3,] <- c('Total participations in the evaluated measure', 0,format(npart1, big.mark=" ", scientific=FALSE))
sample_selection[4,] <- c('Total participants in the evaluated measure', 0 ,format(npart0, big.mark=" ", scientific=FALSE))

x<-n1-n4
y<-npart1-n3
sample_selection[5,] <- c('Dropped eligible JSs', format(n4, big.mark=" ", scientific=FALSE), format(x, big.mark=" ", scientific=FALSE))
sample_selection[6,] <- c('Dropped participants', format(n3, big.mark=" ", scientific=FALSE), format(y, big.mark=" ", scientific=FALSE))

x<-x-n6
y<-y-n5
sample_selection[7,] <- c('Dropped eligible JSs', format(n6, big.mark=" ", scientific=FALSE),format(x, big.mark=" ", scientific=FALSE))
sample_selection[8,] <- c('Dropped participants', format(n5, big.mark=" ", scientific=FALSE),format(y, big.mark=" ", scientific=FALSE))

y<-y-n7
sample_selection[9,] <- c('Dropped participants', format(n7, big.mark=" ", scientific=FALSE),format(y, big.mark=" ", scientific=FALSE))

x<-x-n9
y<-y-n8
sample_selection[10,] <- c('Dropped eligible JSs', format(n9, big.mark=" ", scientific=FALSE),format(x, big.mark=" ", scientific=FALSE))
sample_selection[11,] <- c('Dropped participants', format(n8, big.mark=" ", scientific=FALSE),format(y, big.mark=" ", scientific=FALSE))

y<-nrowdfa-n10
sample_selection[12,] <- c('Inflating participants by merging dataframes', 0,format(nrowdfa, big.mark=" ", scientific=FALSE))
sample_selection[13,] <- c('Dropped participants', format(n10, big.mark=" ", scientific=FALSE),format(y, big.mark=" ", scientific=FALSE))

y<-y-n11
sample_selection[14,] <- c('Dropped participants', format(n11, big.mark=" ", scientific=FALSE),format(y, big.mark=" ", scientific=FALSE))

sample_selection[15,] <- c('Eligibles', format(x-(length(unique(esample$klient_id[esample$treated==FALSE]))), big.mark=" ", scientific=FALSE), format(length(unique(esample$klient_id[esample$treated==FALSE])), big.mark=" ", scientific=FALSE))
sample_selection[16,] <- c('Participants', format(y-length(unique(esample$klient_id[esample$treated==TRUE])), big.mark=" ", scientific=FALSE), format(length(unique(esample$klient_id[esample$treated==TRUE])), big.mark=" ", scientific=FALSE))

sample_selection[17,] <- c('Eligibles', 0, format(length(unique(esample$klient_id[esample$treated==FALSE])), big.mark=" ", scientific=FALSE))
sample_selection[18,] <- c('Participants', 0, format(length(unique(esample$klient_id[esample$treated==TRUE])), big.mark=" ", scientific=FALSE))

sample_selection$Variable <- c('All registrations (before cleaning)', 'All registrations (before cleaning)', 'All registrations (before cleaning)','All registrations (before cleaning)',
                             
                             'Dropped JSs with ALMP participation 2 years before the EP', 'Dropped JSs with ALMP participation 2 years before the EP',
                             
                             'Dropped JSs with ALMP participation in other ALMP during the EP', 'Dropped JSs with ALMP participation in other ALMP during the EP',
                             
                             'Dropped participants with multiple ALMP participations in evaluated programme',
                             
                             'Dropped JSs with ALMP participation in supported employment programs (ALMP) during the EP',  'Dropped JSs with ALMP participation in supported employment programs (ALMP) during the EP',
                             
                             'Dropped participations which not happening during an unemployment spell','Dropped participations which not happening during an unemployment spell', 
                             
                             'Dropped participations with extreme values (1%) of the waiting time until participation in the evaluated measure ',
                             
                             'Dropped JSs with multiple registrations', 'Dropped JSs with multiple registrations',
                             
                             'Total registrations after cleaning', 'Total registrations after cleaning'
)

sample_selection <-sample_selection[,-4] %>% kbl(format = 'html', booktabs = TRUE , align = 'c', row.names = FALSE) %>%
  column_spec(1,  border_right = TRUE) %>%
  kable_classic('hover', full_width = FALSE)%>%
  pack_rows(index = table(fct_inorder(sample_selection$Variable)))

Table 9: Information table about the beginning and end of the evaluation period

info_table
Desciption Value
Start of the evaluation period 2017-01-01
End of the evaluation period 2017-12-31

Table 10: Data cleaning documentation

sample_selection
Desciption Dropped Total
All registrations (before cleaning)
Total registrations 0 443 969
Total eligible jobseekers 0 395 328
Total participations in the evaluated measure 0 2 144
Total participants in the evaluated measure 0 2 137
Dropped JSs with ALMP participation 2 years before the EP
Dropped eligible JSs 56 701 387 268
Dropped participants 397 1 747
Dropped JSs with ALMP participation in other ALMP during the EP
Dropped eligible JSs 52 431 334 837
Dropped participants 396 1 351
Dropped participants with multiple ALMP participations in evaluated programme
Dropped participants 10 1 341
Dropped JSs with ALMP participation in supported employment programs (ALMP) during the EP
Dropped eligible JSs 31 705 303 132
Dropped participants 582 759
Dropped participations which not happening during an unemployment spell
Inflating participants by merging dataframes 0 886
Dropped participants 173 713
Dropped participations with extreme values (1%) of the waiting time until participation in the evaluated measure
Dropped participants 8 705
Dropped JSs with multiple registrations
Eligibles 33 212 269 920
Participants 0 705
Total registrations after cleaning
Eligibles 0 269 920
Participants 0 705

4.2 Description of explanatory variables (used in balancing)

Description of variables, used in the estimation of ATTs. We also show their balance before and after matching in graph 4 and balance before and after weighting in graph 10.

Table 11: List and description of variables used in estimation

#TREBA DOPLNIŤ DO POSTUP_EXPORT
Q[["df"]][["flang"]] <- "knowledge of a foreign language (1: Yes, 0: No)"
Q[["df"]][["drive"]] <- "driving license holder (1: Yes, 0: No)"
Q[["df"]][["pc"]] <- "computer skills (1: Yes, 0: No)"
Q[["df"]][["unpast"]] <- "registered citizen in the past (1: Yes, 0: No), unemployed in past"
Q[["df"]][["min_urad"]] <- "traveling time to the nearest Labour office (in minutes)"
Q[["df"]][["min_BA"]] <- "traveling time to the Capital city - Bratislava (in minutes)"
Q[["df"]][["population"]] <- "number of inhabitants of the place of residence"



list_vars_table <- data.frame(Variables = list_vars)

desc <- c()
for (i in list_vars_table$Variables){
  Q[["df"]][[i]][1]
  desc <- append(desc, ifelse(is.null(Q[["df"]][[i]][1]), NA, Q[["df"]][[i]][1]))
}

list_vars_table$Description <- desc
list_vars_table[list_vars_table$Variables == "ent","Description"] <- "difference between entry into unemployment register and started of evalueted period"
list_vars_table[list_vars_table$Variables == "UR_region","Description"] <- "unemployment rate in region during valuated period"
list_vars_table[list_vars_table$Variables == "roma_share","Description"] <- "share of Roma in the place of residence"

list_vars_table[,-3] %>% kbl(format = 'html', booktabs = TRUE , align = 'c', row.names = FALSE) %>%
  column_spec(1,  border_right = TRUE) %>%
  kable_classic('hover', full_width = FALSE) 
Variables Description
ent difference between entry into unemployment register and started of evalueted period
male combination of klient_id and the date of entry is an Unique ID. (we created and used dup column to identify duplicates and remove them)
married marital status: single
kids nationality: other
slovak marital status: other
noedu kids under 10 years
primary level of education: no education
lsec level of education: primary
usec level of education: ower secondary
flang knowledge of a foreign language (1: Yes, 0: No)
drive driving license holder (1: Yes, 0: No)
pc computer skills (1: Yes, 0: No)
unpast registered citizen in the past (1: Yes, 0: No), unemployed in past
min_urad traveling time to the nearest Labour office (in minutes)
min_BA traveling time to the Capital city - Bratislava (in minutes)
UR_region unemployment rate in region during valuated period
roma_share share of Roma in the place of residence
population number of inhabitants of the place of residence
age gender (1: man, 0: woman)

4.3 Alternative estimation strategy (Inverse probability weighting)

In this section, we explore the sensitivity of our results to an alternative model specifications. The main results, reported above, were obtained by [propensity score matching] (https://cran.r-project.org/web/packages/MatchIt/MatchIt.pdf). An alternative estimation strategy was applied using an [inverse probability weighting estimator] (https://www.rdocumentation.org/packages/causalweight/versions/1.0.2/topics/treatweight).

#Estimation parameters
Ssamples <- seq(1,4)
participation_month<-((year(as.Date(esample$entrya))-min(year(ep_start)))*12)+month(as.Date(esample$entrya)) # Month of participation since the start of the evaluation period
pcpQ<-ceiling(participation_month/3)
max_pcpQ<-max(pcpQ, na.rm = TRUE)

OQm <- seq(-12,36,3)
OQ <- -4:12
O_vars <- c(paste("empl",OQ, sep=""), "firstempl", "cumempl")
O_vars<-str_replace(O_vars, "-", ".")

list_vars <- c('ent', 'male', 'married','kids',
           'slovak', 'noedu','primary', 'lsec', 'usec',
           'flang', 'drive', 'pc',
           'unpast', 'min_urad', 'min_BA',
           'UR_region', 'roma_share', 'population', 'age')

#     # All potentially useful explanatory variables (Xs)
#     list_vars <- c('ent', 'male', 'single', 'married','kids',
#                    'slovak', 'hungarian', 'roma', 
#                    'noedu','primary', 'lsec', 'usec', 'tertiary'
#                    , 'zaujem_vzdel',
#                    'flang', 'drive', 'pc',
#                    'healthy', 'barrier', 'graduate', 'ziad_undn_sp', 'cvyhl_poisteu', 
#                    'empl', 'unpast', 'employee', 'selfempl', 'zaujem_szco',
#                    'look_ptime', 'commute', 'relocate', 'zaujem_zam_zahr',
#                    'min_kraj', 'min_urad', 'min_BA', 
#                    'UR_region', 'roma_share', 'population', 
#                    ageg_dummies,
# #                    paste0("urad_",seq(from=1, to=46), sep=""), 
#                    paste0("isco1_",seq(from=1, to=3), sep=""),
#                    paste0("odbor1_",seq(from=1, to=5), sep=""))
# #                   paste0(colnames(df)[grepl("nace1_",colnames(df))], sep=""))

Balance_vars <- list_vars
# Result Matrixes
  N<-nrow(esample[esample$treated==TRUE,])
  N_sp <- matrix(NA, nrow=length(Ssamples)) 
  
  resultsArray_ATT  <- array(NA, dim=c(length(O_vars),7,length(Ssamples))) 
  dimnames(resultsArray_ATT)[[1]] <- c(O_vars)
  dimnames(resultsArray_ATT)[[2]] <- c('ATT', 'se', 'pval', 'Y1', 'Y0', 'SampleSize', 'Sign.')
  results <- array(NA, dim=c(length(O_vars),4))

  balance_matrix_w <- array(NA, dim=c(length(list_vars),length(Ssamples))) # S x P x Q2
  dimnames(balance_matrix_w)[[1]] <- c(list_vars)
  
  results_bv_W <- array(NA, dim=c(1,length(list_vars)))
  
  balance_matrix_un <- array(NA, dim=c(length(list_vars),length(Ssamples))) # S x P x Q2
  dimnames(balance_matrix_un)[[1]] <- c(list_vars)
  
  results_bv_un <- array(NA, dim=c(1,length(list_vars)))
  
### 1. Counting of ATTs 
Sesample <- esample

###Four sub-samples based on the waiting time until participation in the evaluated measure (cutoffs p25, p50, p75): 
wtc25<-quantile(as.numeric(Sesample$exita)-as.numeric(Sesample$entry),na.rm=TRUE, probs=0.25)
wtc50<-quantile(as.numeric(Sesample$exita)-as.numeric(Sesample$entry),na.rm=TRUE, probs=0.50)
wtc75<-quantile(as.numeric(Sesample$exita)-as.numeric(Sesample$entry),na.rm=TRUE, probs=0.75)

 minToP<-min(as.numeric(Sesample$entrya)-as.numeric(Sesample$entry), na.rm = TRUE)

partic<-Sesample[Sesample$treated==TRUE,]
nonpart<-Sesample[Sesample$treated==FALSE,]

#NonPart majú iba spodné kritérium minimálnej dĺžky nezamestnanosti 
esample1<-rbind(nonpart[as.numeric(nonpart$exit)-as.numeric(nonpart$entry)>minToP,],
                partic[as.numeric(partic$exita)-as.numeric(partic$entry)<=wtc25,])
esample2<-rbind(nonpart[as.numeric(nonpart$exit)-as.numeric(nonpart$entry)>=wtc25,],
                partic[as.numeric(partic$exita)-as.numeric(partic$entry)>wtc25 &
                         as.numeric(partic$exita)-as.numeric(partic$entry)<=wtc50,])
esample3<-rbind(nonpart[as.numeric(nonpart$exit)-as.numeric(nonpart$entry)>=wtc50,], 
                partic[as.numeric(partic$exita)-as.numeric(partic$entry)>wtc50 &
                         as.numeric(partic$exita)-as.numeric(partic$entry)<=wtc75,])
esample4<-rbind(nonpart[as.numeric(nonpart$exit)-as.numeric(nonpart$entry)>=wtc75,],
                partic[as.numeric(partic$exita)-as.numeric(partic$entry)>wtc75,])

DataSample <- c()

for (s in Ssamples) {
    
    esampleS <- get(paste0('esample', s, by = ''))
    
    
    N_sp[s,] <- nrow(esampleS[esampleS$treated==TRUE,])
    
    if (mean(esampleS$treated) > 0.0001){
      esampleS <-esampleS[,c("treated", "klient_id", 
                             "entry", "exit", "entrya", 
                             list_vars)]
      
      # Month of participation since the start of the evaluation period 
      esampleS$participation_month<-((year(as.Date(esampleS$entrya))-year(ep_start))*12)+
        month(as.Date(esampleS$entrya))
      # Participation quarter
      esampleS$pcpQ<-ceiling(esampleS$participation_month/3)
      
      # Month of the start of the unemployment since the start of the evaluation period
      esampleS$Ustart_month<-((year(as.Date(esampleS$entry))-min(year(ep_start)))*12)+
        month(as.Date(esampleS$entry)) 
      # Inflow quarter 
      esampleS$infQ<-ceiling(esampleS$Ustart_month/3)
      
      #Adding unemployment history
      esampleS<-merge(esampleS, 
                      h_esample[,c("klient_id", "entry", 
                                   paste0("entry", seq(1:11), sep=""), 
                                   paste0("exit", seq(1:11), sep=""))],
                      by=c("klient_id", "entry"), all.y = FALSE)
      
      
      #Imputing the start of the unemployment spell 
      Pmatrix <- matrix(NA, nrow = nrow(esampleS[esampleS$treated==FALSE,]), ncol=max_pcpQ)
  
          for (p in 1:max_pcpQ){  
            
            # Quarter of inflow to unemployment of participants entering the programme in quarter P
            PinfQ<-unique(esampleS$infQ
                          [esampleS$treated==TRUE & esampleS$pcpQ==p])
            # Only allowing non-participants inflowing to unemployment during the quarters when participants in this sub-sample were inflowing
            #D<-esampleS[as.logical(esampleS$treated==T & esampleS$pcpQ == p) |
            #              as.logical(esampleS$treated==F & esampleS$infQ %in% PinfQ),]
            Pmatrix[,p]<-as.numeric(esampleS$infQ[esampleS$treated==FALSE]  %in% PinfQ)
          }
      
        hh <- function(j){
              sample(which(j==1),1)
        }
        
        esampleS$pcpQ[as.logical(apply(Pmatrix, 1, FUN=sum)==0) & esampleS$treated == FALSE] <- 0
        Pmatrix<-Pmatrix[as.logical(apply(Pmatrix, 1, FUN=sum)>0),]
        
        esampleS$pcpQ[as.logical(apply(Pmatrix, 1, FUN=sum)>0) & esampleS$treated == FALSE] <- apply(Pmatrix,1,hh)
        
  
        
                for (q in OQm) { 
                  
                  cond <- FALSE
                  
                    for (n in 1:11) { 
                      entry_n <- esampleS[ ,paste0('entry', n, '')]
                      exit_n <- esampleS[, paste0('exit', n, '')]
                      
                      cond <- cond | isInRegister(p = esampleS$pcpQ, q, entry_n, exit_n) 
                    }
                  
                  esampleS[ ,paste0('empl',ceiling(q/3), '')] <- ifelse(cond, 0,1)
                }
             
              for(q in OQm){
                esampleS <- esampleS[!is.na(esampleS[ ,paste0('empl',ceiling(q/3), '')]),]
              }
        
            #Pre-estimation data preparation
            #Additional outcome variables
            esampleS[ ,"cumempl"] <- rowSums(esampleS[, paste0("empl", seq(0,12,1), sep="")])
            esampleS[ ,"firstempl"] <- ceiling(as.numeric((as.Date(esampleS$exit)-
                                                      (as.Date(ep_start) + months(esampleS$pcpQ*3)))/90))    
        
            # Cleaning the Xs
            y <- data.frame(treated = esampleS[ ,'treated'])
            D <- data.frame(esampleS) #D - su tvoje data ako data.frame()
            spec <- as.formula(cbind(y,D[,c(list_vars, 
                                            paste0("empl.", seq(1,4,1), sep=""))])) 
            #je tvoja specifikacia modelu, je to object as.formula()
            colTh <- 0.8 
            #Treshold for the acceptable correlation between vars ( je maximalne tolerovana korelacia medzi dvoma premennymi)
            dumTh <- 0.0001 
            #Treshold for the acceptable concentration of dummy variables (hovori kolko (100[%] x dumTh) percent je minimalny vyskyt hodnot 1 alebo 0 pri dummy premennych. Ak je dumTh = 0.005, tak aspon pol-percent pozorovani musi mat 1 alebo 0.)
            TdumTh <- 0.005
            #Treshold for the acceptable concentration of dummy variables in the sub-group of participants (hovori kolko (100[%] x dumTh) percent je minimalny vyskyt hodnot 1 alebo 0 pri dummy premennych. Ak je dumTh = 0.005, tak aspon pol-percent pozorovani musi mat 1 alebo 0.)
            TdumThN <- 5
            #Treshold for the minimal number of observations of a dummy variables in the sub-group of participants (hovori kolko (100[%] x dumTh) percent je minimalny vyskyt hodnot 1 alebo 0 pri dummy premennych. Ak je dumTh = 0.005, tak aspon pol-percent pozorovani musi mat 1 alebo 0.)
            result <- DatPrep(D = D, spec = spec, colTh = colTh, dumTh = dumTh, TdumTh = TdumTh, TdumThN = TdumThN) 
            
            for (col in colnames(result$D)){
               D <-  D[!is.na(D[,col]),]
             }  
        # ESTIMATION:
            
            d = D$treated*1
            x = as.matrix(D[,c(list_vars, paste0("empl.", seq(1,4,1), sep=""))])
            y_mat <- D[,c(paste0("empl.", seq(1,4,1), sep=""), 
                                    paste0("empl", seq(0,12,1), sep=""), 
                                    'firstempl', 'cumempl')]
            
            att <- treatweight_pmp(y = y_mat, d, x, s = NULL, z = NULL, selpop = FALSE, trim = 0.05, ATET = TRUE, logit = TRUE, boot = 10)
            #att <- treatselDML(y = y_mat, d, x, s = d, z=x, selected=1)
            
            resultsArray_ATT[,1,s] <- round(att$effect,3)
            resultsArray_ATT[,2,s] <- round(att$se,3)
            resultsArray_ATT[,3,s] <- round(att$pval,3)
            resultsArray_ATT[,4,s] <- round(att$y1,3)
            resultsArray_ATT[,5,s] <- round(att$y0,3)
            resultsArray_ATT[,6,s] <- format(length(d)-att$ntrimmed, big.mark=" ", scientific=FALSE)
            resultsArray_ATT[,7,s] <- stars.pval(att$pval)
            
            
            DataSample <- bind_rows(DataSample, D)
            
            #Balance
            #Generating the propensity score variable
            PSmodel<-glm(result$spec, family=binomial(link = "logit"), data=D)
            #print(summary(PSmodel))
            D$PSvar<-as.numeric(PSmodel$fitted.values)
            
            #LL: old weights INCORRECT
            #w_ATE <- D$treated/D$PSvar + (1-D$treated)/(1-D$PSvar)
            
            #LL: new weights CORRECT
            w_ATE <- D$treated + (1-D$treated)*D$PSvar/(1-D$PSvar)
            
                          #Balance_vars <- colnames(result$D)[colnames(result$D) %in% list_vars]
                          for (bv in Balance_vars){
                          
                              if (apply(D[,bv,drop = FALSE] ,2,function(x) { all(x %in% c(0:1)) }) ) {
                                #unweighted discrete
                                p_treat <- apply(D[D$treated==1,bv,drop = FALSE],2,mean)
                                p_contr <- apply(D[D$treated==0,bv,drop = FALSE],2,mean)
                                balance_matrix_un[bv,s] <- abs( 100*(p_treat - p_contr )/sqrt( (p_treat*(1-p_treat) + p_contr*(1-p_contr))/2 ) )

                                #weighted discrete
                                p_treat <- t(w_ATE[D$treated==1]) %*% D[D$treated==1,bv] / sum(w_ATE[D$treated==1], na.rm = TRUE)
                                p_contr <- t(w_ATE[D$treated==0]) %*% D[D$treated==0,bv] / sum(w_ATE[D$treated==0], na.rm = TRUE)
                                
                                balance_matrix_w[bv,s] <- abs( 100*(p_treat - p_contr )/sqrt( (p_treat*(1-p_treat) + p_contr*(1-p_contr))/2 ) )
                            
                              } else {
                                #unweighted continuous
                                balance_matrix_un[bv,s] <- abs( 100*(apply(D[D$treated==1,bv,drop = FALSE],2,mean) - apply(D[D$treated==0,bv,drop = FALSE],2,mean))/
                                                  sqrt( (apply(D[D$treated==1,bv,drop = FALSE],2,sd)^2 + apply(D[D$treated==0,bv,drop = FALSE],2,sd)^2)/2 ) )

                                #weighted continuous
                                p_treat <- t(w_ATE[D$treated==1]) %*% D[D$treated==1,bv] / sum(w_ATE[D$treated==1], na.rm = TRUE)
                                p_contr <- t(w_ATE[D$treated==0]) %*% D[D$treated==0,bv] / sum(w_ATE[D$treated==0], na.rm = TRUE)
                                p_treat_var <- ( sum(w_ATE[D$treated==1]) / (sum(w_ATE[D$treated==1])^2 - sum(w_ATE[D$treated==1]^2)) )* 
                                  t(w_ATE[D$treated==1]) %*% ((D[D$treated==1,bv] - c(p_treat))^2)
                                p_contr_var <- ( sum(w_ATE[D$treated==0]) / (sum(w_ATE[D$treated==0])^2 - sum(w_ATE[D$treated==0]^2)) ) * 
                                  t(w_ATE[D$treated==0]) %*% ((D[D$treated==0,bv] - c(p_contr))^2)
                                
                                balance_matrix_w[bv,s] <- abs(100* (p_treat - p_contr ) / 
                                            sqrt( (p_treat_var + p_contr_var )/2 ) )
                              }

                          }
    }
} 
  

for (iQ in 1:length(O_vars)){
      results[iQ,1] <- sum(sum((as.numeric(resultsArray_ATT[iQ,'ATT',])*(N_sp/N)), na.rm = TRUE), na.rm = TRUE)
      results[iQ,2] <-sum(sum((as.numeric(resultsArray_ATT[iQ,'se',])*(N_sp/N)), na.rm = TRUE), na.rm = TRUE)
      results[iQ,3] <-sum(sum((as.numeric(resultsArray_ATT[iQ,'Y1',])*(N_sp/N)), na.rm = TRUE), na.rm = TRUE)
      results[iQ,4] <-sum(sum((as.numeric(resultsArray_ATT[iQ,'Y0',])*(N_sp/N)), na.rm = TRUE), na.rm = TRUE)
}

results<-cbind(O_vars, results) 
colnames(results) <- c('O_vars', 'ATT', 'se','Y1', 'Y0')

results_bv <- array(NA, dim=c(length(list_vars),2)) 
dimnames(results_bv)[[1]] <- c(list_vars)
dimnames(results_bv)[[2]] <- c("unweighted", "weighted")


for (bVar in 1:length(list_vars)){
      results_bv[bVar,1] <- abs(sum(sum((balance_matrix_w[bVar,]*(N_sp/N)), na.rm=TRUE), na.rm=TRUE))
      results_bv[bVar,2] <- abs(sum(sum((balance_matrix_un[bVar,]*(N_sp/N)), na.rm=TRUE), na.rm=TRUE))
}

We can explore the sensitivity of our results by displaying the average treatment effects on the treated (ATTs) computing by different approach by inverse probability weighting.

Graph 10: Balance of the groups of participants and eligible before and after weighting x

BVDF <- data.frame(balance_vars = c(rownames(results_bv),rownames(results_bv)),
                   balance = c(results_bv[,1], results_bv[,2]),
                   Balance = c(rep("before weighting",nrow(results_bv)),rep("after weighting",nrow(results_bv))))
                          
BVDF %>% subset(!is.na(balance)) %>% 
    mutate(balance_vars = fct_reorder(balance_vars, balance)) %>%
    ggplot(aes(x=balance, y=balance_vars,col=Balance)) + 
    geom_point() +
    ylab('Variables')+
    xlab('Absolute Standardized Meand Difference')+
    theme_minimal()+
    theme(plot.caption = element_text(hjust = 0), legend.position = "top")+
    labs(caption="Source: ÚPSVaR")+
    geom_vline(xintercept = 0.0, color = 'darkgrey')

We show a proxy of the employment rate to see a share of persons out of the register of unemployed JSs observed at the first day of quarters before and after the start of participation (Quarter 0).

Graph 11: The share of individuals out of the unemployment register (proxy for employment rate)

graph <- matrix(NA, nrow=length(O_vars[1:17]), ncol = 2)
rownames(graph) <- O_vars[1:17]
for(o in O_vars[1:17]){
  graph[o,1] <- apply(DataSample[DataSample$treated == TRUE,o,drop = FALSE], 2, mean, na.rm = TRUE)
  graph[o,2] <- apply(DataSample[DataSample$treated == FALSE,o,drop = FALSE], 2, mean, na.rm = TRUE)
}

graph <- data.frame(cbind(O_vars[1:17],graph))
colnames(graph) <- c('O_vars','treated', 'untreated')
graph$treated <- as.numeric(as.character(graph$treated))
graph$untreated <- as.numeric(as.character(graph$untreated))
graph$O_vars <- factor(graph$O_vars, levels=c(O_vars[1:17]))
graph$Q<-seq(-4,12,1)

graphER<- ggplot(data=graph) +
  geom_line(aes(x=Q, y=treated, colour="Treated"), size = 1 , group = 1) +
  geom_line(aes(x=Q, y=untreated, colour="Control group"), size = 1 , group = 1) +
  scale_x_continuous(breaks=seq(-4,12,1))+
  theme_minimal()+
  labs(
       y = "Share of treated",
       x = "Quarters before and after the start of the participation (0)",
       colour = "Group", 
       caption="Source: ÚPSVaR") +
  theme(plot.caption = element_text(hjust = 0), legend.position = "top")

graphER

Graph 12: The average treatment effects on the treated (ATTs) estimated for the participation in the [Component] Projects and programmes - Projects - Education and training during 2017

graphATT

Finaly, we report the ATTs in table computing computing by different approach by inverse probability weighting.

Table 12: The average treatment effect on the treated

resultsDF %>% kbl(format = 'html', booktabs = TRUE , align = 'c') %>%
  column_spec(1,  border_right = TRUE) %>%
  kable_classic('hover', full_width = FALSE) %>%
  row_spec(1:nrow(resultsDF), font_size = '1cm') %>%
  column_spec(1:ncol(resultsDF),width = "1.1cm") %>%
  kableExtra::footnote(number = paste('Significance codes (sig.):',  0.0000, ' " *** " ', 0.001, ' " ** " ', 0.01, ' " * " ', 0.05, ' " . " ', 0.1, '" "',  1))
efekt se sig.
empl.4 0.000 0.001
empl.3 0.000 0.000
empl.2 0.000 0.000
empl.1 0.000 0.001
empl0 -0.279 0.028 ***
empl1 0.008 0.037
empl2 0.076 0.031
empl3 0.086 0.028 **
empl4 0.010 0.022
empl5 -0.009 0.023
empl6 0.012 0.021
empl7 0.013 0.024
empl8 -0.002 0.019
empl9 -0.017 0.025
empl10 -0.043 0.031
empl11 -0.027 0.028
empl12 -0.032 0.026
firstempl -0.164 0.146
cumempl -0.203 0.169
1 Significance codes (sig.): 0 " *** " 0.001 " ** " 0.01 " * " 0.05 " . " 0.1 " " 1

References

Daniel E. Ho, Kosuke Imai, Gary King, Elizabeth A. Stuart (2011). MatchIt: Nonparametric Preprocessing for Parametric Causal Inference. Journal of Statistical Software, Vol. 42, No. 8, pp. 1-28. https://doi.org/10.18637/jss.v042.i08

Frölich, M., Huber, M. (2014): “Treatment Evaluation With Multiple Outcome Periods Under En- dogeneity and Attrition”, Journal of the American Statistical Association, 109, 1697-1711.

https://bookdown.org/yihui/rmarkdown-cookbook/parameterized-reports.html

LMP Qualitative data in MS Excel, downloaded from (accessed at 14th of April 2021) : https://ec.europa.eu/social/main.jsp?catId=1143&intPageId=3227&langId=en