T-TEST

 STUDENT'S T-TEST

Theoretical work on the t-distribution was done by W.S. Gosset in the early 1900s. The t-distribution is used when the sample size is 30 or less and the population standard deviation is unknown. This test is based on the assumption that the universe from which the sample was taken is normally distributed.
Student's t-test, often referred to simply as the t-test, is a statistical hypothesis test used to determine if there is a significant difference between the means of two independent groups. It is named after its developer, William Sealy Gosset, who published under the pseudonym "Student" due to his employment at the Guinness Brewery. The t-test is widely used in various fields, including science, medicine, economics, and social sciences, to compare the means of two groups and assess whether observed differences are likely due to chance or if they are statistically significant.
Here are the key components and concepts of Student's t-test:
Two Independent Groups: The t-test is designed to compare two independent groups or samples. These groups can represent different conditions, treatments, populations, or any other relevant categories.

Null Hypothesis (H0): The null hypothesis in a t-test states that there is no significant difference between the means of the two groups. It assumes that any observed differences are due to random sampling variability.

Alternative Hypothesis (H1): The alternative hypothesis (also called the research hypothesis) suggests that there is a significant difference between the means of the two groups. It is what the researcher aims to support with evidence.

Test Statistic (t): The t-test calculates a test statistic (t-value) that quantifies the difference between the sample means relative to the variability within each group. The formula for the t-value depends on whether the two groups have equal variances (using the pooled variance) or unequal variances (Welch's t-test).

Degrees of Freedom (df): The degrees of freedom represent the number of values in the final calculation of a statistic that are free to vary. In a t-test, degrees of freedom are determined by the sample sizes and variances of the two groups.

P-Value: The t-value is used to calculate a p-value, which measures the probability of observing the observed difference (or more extreme differences) between the group means under the assumption that the null hypothesis is true. A small p-value (typically less than 0.05) suggests that the observed difference is statistically significant, leading to the rejection of the null hypothesis.

Assumptions: The t-test assumes that the data within each group are normally distributed and that the variances of the two groups are roughly equal (homoscedasticity). Violations of these assumptions can affect the validity of the test.

Types of t-Tests: There are several variations of the t-test, including:

Independent samples t-test: Compares two independent groups.
Paired samples t-test: Compares two related groups (e.g., before and after measurements) to assess the mean difference.
One-sample t-test: Tests whether a single sample mean is significantly different from a known or hypothesized population mean.
Effect Size: In addition to p-values, it's common to report an effect size measure in t-tests, such as Cohen's d, which quantifies the magnitude of the observed difference between means (see the sketch below).

Practical Significance: While statistical significance indicates whether a difference is unlikely to have occurred by chance, researchers should also consider practical significance, which assesses the real-world importance of the difference.

Student's t-test is a valuable tool for comparing means and making inferences about population differences. It helps researchers determine whether observed group differences are statistically meaningful or whether they can be attributed to random variability.
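As a rough illustration of these ideas, here is a minimal Python sketch with made-up scores for two independent groups. It runs both the pooled-variance t-test and Welch's t-test with scipy, and computes Cohen's d by hand; the group names and numbers are purely illustrative.

import numpy as np
from scipy import stats

# Made-up scores for two independent groups (illustrative only)
group_a = [23, 25, 28, 30, 26, 27, 24]
group_b = [20, 22, 25, 24, 21, 23, 22]

# Pooled-variance t-test (assumes roughly equal variances)
print(stats.ttest_ind(group_a, group_b))

# Welch's t-test (does not assume equal variances)
print(stats.ttest_ind(group_a, group_b, equal_var=False))

# Cohen's d: mean difference divided by the pooled standard deviation
n1, n2 = len(group_a), len(group_b)
s1, s2 = np.std(group_a, ddof=1), np.std(group_b, ddof=1)
s_pooled = np.sqrt(((n1 - 1)*s1**2 + (n2 - 1)*s2**2) / (n1 + n2 - 2))
print((np.mean(group_a) - np.mean(group_b)) / s_pooled)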

Different uses of the t-distribution or t-test:

a) To test the significance of the mean of a random sample.

In determining whether the mean of a sample drawn from a normal population deviates significantly from a stated value (the hypothetical value of the population mean) when the variance of the population is unknown, we calculate the statistic:
t = (X̄ − μ)·√n / S

where
X̄ = sample mean
μ = hypothetical population mean
n = size of the sample
S = standard deviation of the sample, S = √[Σ(X − X̄)² / (n − 1)]

Degrees of freedom, ν (pronounced "nu") = n − 1
If the calculated value of t exceeds t(0.05), we say that the difference between X̄ and μ is significant at the 5% level; if it exceeds t(0.01), the difference is said to be significant at the 1% level. If t < t(0.05), we conclude that the difference between X̄ and μ is not significant, and hence the sample might have been drawn from a population with mean = μ.
Determination of Confidence Limits:
At the 5% level of significance: X̄ ± t(0.05)·S/√n
At the 1% level of significance: X̄ ± t(0.01)·S/√n
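
As a cross-check on the formula above, the following minimal sketch computes t = (X̄ − μ)·√n / S and the confidence limits by hand for some made-up data (the sample values and μ = 50 are illustrative only) and compares the result with scipy.stats.ttest_1samp.

import numpy as np
from scipy import stats

x = np.array([48, 52, 55, 47, 51, 53])   # made-up sample values
mu = 50                                  # hypothetical population mean

n = len(x)
x_bar = x.mean()
s = x.std(ddof=1)                        # S = sqrt(sum((X - X_bar)^2) / (n - 1))

t = (x_bar - mu) * np.sqrt(n) / s        # t = (X_bar - mu)*sqrt(n)/S
print(t)
print(stats.ttest_1samp(x, mu))          # same statistic, two-sided p-value

# Confidence limits at the 5% level of significance: X_bar +/- t(0.05)*S/sqrt(n)
t_crit = stats.t.ppf(0.975, df=n - 1)
print(x_bar - t_crit*s/np.sqrt(n), x_bar + t_crit*s/np.sqrt(n))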

b) To test the significance of the difference between the means of two independent samples:

t = [(X̄1 − X̄2)/S]·√[n1·n2/(n1 + n2)]

where
X̄1 = mean of the first sample
X̄2 = mean of the second sample
n1 = number of items in the first sample
n2 = number of items in the second sample
S = combined standard deviation of the two samples:

S = √[{Σ(X1 − X̄1)² + Σ(X2 − X̄2)²}/(n1 + n2 − 2)]

where X1 and X2 denote the individual observations of the first and second samples.

If the standard deviations of the samples are given:

S = √[{S1²(n1 − 1) + S2²(n2 − 1)}/(n1 + n2 − 2)]

where S1 and S2 are the standard deviations of the first and second samples.

If the calculated value of t exceeds t(0.05) [t(0.01)], the difference between the sample means is said to be significant at the 5% (1%) level; otherwise the null hypothesis is accepted.
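
A minimal sketch of this calculation on made-up data: the combined standard deviation S and the t-value are computed by hand and then checked against scipy's pooled-variance t-test.

import numpy as np
from scipy import stats

x1 = np.array([12, 15, 14, 16, 13])       # made-up first sample
x2 = np.array([10, 11, 13, 12, 11, 10])   # made-up second sample

n1, n2 = len(x1), len(x2)
# S = sqrt([sum(X1 - X1_bar)^2 + sum(X2 - X2_bar)^2] / (n1 + n2 - 2))
s = np.sqrt((((x1 - x1.mean())**2).sum() + ((x2 - x2.mean())**2).sum()) / (n1 + n2 - 2))

# t = [(X1_bar - X2_bar)/S] * sqrt(n1*n2/(n1 + n2))
t = (x1.mean() - x2.mean()) / s * np.sqrt(n1*n2/(n1 + n2))
print(t)
print(stats.ttest_ind(x1, x2))            # scipy's pooled-variance version agrees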

c) To test the difference between the means of two dependent samples.

Two samples are said to be dependent when the elements in one sample are related to those in the other in some significant or meaningful manner. In fact, the samples may consist of pairs of observations made on the same element. When samples are dependent, they comprise the same number of elementary units. Suppose the mean marks obtained by 10 students in a test were calculated. Then, they were given special coaching, and again a test was conducted. The mean marks obtained by these students in the second test are compared with the mean marks in the first test. Thus, in this t-test the test units are the same; only their performance 'before and after' is compared. Here:
t = D̄·√n / S

where
D̄ = mean of the differences
D = X2 − X1
n = size of the sample
S = √[Σ(D − D̄)²/(n − 1)]

Degrees of freedom = n − 1
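
The same statistic can be computed by hand, as in the short sketch below with made-up before/after marks; the result is checked against scipy.stats.ttest_rel (a full worked example with scipy also follows later in this post).

import numpy as np
from scipy import stats

before = np.array([56, 60, 62, 58, 64, 59])   # made-up marks before coaching
after = np.array([62, 64, 61, 63, 69, 66])    # made-up marks after coaching

d = after - before                            # D = X2 - X1
n = len(d)
s = d.std(ddof=1)                             # S = sqrt(sum((D - D_bar)^2)/(n - 1))

t = d.mean() * np.sqrt(n) / s                 # t = D_bar*sqrt(n)/S
print(t)
print(stats.ttest_rel(after, before))         # same t; sign depends on argument order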

d) Testing the significance of an observed correlation coefficient.

Given a random sample from a bivariate normal population, if we are to test the hypothesis that the correlation coefficient of the population is zero, i.e., that the variables in the population are uncorrelated, we apply the following test:
t = r·√(n − 2)/√(1 − r²)

Degrees of freedom = n − 2
If the calculated value of t exceeds t(0.05) for (n − 2) d.f., we say that the value of r is significant at the 5% level. If t < t(0.05), the data are consistent with the hypothesis of an uncorrelated population.
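
A small sketch of this test on made-up paired data: r is computed with numpy, converted to a t-value with the formula above, and the result is compared against scipy.stats.pearsonr, which tests the same null hypothesis of zero population correlation.

import numpy as np
from scipy import stats

x = np.array([1, 2, 3, 4, 5, 6, 7, 8])                 # made-up variable
y = np.array([2.1, 2.9, 3.6, 4.8, 5.1, 6.3, 6.9, 8.2]) # made-up variable

n = len(x)
r = np.corrcoef(x, y)[0, 1]                    # sample correlation coefficient

t = r * np.sqrt(n - 2) / np.sqrt(1 - r**2)     # t with n - 2 degrees of freedom
print(r, t)
print(stats.pearsonr(x, y))                    # r and its two-sided p-value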
We can easily demonstrate the one-sample t-test on some fictitious data. Suppose we would like to know whether it is reasonable to assume that the following IQ sample data could have been drawn from a population of IQ scores with mean equal to 100. Our null hypothesis is H0: μ = 100 against the alternative hypothesis H1: μ ≠ 100. We first enter our data in Python.
import pandas as pd

iq = [105, 98, 110, 105, 95]
df = pd.DataFrame(iq)
df
from scipy import stats
stats.ttest_1samp(df, 100.0)
Ttest_1sampResult(statistic=array([0.9649505]),
pvalue=array([0.38921348]))
The t-statistic for the test is equal to 0.9649505, with an associated p-value of 0.38921348. Since the p-value is rather large, we do not reject the null hypothesis. That is, we have insufficient evidence to suggest that our sample of IQ values was not drawn from a population with a mean equal to 100; we simply fail to reject the null hypothesis.

Paired sample t-test in Python

trial_1 = [10, 12.1, 9.2, 11.6, 8.3, 10.5]
trial_2 = [8.2, 11.2, 8.1, 10.5, 7.6, 9.5]
paired_data = pd.DataFrame({'trial_1': trial_1, 'trial_2': trial_2})
paired_data
stats.ttest_rel(trial_1, trial_2)  
#t-test
Ttest_relResult(statistic=7.201190377787752,
pvalue=0.0008044382024663002)
The value of the t-statistic is equal to 7.20 with an associated p-value of 0.0008, which is statistically significant. Therefore we have evidence to suggest that the mean difference between the two trials is not equal to 0. That is, we reject the null hypothesis and infer the alternative hypothesis that in the population from which these data were drawn, the true mean difference is unequal to 0.
import seaborn as sns
import matplotlib.pyplot as plt

plt.figure(figsize=(10, 7))
sns.distplot(trial_1)

plt.figure(figsize=(10, 7))
sns.distplot(trial_2)


 

